Commit Graph

2035 Commits (53cb9606bd86ed394e3a3d18a82fae5428f09155)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Guangyao Zhang | 53cb9606bd | [Feature] llama shardformer fp8 support (#5938) | 4 months ago |
| ver217 | ae486ce005 | [fp8] add fp8 comm for low level zero | 4 months ago |
| Hongxin Liu | 5fd0592767 | [fp8] support all-gather flat tensor (#5932) | 4 months ago |
| GuangyaoZhang | 5b969fd831 | fix shardformer fp8 communication training degradation | 4 months ago |
| GuangyaoZhang | 6a20f07b80 | remove all to all | 4 months ago |
| GuangyaoZhang | 5a310b9ee1 | fix rebase | 5 months ago |
| GuangyaoZhang | 457a0de79f | shardformer fp8 | 5 months ago |
| pre-commit-ci[bot] | 51f916b11d | [pre-commit.ci] auto fixes from pre-commit.com hooks | 5 months ago |
| BurkeHulk | 1f1b856354 | Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm | 5 months ago |
| BurkeHulk | e88190184a | support fp8 communication in pipeline parallelism | 5 months ago |
| BurkeHulk | 1e1959467e | fix scaling algorithm in FP8 casting | 5 months ago |
| GuangyaoZhang | dbfa7d39fc | fix typo | 5 months ago |
| pre-commit-ci[bot] | e17f835df7 | [pre-commit.ci] auto fixes from pre-commit.com hooks | 5 months ago |
| Hanks | 6991819a97 | Merge branch 'hpcaitech:main' into feature/fp8_comm | 5 months ago |
| Hongxin Liu | 7afbc81d62 | [quant] fix bitsandbytes version check (#5882) | 5 months ago |
| Wang Binluo | 6cd4c32be4 | [shardformer] fix the moe (#5883) | 5 months ago |
| Edenzzzz | eb24fcd914 | [Hotfix] Fix OPT gradient checkpointing forward | 5 months ago |
| Haze188 | ea94c07b95 | [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) | 5 months ago |
| pre-commit-ci[bot] | 7c2f79fa98 | [pre-commit.ci] pre-commit autoupdate (#5572) | 5 months ago |
| Jianghai | 8ab46b4000 | [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) | 5 months ago |
| HangXu | f5a52e1600 | fp8 operators for compressed communication | 5 months ago |
| Haze188 | 416580b314 | [MoE/ZeRO] Moe refactor with zero refactor (#5821) | 5 months ago |
| flybird11111 | 773d9f964a | [shardformer]delete xformers (#5859) | 5 months ago |
| Runyu Lu | 3c7cda0c9a | [Inference]Lazy Init Support (#5785) | 5 months ago |
| Guangyao Zhang | d9d5e7ea1f | [shardformer] Support the T5ForTokenClassification model (#5816) | 5 months ago |
| Hongxin Liu | 5dfbcd7746 | [zero] use bucket during allgather (#5860) | 5 months ago |
| botbw | 8e718a1421 | [gemini] fixes for benchmarking (#5847) | 5 months ago |
| Edenzzzz | 2a25a2aff7 | [Feature] optimize PP overlap (#5735) | 5 months ago |
| botbw | 8a5c86439a | [gemini] fix missing return (#5845) | 5 months ago |
| Yuanheng Zhao | 7b249c76e5 | [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) | 5 months ago |
| Kai Lv | 0adca5b688 | [launch] Support IPv4 host initialization in launch (#5822) | 5 months ago |
| GuangyaoZhang | d84d68601a | change 'xxx if xxx else None' to 'xxx or None' | 5 months ago |
| GuangyaoZhang | a83a2336e8 | rebase master llama change | 5 months ago |
| GuangyaoZhang | 363cde6957 | merge model and attention forward | 5 months ago |
| GuangyaoZhang | 7a2b08646f | Remove CohereLayerNorm and use existing layernorm | 5 months ago |
| GuangyaoZhang | fe2e74c03a | fix precommit | 5 months ago |
| GuangyaoZhang | f656d61778 | change command | 5 months ago |
| GuangyaoZhang | 0b81163bc0 | Copy llama to command | 5 months ago |
| Edenzzzz | 8795bb2e80 | Support 4d parallel + flash attention (#5789) | 5 months ago |
| flybird11111 | 2ddf624a86 | [shardformer] upgrade transformers to 4.39.3 (#5815) | 6 months ago |
| botbw | 3bcbba9262 | [gemini] quick fix on possible async operation (#5803) | 6 months ago |
| Haze188 | d9dddf574f | [Gemini] Use async stream to prefetch and h2d data moving (#5781) | 6 months ago |
| Li Xingjian | 8554585a5f | [Inference] Fix flash-attn import and add model test (#5794) | 6 months ago |
| Hongxin Liu | aa125bcc91 | [shardformer] fix modeling of bloom and falcon (#5796) | 6 months ago |
| Runyu Lu | c0948aff97 | [Inference]refactor baichuan (#5791) | 6 months ago |
| char-1ee | f5981e808e | Remove flash attention backend | 6 months ago |
| char-1ee | ceba662d22 | Clean up | 6 months ago |
| char-1ee | 5f398fc000 | Pass inference model shard configs for module init | 6 months ago |
| char-1ee | eec77e5702 | Fix tests and naming | 6 months ago |
| char-1ee | 04386d9eff | Refactor modeling by adding attention backend | 6 months ago |