Commit Graph

3490 Commits (597b2060013045cf0d0f0f8fddfc1b77ef716818)

Author              SHA1        Message  Date
flybird11111        597b206001  [fp8] support asynchronous FP8 communication (#5997)  3 months ago
Hongxin Liu         0978080a69  [fp8] refactor fp8 linear with compile (#5993)  3 months ago
Wang Binluo         b2483c8e31  [fp8] support hybrid parallel plugin (#5982)  3 months ago
flybird11111        f1a3a326c4  [fp8]Moe support fp8 communication (#5977)  4 months ago
botbw               e4aadeee20  [fp8] use torch compile (torch >= 2.3.0) (#5979)  4 months ago
Hongxin Liu         8241c0c054  [fp8] support gemini plugin (#5978)  4 months ago
flybird11111        4b9bec8176  [test ci]Feature/fp8 comm (#5981)  4 months ago
Hanks               b480eec738  [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928)  4 months ago
flybird11111        7739629b9d  fix (#5976)  4 months ago
Hongxin Liu         ccabcf6485  [fp8] support fp8 amp for hybrid parallel plugin (#5975)  4 months ago
Hongxin Liu         76ea16466f  [fp8] add fp8 linear (#5967)  4 months ago
flybird11111        afb26de873  [fp8]support all2all fp8 (#5953)  4 months ago
flybird11111        0c10afd372  [FP8] rebase main (#5963)  4 months ago
Guangyao Zhang      53cb9606bd  [Feature] llama shardformer fp8 support (#5938)  4 months ago
Hanks               c297e21bea  Merge pull request #5961 from ver217/feature/zeor-fp8  4 months ago
ver217              91e596d017  [test] add zero fp8 test case  4 months ago
ver217              ae486ce005  [fp8] add fp8 comm for low level zero  4 months ago
Hongxin Liu         5fd0592767  [fp8] support all-gather flat tensor (#5932)  4 months ago
Guangyao Zhang      62661cde22  Merge pull request #5921 from BurkeHulk/fp8_fix  4 months ago
GuangyaoZhang       5b969fd831  fix shardformer fp8 communication training degradation  4 months ago
Guangyao Zhang      d0bdb51f48  Merge pull request #5899 from BurkeHulk/SP_fp8  4 months ago
GuangyaoZhang       6a20f07b80  remove all to all  4 months ago
GuangyaoZhang       5a310b9ee1  fix rebase  4 months ago
GuangyaoZhang       457a0de79f  shardformer fp8  4 months ago
Hanks               9470701110  Merge pull request #5885 from BurkeHulk/feature/fp8_comm  4 months ago
pre-commit-ci[bot]  51f916b11d  [pre-commit.ci] auto fixes from pre-commit.com hooks  5 months ago
BurkeHulk           1f1b856354  Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm  5 months ago
BurkeHulk           66018749f3  add fp8_communication flag in the script  5 months ago
BurkeHulk           e88190184a  support fp8 communication in pipeline parallelism  5 months ago
BurkeHulk           1e1959467e  fix scaling algorithm in FP8 casting  5 months ago
GuangyaoZhang       dbfa7d39fc  fix typo  5 months ago
pre-commit-ci[bot]  e17f835df7  [pre-commit.ci] auto fixes from pre-commit.com hooks  5 months ago
Hanks               6991819a97  Merge branch 'hpcaitech:main' into feature/fp8_comm  5 months ago
pre-commit-ci[bot]  7997683aac  [pre-commit.ci] pre-commit autoupdate (#5878)  5 months ago
Hongxin Liu         7afbc81d62  [quant] fix bitsandbytes version check (#5882)  5 months ago
Wang Binluo         6cd4c32be4  [shardformer] fix the moe (#5883)  5 months ago
Edenzzzz            eb24fcd914  [Hotfix] Fix OPT gradient checkpointing forward  5 months ago
Haze188             ea94c07b95  [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879)  5 months ago
pre-commit-ci[bot]  7c2f79fa98  [pre-commit.ci] pre-commit autoupdate (#5572)  5 months ago
Edenzzzz            936d0b0f7b  [doc] Update llama + sp compatibility; fix dist optim table  5 months ago
Jianghai            8ab46b4000  [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874)  5 months ago
HangXu              f5a52e1600  fp8 operators for compressed communication  5 months ago
Haze188             416580b314  [MoE/ZeRO] Moe refactor with zero refactor (#5821)  5 months ago
flybird11111        773d9f964a  [shardformer]delete xformers (#5859)  5 months ago
Hongxin Liu         eaea88cf9e  [release] update version (#5864)  5 months ago
Runyu Lu            3c7cda0c9a  [Inference]Lazy Init Support (#5785)  5 months ago
Guangyao Zhang      d9d5e7ea1f  [shardformer] Support the T5ForTokenClassification model (#5816)  5 months ago
Hongxin Liu         5dfbcd7746  [zero] use bucket during allgather (#5860)  5 months ago
botbw               8e718a1421  [gemini] fixes for benchmarking (#5847)  5 months ago
Edenzzzz            2a25a2aff7  [Feature] optimize PP overlap (#5735)  5 months ago