Commit Graph

205 Commits (89a9a600bc4802c912b0ed48d48f70bbcdd8142b)

Author SHA1 Message Date
Hongxin Liu ccabcf6485
[fp8] support fp8 amp for hybrid parallel plugin (#5975)
4 months ago
flybird11111 0c10afd372
[FP8] rebase main (#5963)
4 months ago
Haze188 416580b314
[MoE/ZeRO] Moe refactor with zero refactor (#5821)
5 months ago
Edenzzzz 43995ee436
[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)
7 months ago
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666)
7 months ago
flybird11111 a0ad587c24
[shardformer] refactor embedding resize (#5603)
7 months ago
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566)
8 months ago
Edenzzzz 61da3fbc52
fixed layout converter caching and updated tester
8 months ago
Hongxin Liu da39d21b71
[moe] support mixtral (#5309)
10 months ago
Hongxin Liu 2dd01e3a14
[gemini] fix param op hook when output is tuple (#5355)
10 months ago
digger yu bce9499ed3
fix some typo (#5307)
10 months ago
flybird11111 3dbbf83f1c
fix (#5158)
12 months ago
アマデウス 126cf180bc
[hotfix] fixed memory usage of shardformer module replacement (#5122)
1 year ago
Wenhao Chen 3c08f17348
[hotfix]: modify create_ep_hierarchical_group and add test (#5032)
1 year ago
flybird11111 576a2f7b10
[gemini] gemini support tensor parallelism. (#4942)
1 year ago
Xuanlei Zhao f71e63b0f3
[moe] support optimizer checkpoint (#5015)
1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
1 year ago
littsk be82b5d4ca
[hotfix] Fix the bug where process groups were not being properly released. (#4940)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
1 year ago
digger yu 9c2feb2f0b
fix some typo with colossalai/device colossalai/tensor/ etc. (#4171)
1 year ago
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
1 year ago
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
1 year ago
Baizhou Zhang 0ceec8f9a9
[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354)
1 year ago
Hongxin Liu d921ce8391
[shardformer] support inplace sharding (#4251)
1 year ago
Frank Lee 190a6ea9c2
[dtensor] fixed readme file name and removed deprecated file (#4162)
1 year ago
Frank Lee c4b1b65931
[test] fixed tests failed due to dtensor change (#4082)
1 year ago
Frank Lee 70c58cfd4f
[shardformer] supported fused qkv checkpoint (#4073)
1 year ago
Frank Lee 8eb09a4c69
[shardformer] support module saving and loading (#4062)
1 year ago
Frank Lee 45d9384346
[shardformer] removed inplace tensor sharding (#4018)
1 year ago
Frank Lee 015af592f8
[shardformer] integrated linear 1D with dtensor (#3996)
1 year ago
FoolPlayer a2f9af810d
[shardformer] fix an error in readme (#3988)
1 year ago
Frank Lee ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
1 year ago
Frank Lee eb39154d40
[dtensor] updated api and doc (#3845)
1 year ago
Frank Lee d51e83d642
Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop
1 year ago
digger yu 0e484e6201
[nfc]fix typo colossalai/pipeline tensor nn (#3899)
1 year ago
Hongxin Liu 7c9f2ed6dd
[dtensor] polish sharding spec docstring (#3838)
2 years ago
YH 2629f9717d
[tensor] Refactor handle_trans_spec in DistSpecManager
2 years ago
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
2 years ago
YH 8f740deb53
Fix typo (#3448)
2 years ago
YH 1a229045af
Add interface for colo tesnor dp size (#3227)
2 years ago
YuliangLiu0306 258b43317c
[hotfix] layout converting issue (#3188)
2 years ago
YuliangLiu0306 2eca4cd376
[DTensor] refactor dtensor with new components (#3089)
2 years ago
YuliangLiu0306 8e4e8601b7
[DTensor] implement layout converter (#3055)
2 years ago
YuliangLiu0306 29386a54e6
[DTensor] refactor CommSpec (#3034)
2 years ago
YuliangLiu0306 cd2b0eaa8d
[DTensor] refactor sharding spec (#2987)
2 years ago
YuliangLiu0306 e414e4092b
[DTensor] implementation of dtensor (#2946)
2 years ago
YuliangLiu0306 47fb214b3b
[hotfix] add shard dim to aviod backward communication error (#2954)
2 years ago
Jiatong (Julius) Han 8c8a39be95
[hotfix]: Remove math.prod dependency (#2837)
2 years ago
HELSON 552183bb74
[polish] polish ColoTensor and its submodules (#2537)
2 years ago