161 Commits (c7d68b2c2ca3f7fd32056ea952fae4fe239f75ea)

Author SHA1 Message Date
digger yu b0b53a171c
[nfc] fix typo colossalai/shardformer/ (#5133)
11 months ago
flybird11111 451e9142b8
fix flash attn (#5209)
11 months ago
flybird11111 02d2328a04
support linear accumulation fusion (#5199)
11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
11 months ago
flybird11111 79718fae04
[shardformer] llama support DistCrossEntropy (#5176)
12 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
アマデウス 126cf180bc
[hotfix] fixed memory usage of shardformer module replacement (#5122)
1 year ago
Xuanlei Zhao 68fcaa2225
remove duplicate import (#5100)
1 year ago
Xuanlei Zhao 3acbf6d496
[npu] add npu support for hybrid plugin and llama (#5090)
1 year ago
flybird11111 aae496631c
[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084)
1 year ago
Zhongkai Zhao 75af66cd81
[Hotfix] Fix model policy matching strategy in ShardFormer (#5064)
1 year ago
Bin Jia 4e3959d316
[hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074)
1 year ago
Hongxin Liu e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
1 year ago
Xu Kai fd6482ad8c
[inference] Refactor inference architecture (#5057)
1 year ago
flybird11111 97cd0cd559
[shardformer] fix llama error when transformers upgraded. (#5055)
1 year ago
Elsa Granger b2ad0d9e8f
[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)
1 year ago
Zhongkai Zhao 70885d707d
[hotfix] Suport extra_kwargs in ShardConfig (#5031)
1 year ago
flybird11111 576a2f7b10
[gemini] gemini support tensor parallelism. (#4942)
1 year ago
Jianghai ef4c14a5e2
[Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014)
1 year ago
littsk 1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
1 year ago
Bin Jia 1db6727678
[Pipeline inference] Combine kvcache with pipeline inference (#4938)
1 year ago
digger yu 11009103be
[nfc] fix some typo with colossalai/ docs/ etc. (#4920)
1 year ago
Hongxin Liu 1f5d2e8062
[hotfix] fix torch 2.0 compatibility (#4936)
1 year ago
Xu Kai d1fcc0fa4d
[infer] fix test bug (#4838)
1 year ago
littsk 11f1e426fe
[hotfix] Correct several erroneous code comments (#4794)
1 year ago
Jianghai ce7ade3882
[inference] chatglm2 infer demo (#4724)
1 year ago
Xu Kai 946ab56c48
[feature] add gptq for inference (#4754)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Baizhou Zhang f911d5b09d
[doc] Add user document for Shardformer (#4702)
1 year ago
flybird11111 20190b49a5
[shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)
1 year ago
flybird11111 c7d6975d29
[shardformer] fix GPT2DoubleHeadsModel (#4703)
1 year ago
flybird11111 8844691f4b
[shardformer] update shardformer readme (#4689)
1 year ago
Cuiqing Li bce0f16702
[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577)
1 year ago
flybird11111 eedaa3e1ef
[shardformer]fix gpt2 double head (#4663)
1 year ago
flybird11111 7486ed7d3a
[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645)
1 year ago
Baizhou Zhang 295b38fecf
[example] update vit example for hybrid parallel plugin (#4641)
1 year ago
eric8607242 c3d5fa3bac
[shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624)
1 year ago
flybird11111 ec0866804c
[shardformer] update shardformer readme (#4617)
1 year ago
Bin Jia 86d22581e4
[shardformer] Add overlap optional for HybridParallelPlugin (#4615)
1 year ago
Jianghai 24c0768795
[shardformer] Pytree fix (#4533)
1 year ago
Baizhou Zhang 2c787d7f47
[shardformer] fix submodule replacement bug when enabling pp (#4544)
1 year ago
flybird11111 d367b88785
[shardformer] fix opt test hanging (#4521)
1 year ago
Bin Jia e241b74f24
[shardformer] Add overlap support for gpt2 (#4535)
1 year ago
Bin Jia c554b7f559
[shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
1 year ago
Baizhou Zhang 44eab2b27f
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506)
1 year ago
flybird11111 de8a65babc
[shardformer] opt fix. (#4514)
1 year ago
flybird11111 3353e55c80
[shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498)
1 year ago
flybird11111 59e252ecdb
[shardformer] chatglm support sequence parallel (#4482)
1 year ago
Bin Jia 351351a36e
[shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488)
1 year ago
Jianghai 5545114fd8
rename chatglm to chatglm2 (#4484)
1 year ago