2903 Commits (394221861826b8032b1bea0052f06e792467674d)

Author SHA1 Message Date
wangbinluo 3942218618 remove useless platform args and comment 11 months ago
wangbinluo a9b5ec8664 fix the build before load bug 11 months ago
Frank Lee 9102d655ab [hotfix] removed unused flag (#5242) 11 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
Xuanlei Zhao dd2c28a323 [npu] use extension for op builder (#5172) 11 months ago
Xuanlei Zhao d6df19bae7 [npu] support triangle attention for llama (#5130) 12 months ago
Frank Lee f4e72c9992 [accelerator] init the accelerator module (#5129) 12 months ago
Xuanlei Zhao 68fcaa2225 remove duplicate import (#5100) 1 year ago
YeAnbang e53e729d8e [Feature] Add document retrieval QA (#5020) 1 year ago
Xuanlei Zhao 3acbf6d496 [npu] add npu support for hybrid plugin and llama (#5090) 1 year ago
flybird11111 aae496631c [shardformer] fix flash attention, when mask is causal, just don't unpad it (#5084) 1 year ago
Zhongkai Zhao 75af66cd81 [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) 1 year ago
flybird11111 4ccb9ded7d [gemini] fix gemini optimizer, saving Shardformer in Gemini got list assignment index out of range (#5085) 1 year ago
digger yu 0d482302a1 [nfc] fix typo and author name (#5089) 1 year ago
digger yu fd3567e089 [nfc] fix typo in docs/ (#4972) 1 year ago
Jun Gao dce05da535 fix thrust-transform-reduce error (#5078) 1 year ago
Hongxin Liu 1cd7efc520 [inference] refactor examples and fix schedule (#5077) 1 year ago
Bin Jia 4e3959d316 [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) 1 year ago
github-actions[bot] 8921a73c90 [format] applied code formatting on changed files in pull request 5067 (#5072) 1 year ago
Xu Kai fb103cfd6e [inference] update examples and engine (#5073) 1 year ago
Bin Jia 0c7d8bebd5 [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) 1 year ago
Hongxin Liu e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) 1 year ago
Hongxin Liu 8d56c9c389 [misc] remove outdated submodule (#5070) 1 year ago
Cuiqing Li (李崔卿) bce919708f [Kernels] added flash-decoding of triton (#5063) 1 year ago
Xu Kai fd6482ad8c [inference] Refactor inference architecture (#5057) 1 year ago
flybird11111 bc09b95f50 [example] fix llama example's loss error when using gemini plugin (#5060) 1 year ago
Wenhao Chen 3c08f17348 [hotfix]: modify create_ep_hierarchical_group and add test (#5032) 1 year ago
flybird11111 97cd0cd559 [shardformer] fix llama error when transformers is upgraded (#5055) 1 year ago
flybird11111 3e02154710 [gemini] gemini support extra-dp (#5043) 1 year ago
Elsa Granger b2ad0d9e8f [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) 1 year ago
Cuiqing Li (李崔卿) 28052a71fb [Kernels] Update triton kernels to 2.1.0 (#5046) 1 year ago
Orion-Zheng 43ad0d9ef0 fix wrong EOS token in ColossalChat 1 year ago
Zhongkai Zhao 70885d707d [hotfix] Support extra_kwargs in ShardConfig (#5031) 1 year ago
flybird11111 576a2f7b10 [gemini] gemini support tensor parallelism. (#4942) 1 year ago
Jun Gao a4489384d5 [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) 1 year ago
Wenhao Chen 724441279b [moe]: fix ep/tp tests, add hierarchical all2all (#4982) 1 year ago
Yuanchen 239cd92eff Support mtbench (#5025) 1 year ago
Xuanlei Zhao f71e63b0f3 [moe] support optimizer checkpoint (#5015) 1 year ago
Hongxin Liu 67f5331754 [misc] add code owners (#5024) 1 year ago
Jianghai ef4c14a5e2 [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) 1 year ago
github-actions[bot] c36e782d80 [format] applied code formatting on changed files in pull request 4926 (#5007) 1 year ago
littsk 1a3315e336 [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) 1 year ago
Baizhou Zhang d99b2c961a [hotfix] fix grad accumulation plus clipping for gemini (#5002) 1 year ago
Xuanlei Zhao dc003c304c [moe] merge moe into main (#4978) 1 year ago
Hongxin Liu 8993c8a817 [release] update version (#4995) 1 year ago
Bin Jia b6696beb04 [Pipeline Inference] Merge pp with tp (#4993) 1 year ago
ppt0011 335cb105e2 [doc] add supported feature diagram for hybrid parallel plugin (#4996) 1 year ago
Baizhou Zhang c040d70aa0 [hotfix] fix the bug of repeatedly storing param group (#4951) 1 year ago
littsk be82b5d4ca [hotfix] Fix the bug where process groups were not being properly released. (#4940) 1 year ago
Cuiqing Li (李崔卿) 4f0234f236 [doc] Update doc for colossal-inference (#4989) 1 year ago