Commit Graph

365 Commits (6a3086a5055235e51a1bca8a20c4bd967409a259)

Author SHA1 Message Date
Frank Lee 8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
10 months ago
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
10 months ago
digger yu bce9499ed3
fix some typo (#5307)
10 months ago
flybird11111 f7e3f82a7e
fix llama pretrain (#5287)
10 months ago
ver217 148469348a Merge branch 'main' into sync/npu
11 months ago
Wenhao Chen ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg (#5268)
11 months ago
binmakeswell c174c4fc5f
[doc] fix doc typo (#5256)
11 months ago
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239)
11 months ago
Xuanlei Zhao dd2c28a323
[npu] use extension for op builder (#5172)
11 months ago
Wenhao Chen 3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode (#5201)
11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
11 months ago
flybird11111 21aa5de00b
[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150)
12 months ago
binmakeswell 177c79f2d1
[doc] add moe news (#5128)
1 year ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
digger yu d5661f0f25
[nfc] fix typo change directoty to directory (#5111)
1 year ago
Xuanlei Zhao 3acbf6d496
[npu] add npu support for hybrid plugin and llama (#5090)
1 year ago
flybird11111 aae496631c
[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084)
1 year ago
Hongxin Liu 1cd7efc520
[inference] refactor examples and fix schedule (#5077)
1 year ago
Bin Jia 4e3959d316
[hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074)
1 year ago
github-actions[bot] 8921a73c90
[format] applied code formatting on changed files in pull request 5067 (#5072)
1 year ago
Xu Kai fb103cfd6e
[inference] update examples and engine (#5073)
1 year ago
Hongxin Liu e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
1 year ago
Cuiqing Li (李崔卿) bce919708f
[Kernels]added flash-decoidng of triton (#5063)
1 year ago
Xu Kai fd6482ad8c
[inference] Refactor inference architecture (#5057)
1 year ago
flybird11111 bc09b95f50
[exampe] fix llama example' loss error when using gemini plugin (#5060)
1 year ago
Elsa Granger b2ad0d9e8f
[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)
1 year ago
Cuiqing Li (李崔卿) 28052a71fb
[Kernels]Update triton kernels into 2.1.0 (#5046)
1 year ago
Zhongkai Zhao 70885d707d
[hotfix] Suport extra_kwargs in ShardConfig (#5031)
1 year ago
Wenhao Chen 724441279b
[moe]: fix ep/tp tests, add hierarchical all2all (#4982)
1 year ago
Xuanlei Zhao f71e63b0f3
[moe] support optimizer checkpoint (#5015)
1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
1 year ago
Cuiqing Li 459a88c806
[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965)
1 year ago
アマデウス 4e4a10c97d
updated c++17 compiler flags (#4983)
1 year ago
Jianghai c6cd629e7a
[Inference]ADD Bench Chatglm2 script (#4963)
1 year ago
Xu Kai 785802e809
[inference] add reference and fix some bugs (#4937)
1 year ago
Cuiqing Li 3a41e8304e
[Refactor] Integrated some lightllm kernels into token-attention (#4946)
1 year ago
Xu Kai 611a5a80ca
[inference] Add smmoothquant for llama (#4904)
1 year ago
Blagoy Simandoff 8aed02b957
[nfc] fix minor typo in README (#4846)
1 year ago
Xu Kai d1fcc0fa4d
[infer] fix test bug (#4838)
1 year ago
Jianghai 013a4bedf0
[inference]fix import bug and delete down useless init (#4830)
1 year ago
Yuanheng Zhao 573f270537
[Infer] Serving example w/ ray-serve (multiple GPU case) (#4841)
1 year ago
Yuanheng Zhao 3a74eb4b3a
[Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771)
1 year ago
binmakeswell 822051d888
[doc] update slack link (#4823)
1 year ago
flybird11111 26cd6d850c
[fix] fix weekly runing example (#4787)
1 year ago
Xu Kai 946ab56c48
[feature] add gptq for inference (#4754)
1 year ago
Baizhou Zhang df66741f77
[bug] fix get_default_parser in examples (#4764)
1 year ago
Wenhao Chen 7b9b86441f
[chat]: update rm, add wandb and fix bugs (#4471)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
github-actions[bot] 3c6b831c26
[format] applied code formatting on changed files in pull request 4743 (#4750)
1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
1 year ago