235 Commits (457a0de79fd2d3602eba0ac78e606acb6401fc60)

Author SHA1 Message Date
GuangyaoZhang 457a0de79f shardformer fp8 4 months ago
Wang Binluo 6cd4c32be4 [shardformer] fix the moe (#5883) 5 months ago
Edenzzzz eb24fcd914 [Hotfix] Fix OPT gradient checkpointing forward 5 months ago
pre-commit-ci[bot] 7c2f79fa98 [pre-commit.ci] pre-commit autoupdate (#5572) 5 months ago
Jianghai 8ab46b4000 [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) 5 months ago
Haze188 416580b314 [MoE/ZeRO] Moe refactor with zero refactor (#5821) 5 months ago
flybird11111 773d9f964a [shardformer]delete xformers (#5859) 5 months ago
Runyu Lu 3c7cda0c9a [Inference]Lazy Init Support (#5785) 5 months ago
Guangyao Zhang d9d5e7ea1f [shardformer] Support the T5ForTokenClassification model (#5816) 5 months ago
GuangyaoZhang d84d68601a change 'xxx if xxx else None' to 'xxx or None' 5 months ago
pre-commit-ci[bot] 996c65077e [pre-commit.ci] auto fixes from pre-commit.com hooks 5 months ago
GuangyaoZhang a83a2336e8 rebase master llama change 5 months ago
GuangyaoZhang 363cde6957 merge model and attention forward 5 months ago
GuangyaoZhang 7a2b08646f Remove CohereLayerNorm and use existing layernorm 5 months ago
GuangyaoZhang fe2e74c03a fix precommit 5 months ago
GuangyaoZhang f656d61778 change command 5 months ago
GuangyaoZhang 0b81163bc0 Copy llama to command 5 months ago
Edenzzzz 8795bb2e80 Support 4d parallel + flash attention (#5789) 5 months ago
GuangyaoZhang 3c7302ad0e merge model and attention forward 5 months ago
GuangyaoZhang 8c3f524660 Remove CohereLayerNorm and use existing layernorm 5 months ago
GuangyaoZhang 9a290ab013 fix precommit 5 months ago
pre-commit-ci[bot] 2a7fa2e7d0 [pre-commit.ci] auto fixes from pre-commit.com hooks 5 months ago
GuangyaoZhang 94fbde6055 change command 5 months ago
GuangyaoZhang 431b7bcf8f Copy llama to command 5 months ago
flybird11111 2ddf624a86 [shardformer] upgrade transformers to 4.39.3 (#5815) 5 months ago
Li Xingjian 8554585a5f [Inference] Fix flash-attn import and add model test (#5794) 5 months ago
Hongxin Liu aa125bcc91 [shardformer] fix modeling of bloom and falcon (#5796) 5 months ago
Hongxin Liu 73e88a5553 [shardformer] fix import (#5788) 6 months ago
flybird11111 50b4c8e8cf [hotfix] fix llama flash attention forward (#5777) 6 months ago
flybird11111 3f2be80530 fix (#5765) 6 months ago
Haze188 22ce873c3f [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) 6 months ago
Wang Binluo 537f6a3855 [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) 7 months ago
Wang Binluo a3cc68ca93 [Shardformer] Support the Qwen2 model (#5699) 7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598) 7 months ago
CjhHa1 7bbb28e48b [Inference] resolve rebase conflicts 7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396) 7 months ago
Yuanheng Zhao f9afe0addd [hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) 7 months ago
wangbluo 4e50cce26b fix the mistral model 7 months ago
wangbluo a8408b4d31 remove comment code 7 months ago
pre-commit-ci[bot] ca56b93d83 [pre-commit.ci] auto fixes from pre-commit.com hooks 7 months ago
wangbluo 108ddfb795 add parallel_output for the opt model 7 months ago
pre-commit-ci[bot] 88f057ce7c [pre-commit.ci] auto fixes from pre-commit.com hooks 7 months ago
wangbluo 2632916329 remove useless code 7 months ago
wangbluo 9efc79ef24 add parallel output for mistral model 7 months ago
Wang Binluo d3f34ee8cc [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) 7 months ago
flybird11111 6af6d6fc9f [shardformer] support bias_gelu_jit_fused for models (#5647) 7 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
flybird11111 8b7d535977 fix gptj (#5652) 7 months ago
Hongxin Liu 1b387ca9fe [shardformer] refactor pipeline grad ckpt config (#5646) 7 months ago
Hongxin Liu bbb2c21f16 [shardformer] fix chatglm implementation (#5644) 7 months ago