duanjunwen
5b5fbcff09
[fix] fix HybridParallel use_fp8 config
3 weeks ago
duanjunwen
3b5c314bea
[fix] fix fp8 args in HybridParallel
3 weeks ago
duanjunwen
c82c75a9b4
Merge branch 'feature/zerobubble' of github.com:hpcaitech/ColossalAI into dev/zero_bubble
3 weeks ago
duanjunwen
1d328ff651
Merge branch 'main' into dev/zero_bubble
3 weeks ago
pre-commit-ci[bot]
2f583c1549
[pre-commit.ci] pre-commit autoupdate (#6078)
updates:
- [github.com/psf/black-pre-commit-mirror: 24.8.0 → 24.10.0](https://github.com/psf/black-pre-commit-mirror/compare/24.8.0...24.10.0)
- [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.2](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.2)
- [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.6.0...v5.0.0)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
3 weeks ago
duanjunwen
aed20fb2df
[feat] support zbv in mixtral benchmark (#6083)
* [feat] support zbv in mixtral benchmark;
* [fix] MixtralForCausalLMPolicy get_held_layer support zbv;
* [feat] update MixtralPipelineForwards --> mixtral_model_forward; support zbv;
* [feat] support MixtralPipelineForwards --> mixtral_for_causal_lm_forward for zbv
* [fix] fix llama, mixtral benchmark zbv loss none bug; update mixtral & llama policy and modeling;
* [feat] Linear1D_Col/Row support zbv WeightGradStore;
* [feat] support use_zbv in llama, mixtral modeling; only replace Linear1D_Col/Row policy;
* [fix] fix test case; fix MoE error in the second iter
* [feat] EPMixtralSparseMoeBlock (op in MoE) support zbv;
* [fix] fix bwd b; now bwd w only for Layer replaced by Linear1D_Col/Row; other layers perform a full bwd;
* [fix] debug zbv llama test;
* [fix] rm use_zbv flag in ShardConfig; rm debug info;
* [fix] add & fix llama test
* [feat] support meta cache, meta_grad_send, meta_tensor_send; fix overly long runtime in Recv Bwd; benchmark for llama + Hybrid(tp+pp);
* [fix] fix failed case test_shard_llama
* [fix] fix test_shard_llama
* [fix] fix llama modeling policy;
* [fix] fix test_shard_llama ci;
* [fix] fix test zerobubble
* [fix] fix handle name; rm useless comments;
* [fix] fix send recv signature;
* [fix] fix comment in llama & benchmark
* [feat] support non-tensor-parallel Linear in shardformer; add tests for using WeightGradStore and not using WeightGradStore
* [fix] fix linear (no tp) ops func name;
3 weeks ago
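For context on the WeightGradStore items above: zero-bubble (zbv) schedules split a layer's backward into an input-gradient step (B), run immediately so the previous stage can proceed, and a weight-gradient step (W), deferred and flushed later to fill pipeline bubbles. Below is a minimal sketch of the idea; names and structure are illustrative, not ColossalAI's actual Linear1D_Col/Row implementation.

```python
import torch

class WeightGradStore:
    """Queue of deferred weight-gradient closures (illustrative only)."""
    _cache = []

    @classmethod
    def put(cls, fn):
        cls._cache.append(fn)

    @classmethod
    def flush(cls):
        fns, cls._cache = cls._cache, []
        for fn in fns:  # W step: run the deferred weight-grad computations
            fn()

class LinearWithDeferredWgrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors
        grad_input = grad_output @ weight  # B step: computed immediately

        def compute_wgrad():  # W step: deferred until flush()
            wgrad = grad_output.t() @ x
            weight.grad = wgrad if weight.grad is None else weight.grad + wgrad

        WeightGradStore.put(compute_wgrad)
        return grad_input, None  # weight grad is handled by the deferred step

x = torch.randn(4, 8)
w = torch.nn.Parameter(torch.randn(6, 8))
LinearWithDeferredWgrad.apply(x, w).sum().backward()
assert w.grad is None           # not computed during backward
WeightGradStore.flush()
assert w.grad.shape == (6, 8)   # computed when the store is flushed
```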
Hongxin Liu
c2e8f61592
[checkpointio] fix hybrid plugin model save (#6106)
3 weeks ago
duanjunwen
5f0924361d
[fix] fix linear (no tp) ops func name;
3 weeks ago
duanjunwen
d2e05a99b3
[feat] support non-tensor-parallel Linear in shardformer; add tests for using WeightGradStore and not using WeightGradStore
3 weeks ago
duanjunwen
982e4ee1f8
[fix] fix comment in llama & benchmark
3 weeks ago
duanjunwen
fa3ccda8ee
[fix] fix send recv signature;
3 weeks ago
duanjunwen
fafe049b83
[fix] fix handle name; rm useless comments;
3 weeks ago
duanjunwen
5aee4261a6
[fix] fix test zerobubble
4 weeks ago
duanjunwen
6377aa0fff
[fix] fix test_shard_llama ci;
4 weeks ago
duanjunwen
03fa79a55c
[fix] fix llama modeling policy;
4 weeks ago
duanjunwen
cc0dfddcbc
[fix] fix test_shard_llama
4 weeks ago
duanjunwen
d0ec221b38
[fix] fix failed case test_shard_llama
4 weeks ago
Tong Li
89a9a600bc
[MCTS] Add self-refined MCTS (#6098)
* add reasoner
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update code
* delete llama
* update prompts
* update readme
* update readme
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
4 weeks ago
duanjunwen
2eca112c90
[feat] support meta cache, meta_grad_send, meta_tensor_send; fix overly long runtime in Recv Bwd; benchmark for llama + Hybrid(tp+pp);
4 weeks ago
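The meta cache here addresses a common pipeline-parallel cost: before a recv can pre-allocate its buffer, the receiver must learn the incoming tensor's shape, and re-sending that metadata every step is wasteful when shapes are static across iterations. A rough sketch of the caching idea with plain torch.distributed follows; the helper is hypothetical, not ColossalAI's actual p2p code, and the dtype is fixed for brevity.

```python
import torch
import torch.distributed as dist

_meta_cache = {}  # src rank -> (shape, dtype) learned in the first iteration

def recv_activation(src, dtype=torch.float32, use_meta_cache=True):
    """Receive a tensor whose shape is initially only known to the sender."""
    if not (use_meta_cache and src in _meta_cache):
        # First iteration only: receive ndim, then the shape itself.
        ndim = torch.empty(1, dtype=torch.long)
        dist.recv(ndim, src=src)
        shape = torch.empty(int(ndim.item()), dtype=torch.long)
        dist.recv(shape, src=src)
        _meta_cache[src] = (tuple(shape.tolist()), dtype)
    shape, dtype = _meta_cache[src]
    buf = torch.empty(shape, dtype=dtype)  # later iterations skip the meta round-trip
    dist.recv(buf, src=src)                # sender mirrors this with dist.send
    return buf
```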
binmakeswell
4294ae83bb
[doc] sora solution news (#6100)
* [doc] sora solution news
* [doc] sora solution news
4 weeks ago
Hongxin Liu
80a8ca916a
[extension] hotfix compile check (#6099)
4 weeks ago
Hanks
dee63cc5ef
Merge pull request #6096 from BurkeHulk/hotfix/lora_ckpt
[hotfix] fix lora ckpt saving format
1 month ago
BurkeHulk
6d6cafabe2
pre-commit fix
1 month ago
BurkeHulk
b10339df7c
fix lora ckpt save format (ColoTensor to Tensor)
1 month ago
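ColoTensor is a torch.Tensor subclass, and serializing the subclass directly makes checkpoints awkward to load outside ColossalAI; the fix converts to plain tensors before saving. A minimal sketch of that conversion, assuming the tensors are already gathered/replicated; the actual change in #6096 may differ.

```python
import torch

def to_plain_state_dict(state_dict):
    """Strip tensor subclasses (e.g. ColoTensor) so the checkpoint holds plain Tensors."""
    plain = {}
    for name, t in state_dict.items():
        if isinstance(t, torch.Tensor) and type(t) is not torch.Tensor:
            t = t.detach().as_subclass(torch.Tensor)  # drop the subclass wrapper
        plain[name] = t
    return plain

# Usage (lora_model is hypothetical):
#   torch.save(to_plain_state_dict(lora_model.state_dict()), "lora_ckpt.pt")
```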
Hongxin Liu
19baab5fd5
[release] update version (#6094)
1 month ago
Hongxin Liu
58d8b8a2dd
[misc] fit torch API upgrade and remove legacy import (#6093)
* [amp] fit torch's new api
* [amp] fix api call
* [amp] fix api call
* [misc] fit torch pytree api upgrade
* [misc] remove legacy import
* [misc] fit torch amp api
* [misc] fit torch amp api
1 month ago
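Among the torch API changes being fitted here is the deprecation of the device-specific torch.cuda.amp namespace in favor of the device-generic torch.amp (around PyTorch 2.3+). Roughly, and the exact call sites in #6093 may differ:

```python
import torch

# Deprecated spellings:
#   with torch.cuda.amp.autocast(dtype=torch.float16): ...
#   scaler = torch.cuda.amp.GradScaler()

# Current spellings take the device type as an argument:
with torch.amp.autocast("cuda", dtype=torch.float16):
    pass
scaler = torch.amp.GradScaler("cuda")
```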
Hongxin Liu
5ddad486ca
[fp8] add fallback and make compile option configurable (#6092)
1 month ago
botbw
3b1d7d1ae8
[chore] refactor
1 month ago
botbw
2bcd0b6844
[ckpt] add safetensors util
1 month ago
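A minimal sketch of what a safetensors checkpoint util looks like; safetensors stores tensors in a non-pickle, zero-copy format, and the interface of the util added in this commit may differ.

```python
import torch
from safetensors.torch import save_file, load_file

def save_state_dict(state_dict, path):
    # safetensors requires contiguous tensors that do not share storage.
    save_file({k: v.contiguous() for k, v in state_dict.items()}, path)

def load_state_dict(path, device="cpu"):
    return load_file(path, device=device)

save_state_dict({"weight": torch.randn(2, 2)}, "demo.safetensors")
print(load_state_dict("demo.safetensors")["weight"].shape)
```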
Hongxin Liu
cd61353bae
[pipeline] hotfix backward for multiple outputs (#6090)
* [pipeline] hotfix backward for multiple outputs
* [pipeline] hotfix backward for multiple outputs
1 month ago
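Context for the hotfix: when a pipeline stage emits multiple output tensors, the backward step must pair each upstream gradient received from the next stage with its matching output; torch.autograd.backward accepts parallel lists for exactly this. A small standalone illustration (not the patch itself):

```python
import torch

x = torch.randn(4, 4, requires_grad=True)
out1, out2 = x * 2, x @ x.t()  # a stage producing two outputs

# Gradients that would arrive from the next pipeline stage.
g1, g2 = torch.ones_like(out1), torch.ones_like(out2)

# Backprop through both outputs in one call.
torch.autograd.backward([out1, out2], grad_tensors=[g1, g2])
print(x.grad.shape)
```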
duanjunwen
705b18e1e7
[fix] add & fix llama test
1 month ago
duanjunwen
e76308c6e6
[fix] rm use_zbv flag in ShardConfig; rm debug info;
1 month ago
Wenxuan Tan
62c13e7969
[Ring Attention] Improve comments (#6085)
* improve comments
* improve comments
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
1 month ago
duanjunwen
90939b77e0
[fix] debug zbv llama test;
1 month ago
Wang Binluo
dcd41d0973
Merge pull request #6071 from wangbluo/ring_attention
[Ring Attention] fix the 2d ring attn when using multiple machines
1 month ago
wangbluo
83cf2f84fb
fix
1 month ago
duanjunwen
52dcc73313
Merge branch 'feature/zerobubble' of github.com:hpcaitech/ColossalAI into dev/zero_bubble
1 month ago
duanjunwen
9912cc8c07
[fix] fix bwd b; now bwd w only for Layer replaced by Linear1D_Col/Row; other layers perform a full bwd;
1 month ago
wangbluo
bc7eeade33
fix
1 month ago
wangbluo
fd92789af2
fix
1 month ago
wangbluo
6be9862aaf
fix
1 month ago
wangbluo
3dc08c8a5a
fix
1 month ago
wangbluo
8ff7d0c780
fix
1 month ago
wangbluo
fe9208feac
fix
1 month ago
wangbluo
3201377e94
fix
1 month ago
wangbluo
23199e34cc
fix
1 month ago
duanjunwen
160e9a4175
[feat] EPMixtralSparseMoeBlock (op in MoE) support zbv;
1 month ago
duanjunwen
abd455189d
[fix] fix test case; fix MoE error in the second iter
1 month ago
duanjunwen
a11b4b50a7
[feat] support use_zbv in llama, mixtral modeling; only replace Linear1D_Col/Row policy;
1 month ago
duanjunwen
cfade4c36d
[feat] Linear1D_Col/Row support zbv WeightGradStore;
1 month ago