littsk
|
1a3315e336
|
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
* [hotfix] Add layer norm gradients all-reduce for sequence parallel. (#4915)
* Add layer norm gradients all-reduce for sequence parallel.
* skip pipeline inference test
* [hotfix] fixing policies of sequence parallel (#4922)
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
---------
Co-authored-by: littsk <1214689160@qq.com>
* Hotfix/add grad all reduce for sequence parallel (#4927)
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
* fix bug using wrong variables
---------
Co-authored-by: littsk <1214689160@qq.com>
* fix policy initialization
* fix bloom and chatglm policies
* polish code of handling layernorm
* fix moe module
* polish code of class initializing
---------
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
|
2023-11-03 13:32:43 +08:00 |
Hongxin Liu
|
079bf3cb26
|
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
|
2023-09-19 14:20:26 +08:00 |
Baizhou Zhang
|
2c787d7f47
|
[shardformer] fix submodule replacement bug when enabling pp (#4544)
|
2023-08-31 09:57:18 +08:00 |
flybird11111
|
ec18fc7340
|
[shardformer] support pp+tp+zero1 tests (#4531)
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] pp+tp+zero1
|
2023-08-30 21:29:18 +08:00 |
flybird11111
|
d367b88785
|
[shardformer] fix opt test hanging (#4521)
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
|
2023-08-30 14:50:34 +08:00 |
Jianghai
|
e04436a82a
|
[shardformer] tests for 3d parallel (#4493)
|
2023-08-23 15:05:24 +08:00 |
Jianghai
|
5545114fd8
|
rename chatglm to chatglm2 (#4484)
|
2023-08-22 14:13:31 +08:00 |