littsk
83b52c56cd
[feature] Add clip_grad_norm for hybrid_parallel_plugin ( #4837 )
...
* Add clip_grad_norm for hibrid_parallel_plugin
* polish code
* add unittests
* Move tp to a higher-level optimizer interface.
* bug fix
* polish code
2023-10-12 11:32:37 +08:00
littsk
11f1e426fe
[hotfix] Correct several erroneous code comments ( #4794 )
2023-09-27 10:43:03 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
LuGY
839847b7d7
[zero]support zero2 with gradient accumulation ( #4511 )
...
* support gradient accumulation with zero2
* fix type
2023-08-25 13:44:07 +08:00
LuGY
d86ddd9b29
[hotfix] fix unsafe async comm in zero ( #4404 )
...
* improve stablility of zero
* fix wrong index
* add record stream
2023-08-11 15:09:24 +08:00
LuGY
c6ab96983a
[zero] refactor low level zero for shard evenly ( #4030 )
...
* refactor low level zero
* fix zero2 and support cpu offload
* avg gradient and modify unit test
* refactor grad store, support layer drop
* refactor bucket store, support grad accumulation
* fix and update unit test of zero and ddp
* compatible with tp, ga and unit test
* fix memory leak and polish
* add zero layer drop unittest
* polish code
* fix import err in unit test
* support diffenert comm dtype, modify docstring style
* polish code
* test padding and fix
* fix unit test of low level zero
* fix pad recording in bucket store
* support some models
* polish
2023-07-31 22:13:29 +08:00
YH
a22407cc02
[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. ( #3173 )
...
* Fix confusing variable name in zero opt
* Apply lint
* Fix util func
* Fix minor util func
* Fix zero param optimizer name
2023-04-27 18:43:14 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
...
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2023-04-04 13:48:16 +08:00