Commit Graph

277 Commits (fee2af861078181e18a059800ba09a4edfb311f0)

Author SHA1 Message Date
Tong Li 196d4696d0 [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) 2023-03-29 15:22:21 +08:00
Yuanchen d58fa705b2 [NFC] polish code style (#3268)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-03-29 15:22:21 +08:00
github-actions[bot] 82503a96f2
[format] applied code formatting on changed files in pull request 2997 (#3008)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-06 10:42:22 +08:00
binmakeswell 52a5078988
[doc] add ISC tutorial (#2997)
* [doc] add ISC tutorial

* [doc] add ISC tutorial

* [doc] add ISC tutorial

* [doc] add ISC tutorial
2023-03-06 10:36:38 +08:00
ver217 823f3b9cf4
[doc] add deepspeed citation and copyright (#2996)
* [doc] add deepspeed citation and copyright

* [doc] add deepspeed citation and copyright

* [doc] add deepspeed citation and copyright
2023-03-04 20:08:11 +08:00
zbian 61e687831d fixed using zero with tp cannot access weight correctly 2023-02-28 10:52:30 +08:00
Jiatong (Julius) Han 8c8a39be95
[hotfix]: Remove math.prod dependency (#2837)
* Remove math.prod dependency

* Fix style

* Fix style

---------

Co-authored-by: Jiatong Han <jiatong.han@u.nus.edu>
2023-02-23 23:56:15 +08:00
junxu c52edcf0eb
Rename class method of ZeroDDP (#2692) 2023-02-22 15:05:53 +08:00
HELSON 56ddc9ca7a
[hotfix] add correct device for fake_param (#2796) 2023-02-17 15:29:07 +08:00
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671) 2023-02-13 14:35:32 +08:00
binmakeswell 9ab14b20b5
[doc] add CVPR tutorial (#2666) 2023-02-10 20:43:34 +08:00
ver217 5b1854309a
[hotfix] fix zero ddp warmup check (#2545) 2023-02-02 16:42:38 +08:00
HELSON a4ed9125ac
[hotfix] fix lightning error (#2529) 2023-01-31 10:40:39 +08:00
HELSON 66dfcf5281
[gemini] update the gpt example (#2527) 2023-01-30 17:58:05 +08:00
HELSON b528eea0f0
[zero] add zero wrappers (#2523)
* [zero] add zero wrappers

* change names

* add wrapper functions to init
2023-01-29 17:52:58 +08:00
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
HELSON 2bfeb24308
[zero] add warning for ignored parameters (#2446) 2023-01-11 15:30:09 +08:00
HELSON 5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
* [ddp] add is_ddp_ignored

[ddp] rename to is_ddp_ignored

* [zero] fix state_dict and load_state_dict

* fix bugs

* [zero] update unit test for ZeroDDP
2023-01-11 14:55:41 +08:00
HELSON 7829aa094e
[ddp] add is_ddp_ignored (#2434)
[ddp] rename to is_ddp_ignored
2023-01-11 12:22:45 +08:00
HELSON bb4e9a311a
[zero] add inference mode and its unit test (#2418) 2023-01-11 10:07:37 +08:00
HELSON dddacd2d2c
[hotfix] add norm clearing for the overflow step (#2416) 2023-01-10 15:43:06 +08:00
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
* [gemini] polish code

* [testing] remove code

* [gemini] make more robust
2023-01-09 17:41:38 +08:00
Frank Lee 551cafec14
[doc] updated kernel-related optimisers' docstring (#2385)
* [doc] updated kernel-related optimisers' docstring

* polish doc
2023-01-09 17:13:53 +08:00
eric8607242 9880fd2cd8
Fix state_dict key missing issue of the ZeroDDP (#2363)
* Fix state_dict output for ZeroDDP duplicated parameters

* Rewrite state_dict based on get_static_torch_model

* Modify get_static_torch_model to be compatible with the lower version (ZeroDDP)
2023-01-09 14:35:14 +08:00
Frank Lee 40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
* [setup] support pre-build and jit-build of cuda kernels

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-06 20:50:26 +08:00
HELSON 48d33b1b17
[gemini] add get static torch model (#2356) 2023-01-06 13:41:19 +08:00
Jiarui Fang 16cc8e6aa7
[builder] MOE builder (#2277) 2023-01-03 20:29:39 +08:00
zbian e94c79f15b improved allgather & reducescatter for 3d 2023-01-03 17:46:08 +08:00
Jiarui Fang af32022f74
[Gemini] fix the convert_to_torch_module bug (#2269) 2023-01-03 15:55:35 +08:00
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
* [zero] fix error for BEiT models

* [ColoParameter] add unpack operation for tuple arguments

* fix bugs

* fix chunkv2 unit testing

* add assertion for gradient state
2022-12-26 15:03:54 +08:00
Jiarui Fang 355ffb386e
[builder] unified cpu_optim fused_optim inferface (#2190) 2022-12-23 20:57:41 +08:00
Jiarui Fang 9587b080ba
[builder] use runtime builder for fused_optim (#2189) 2022-12-23 17:07:03 +08:00
Jiarui Fang d42afd30f8
[builder] runtime adam and fused_optim builder (#2184) 2022-12-23 14:14:21 +08:00
Tongping Liu ab54fed292
[hotfix] add kwargs for colo_addmm (#2171) 2022-12-22 13:25:30 +08:00
アマデウス 622f863291
[hotfix] Jit type hint #2161 (#2164) 2022-12-22 10:17:03 +08:00
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151) 2022-12-20 10:19:36 +08:00
Jiarui Fang bdef9dfdbe
[NFC] remove useless graph node code (#2150) 2022-12-20 00:33:58 +08:00
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115) 2022-12-12 18:06:16 +08:00
HELSON e7d3afc9cc
[optimizer] add div_scale for optimizers (#2117)
* [optimizer] add div_scale for optimizers

* [zero] use div_scale in zero optimizer

* fix testing error
2022-12-12 17:58:57 +08:00
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119) 2022-12-12 16:57:22 +08:00
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116) 2022-12-12 15:39:31 +08:00
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
* [zero] add L2 gradient clipping

* [testing] add MlpModel

* [zero] add unit test for grad clipping

* fix atol
2022-12-09 18:09:17 +08:00
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083) 2022-12-06 12:53:58 +08:00
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080) 2022-12-05 17:11:06 +08:00
HELSON e37f3db40c
[gemini] add arguments (#2046)
* [zero] fix testing parameters

* [gemini] add arguments

* add docstrings
2022-11-30 16:40:13 +08:00
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035) 2022-11-29 11:19:52 +08:00
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003) 2022-11-25 20:06:35 +08:00
Jiarui Fang a2d3266648
[hotfix] make Gemini work for conv DNN (#1998) 2022-11-22 14:52:36 +08:00
Jiarui Fang cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972) 2022-11-17 14:43:49 +08:00