ver217
5b1854309a
[hotfix] fix zero ddp warmup check ( #2545 )
2 years ago
HELSON
a4ed9125ac
[hotfix] fix lightning error ( #2529 )
2 years ago
HELSON
66dfcf5281
[gemini] update the gpt example ( #2527 )
2 years ago
HELSON
b528eea0f0
[zero] add zero wrappers ( #2523 )
* [zero] add zero wrappers
* change names
* add wrapper functions to init
2 years ago
HELSON
707b11d4a0
[gemini] update ddp strict mode ( #2518 )
* [zero] add strict ddp mode for chunk init
* [gemini] update gpt example
2 years ago
HELSON
2d1a7dfe5f
[zero] add strict ddp mode ( #2508 )
* [zero] add strict ddp mode
* [polish] add comments for strict ddp mode
* [zero] fix test error
2 years ago
HELSON
2bfeb24308
[zero] add warning for ignored parameters ( #2446 )
2 years ago
HELSON
5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters ( #2443 )
* [ddp] add is_ddp_ignored
[ddp] rename to is_ddp_ignored
* [zero] fix state_dict and load_state_dict
* fix bugs
* [zero] update unit test for ZeroDDP
2 years ago
HELSON
7829aa094e
[ddp] add is_ddp_ignored ( #2434 )
[ddp] rename to is_ddp_ignored
2 years ago
HELSON
bb4e9a311a
[zero] add inference mode and its unit test ( #2418 )
2 years ago
HELSON
dddacd2d2c
[hotfix] add norm clearing for the overflow step ( #2416 )
2 years ago
HELSON
ea13a201bb
[polish] polish code for get_static_torch_model ( #2405 )
* [gemini] polish code
* [testing] remove code
* [gemini] make more robust
2 years ago
Frank Lee
551cafec14
[doc] updated kernel-related optimisers' docstring ( #2385 )
* [doc] updated kernel-related optimisers' docstring
* polish doc
2 years ago
eric8607242
9880fd2cd8
Fix state_dict key missing issue of the ZeroDDP ( #2363 )
* Fix state_dict output for ZeroDDP duplicated parameters
* Rewrite state_dict based on get_static_torch_model
* Modify get_static_torch_model to be compatible with the lower version (ZeroDDP)
2 years ago
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels ( #2374 )
* [setup] support pre-build and jit-build of cuda kernels
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2 years ago
HELSON
48d33b1b17
[gemini] add get static torch model ( #2356 )
2 years ago
Jiarui Fang
16cc8e6aa7
[builder] MOE builder ( #2277 )
2 years ago
zbian
e94c79f15b
improved allgather & reducescatter for 3d
2 years ago
Jiarui Fang
af32022f74
[Gemini] fix the convert_to_torch_module bug ( #2269 )
2 years ago
HELSON
2458659919
[zero] fix error for BEiT models ( #2169 )
* [zero] fix error for BEiT models
* [ColoParameter] add unpack operation for tuple arguments
* fix bugs
* fix chunkv2 unit testing
* add assertion for gradient state
2 years ago
Jiarui Fang
355ffb386e
[builder] unified cpu_optim fused_optim interface ( #2190 )
2 years ago
Jiarui Fang
9587b080ba
[builder] use runtime builder for fused_optim ( #2189 )
2 years ago
Jiarui Fang
d42afd30f8
[builder] runtime adam and fused_optim builder ( #2184 )
2 years ago
Tongping Liu
ab54fed292
[hotfix] add kwargs for colo_addmm ( #2171 )
2 years ago
アマデウス
622f863291
[hotfix] Jit type hint #2161 ( #2164 )
2 years ago
Jiarui Fang
2827f41898
[Gemini] GeminiDDP convert to PyTorch Module. ( #2151 )
2 years ago
Jiarui Fang
bdef9dfdbe
[NFC] remove useless graph node code ( #2150 )
2 years ago
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2 years ago
HELSON
e7d3afc9cc
[optimizer] add div_scale for optimizers ( #2117 )
* [optimizer] add div_scale for optimizers
* [zero] use div_scale in zero optimizer
* fix testing error
2 years ago
Jiarui Fang
e5aa8333e4
[NFC] update chunk manager API ( #2119 )
2 years ago
Jiarui Fang
e99edfcb51
[NFC] polish comments for Chunk class ( #2116 )
2 years ago
HELSON
63fbba3c19
[zero] add L2 gradient clipping for ZeRO ( #2112 )
* [zero] add L2 gradient clipping
* [testing] add MlpModel
* [zero] add unit test for grad clipping
* fix atol
2 years ago
Jiarui Fang
1f99205827
[Gemini] remove static tracer ( #2083 )
2 years ago
Jiarui Fang
b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook ( #2080 )
2 years ago
HELSON
e37f3db40c
[gemini] add arguments ( #2046 )
* [zero] fix testing parameters
* [gemini] add arguments
* add docstrings
2 years ago
Jiarui Fang
96134e7be3
[hotfix] add bert test for gemini fwd bwd ( #2035 )
2 years ago
Jiarui Fang
8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor ( #2003 )
2 years ago
Jiarui Fang
a2d3266648
[hotfix] make Gemini work for conv DNN ( #1998 )
2 years ago
Jiarui Fang
cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook ( #1972 )
2 years ago
ver217
f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` ( #1971 )
2 years ago
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer ( #1960 )
2 years ago
アマデウス
e52f9d9109
[tensorparallel] fixed tp layers ( #1938 )
2 years ago
Jiarui Fang
986f8cbaa7
[inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 ( #1876 )
2 years ago
Jiarui Fang
c2947dadf1
[inference] streaming Linear 1D Row inference ( #1874 )
2 years ago
zbian
653b0a620e
added skip_bias_add for non-tp linear
2 years ago
アマデウス
4268ae017b
[kernel] added jit warmup ( #1792 )
2 years ago
Jiarui Fang
cd5a0d56fa
[Gemini] make gemini usage simple ( #1821 )
2 years ago
Zihao
20e255d4e8
MemStatsCollectorStatic ( #1765 )
2 years ago
HELSON
c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 ( #1786 )
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12
* [zero] add cpu shard init
* [zero] add tiny example test
* [colo_tensor] fix bugs for torch-1.11
2 years ago
kurisusnowdeng
0b8161fab8
updated tp layers
2 years ago