digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc( #3618 )
...
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Cautious Changed the spelling error under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2 years ago
Hongxin Liu
173dad0562
[misc] add verbose arg for zero and op builder ( #3552 )
...
* [misc] add print verbose
* [gemini] add print verbose
* [zero] add print verbose for low level
* [misc] add print verbose for op builder
2 years ago
Hongxin Liu
152239bbfa
[gemini] gemini supports lazy init ( #3379 )
...
* [gemini] fix nvme optimizer init
* [gemini] gemini supports lazy init
* [gemini] add init example
* [gemini] add fool model
* [zero] update gemini ddp
* [zero] update init example
* add chunk method
* add chunk method
* [lazyinit] fix lazy tensor tolist
* [gemini] fix buffer materialization
* [misc] remove useless file
* [booster] update gemini plugin
* [test] update gemini plugin test
* [test] fix gemini plugin test
* [gemini] fix import
* [gemini] fix import
* [lazyinit] use new metatensor
* [lazyinit] use new metatensor
* [lazyinit] fix __set__ method
2 years ago
Frank Lee
7d8d825681
[booster] fixed the torch ddp plugin with the new checkpoint api ( #3442 )
2 years ago
Frank Lee
1beb85cc25
[checkpoint] refactored the API and added safetensors support ( #3427 )
...
* [checkpoint] refactored the API and added safetensors support
* polish code
2 years ago
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
...
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2 years ago
ver217
5f2e34e6c9
[booster] implement Gemini plugin ( #3352 )
...
* [booster] add gemini plugin
* [booster] update docstr
* [booster] gemini plugin add coloparam convertor
* [booster] fix coloparam convertor
* [booster] fix gemini plugin device
* [booster] add gemini plugin test
* [booster] gemini plugin ignore sync bn
* [booster] skip some model
* [booster] skip some model
* [booster] modify test world size
* [booster] modify test world size
* [booster] skip test
2 years ago
Frank Lee
73d3e4d309
[booster] implemented the torch ddd + resnet example ( #3232 )
...
* [booster] implemented the torch ddd + resnet example
* polish code
2 years ago
Frank Lee
e7f3bed2d3
[booster] added the plugin base and torch ddp plugin ( #3180 )
...
* [booster] added the plugin base and torch ddp plugin
* polish code
* polish code
* polish code
2 years ago
Frank Lee
a9b8402d93
[booster] added the accelerator implementation ( #3159 )
2 years ago
Frank Lee
ed19290560
[booster] implemented mixed precision class ( #3151 )
...
* [booster] implemented mixed precision class
* polish code
2 years ago
Frank Lee
f19b49e164
[booster] init module structure and definition ( #3056 )
2 years ago