ColossalAI/colossalai
Frank Lee cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching (#844)
* [cli] fixed multi-node job launching

* [cli] fixed a bug in version comparison

* [cli] support launching with env var

* added docstring

* [cli] added extra launch arguments

* [cli] added default launch rdzv args

* [cli] fixed version comparison

* [cli] added docstring examples and requirement

* polish docstring

* polish code
2022-04-24 13:26:26 +08:00
..
amp
builder modified the pp build for ckpt adaptation (#803) 2022-04-24 12:23:16 +08:00
cli [CLI] refactored the launch CLI and fixed bugs in multi-node launching (#844) 2022-04-24 13:26:26 +08:00
communication
context
engine
gemini [gemini] add GeminiMemoryManger (#832) 2022-04-24 13:08:48 +08:00
kernel
logging
nn [gemini] add GeminiMemoryManger (#832) 2022-04-24 13:08:48 +08:00
registry [dependency] removed torchvision (#833) 2022-04-22 15:24:35 +08:00
tensor [hotfix] the bug of numel() in ColoTensor (#845) 2022-04-24 12:32:10 +08:00
testing
trainer
utils [pipelinable]use pipelinable context to initialize non-pipeline model (#816) 2022-04-24 13:03:12 +08:00
zero [gemini] add GeminiMemoryManger (#832) 2022-04-24 13:08:48 +08:00
__init__.py
constants.py
core.py
global_variables.py
initialize.py modified the pp build for ckpt adaptation (#803) 2022-04-24 12:23:16 +08:00