Commit Graph

36 Commits (785cd9a9c971aa58e6f8c76575111a4aa4d9513b)

Author SHA1 Message Date
Hongxin Liu 070df689e6
[devops] fix extention building (#5427) 2024-03-05 15:35:54 +08:00
littsk 1e0e080837
[bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)
* Fix the version check bug in colossalai run when generating the cmd.

* polish code
2023-09-22 10:50:47 +08:00
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
* [legacy] move communication to legacy (#4640)

* [legacy] refactor logger and clean up legacy codes (#4654)

* [legacy] make logger independent to gpc

* [legacy] make optim independent to registry

* [legacy] move test engine to legacy

* [legacy] move nn to legacy (#4656)

* [legacy] move nn to legacy

* [checkpointio] fix save hf config

* [test] remove useledd rpc pp test

* [legacy] fix nn init

* [example] skip tutorial hybriad parallel example

* [devops] test doc check

* [devops] test doc check
2023-09-11 16:24:28 +08:00
Hongxin Liu 0b00def881
[example] add llama2 example (#4527)
* [example] transfer llama-1 example

* [example] fit llama-2

* [example] refactor scripts folder

* [example] fit new gemini plugin

* [cli] fix multinode runner

* [example] fit gemini optim checkpoint

* [example] refactor scripts

* [example] update requirements

* [example] update requirements

* [example] rename llama to llama2

* [example] update readme and pretrain script

* [example] refactor scripts
2023-08-28 17:59:11 +08:00
LuGY 03654c0ce2
fix localhost measurement (#4320) 2023-08-01 10:14:00 +08:00
ocd_with_naming 85774f0c1f [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) 2023-07-26 14:12:57 +08:00
Hongxin Liu 1908caad38
[cli] hotfix launch command for multi-nodes (#4165) 2023-07-04 17:54:40 +08:00
Liu Ziming 8065cc5fba
Modify torch version requirement to adapt torch 2.0 (#3896) 2023-06-05 15:57:35 +08:00
digger yu 70c8cdecf4
[nfc] fix typo colossalai/cli fx kernel (#3847)
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.

* fix typo colossalai/auto_parallel autochunk fx/passes  etc.

* fix typo docs/

* change placememt_policy to placement_policy in docs/ and examples/

* fix typo colossalai/ applications/

* fix typo colossalai/cli fx kernel
2023-06-02 15:02:45 +08:00
digger-yu ad6460cf2c
[NFC] fix typo applications/ and colossalai/ (#3735) 2023-05-15 11:46:25 +08:00
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
* Fixed several spelling errors under colossalai

* Fix the spelling error in colossalai and docs directory

* Cautious Changed the spelling error under the example folder

* Update runtime_preparation_pass.py

revert autograft to autograd

* Update search_chunk.py

utile to until

* Update check_installation.py

change misteach to mismatch in line 91

* Update 1D_tensor_parallel.md

revert to perceptron

* Update 2D_tensor_parallel.md

revert to perceptron in line 73

* Update 2p5D_tensor_parallel.md

revert to perceptron in line 71

* Update 3D_tensor_parallel.md

revert to perceptron in line 80

* Update README.md

revert to resnet in line 42

* Update reorder_graph.py

revert to indice in line 7

* Update p2p.py

revert to megatron in line 94

* Update initialize.py

revert to torchrun in line 198

* Update routers.py

change to detailed in line 63

* Update routers.py

change to detailed in line 146

* Update README.md

revert  random number in line 402
2023-04-26 11:38:43 +08:00
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-04-06 14:51:35 +08:00
Ziheng Qin 1bed38ef37 [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) 2023-03-29 15:22:21 +08:00
Frank Lee 935346430f
[cli] handled version check exceptions (#2848)
* [cli] handled version check exceptions

* polish code
2023-02-21 17:04:49 +08:00
Wangbo Zhao(黑色枷锁) 8331420520
[NFC] polish colossalai/cli/cli.py code style (#2734) 2023-02-15 22:25:28 +08:00
Zihao b3d10db5f1
[NFC] polish colossalai/cli/launcher/__init__.py code style (#2709) 2023-02-15 09:57:22 +08:00
Frank Lee 14d9299360
[cli] fixed hostname mismatch error (#2465) 2023-01-12 14:52:09 +08:00
Frank Lee 39163417a1
[example] updated the hybrid parallel tutorial (#2444)
* [example] updated the hybrid parallel tutorial

* polish code
2023-01-11 15:17:17 +08:00
Frank Lee c72c827e95
[cli] provided more details if colossalai run fail (#2442) 2023-01-11 13:56:42 +08:00
Frank Lee ce08661eb1
[cli] updated installation check cli for aot/jit build (#2395) 2023-01-09 11:05:27 +08:00
Junming Wu 4a79c10750 [NFC] polish colossalai/cli/benchmark/__init__.py code style (#2308) 2023-01-04 15:09:57 +08:00
アマデウス 49715a78f0 [NFC] polish colossalai/cli/benchmark/benchmark.py code style (#2287) 2023-01-04 15:09:57 +08:00
Frank Lee ea74a3b9cc
[cli] updated installation cheheck with more inforamtion (#2050)
* [cli] updated installation cheheck with more inforamtion

* polish code

* polish code
2022-11-30 17:53:55 +08:00
ver217 f8a7148dec
[kernel] move all symlinks of kernel to `colossalai._C` (#1971) 2022-11-17 13:42:33 +08:00
YuliangLiu0306 d182b0bd47
[hotfix] fix some bugs caused by size mismatch. (#1011)
* [CLI] add CLI launcher

* Revert "[CLI] add CLI launcher"

This reverts commit df7e6506d4.

* [hotfix]fix some bugs caused by size mismatch.

* add warning logs

* polish
2022-05-23 14:02:28 +08:00
Frank Lee 1467d83edf
[cli] remove unused imports (#1001) 2022-05-18 23:27:18 +08:00
Frank Lee a82da26f7e
[cli] refactored micro-benchmarking cli and added more metrics (#858) 2022-04-25 11:48:07 +08:00
Frank Lee cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching (#844)
* [cli] fixed multi-node job launching

* [cli] fixed a bug in version comparison

* [cli] support launching with env var

* [cli] fixed multi-node job launching

* [cli] fixed a bug in version comparison

* [cli] support launching with env var

* added docstring

* [cli] added extra launch arguments

* [cli] added default launch rdzv args

* [cli] fixed version comparison

* [cli] added docstring examples and requierment

* polish docstring

* polish code

* polish code
2022-04-24 13:26:26 +08:00
FrankLeeeee 70ed11d07e [cli] added check installation cli 2022-04-20 12:13:27 +08:00
FrankLeeeee d522cb704e [cli] fixed single-node process launching 2022-04-20 10:46:51 +08:00
FrankLeeeee f63e91d280 [cli] fixed a bug in user args and refactored the module structure 2022-04-19 15:15:16 +08:00
Frank Lee 05d9ae5999
[cli] add missing requirement (#805) 2022-04-19 13:56:59 +08:00
YuliangLiu0306 de2f581d43
[cli] added micro benchmarking for tp (#789)
* [CLI] add CLI launcher

* Revert "[CLI] add CLI launcher"

This reverts commit df7e6506d4.

* [CLI]add cli benchmark feature

* fix CodeFactor issues.

* refactor the module structure.
2022-04-19 12:08:28 +08:00
YuliangLiu0306 cfadc9df8e
[cli] added distributed launcher command (#791)
* [CLI] add CLI launcher

* Revert "[CLI] add CLI launcher"

This reverts commit df7e6506d4.

* [CLI]add cli launcher feature

* remove testing message used during developing

* refactor the module structure.
2022-04-19 10:59:44 +08:00