54 Commits (5f8c0a0ac3b52a71b664c3e36dd1a8cef40f428d)

Author SHA1 Message Date
Season 7ef91606e1 [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) 7 months ago
Xuanlei Zhao dc003c304c [moe] merge moe into main (#4978) 1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
Hongxin Liu b5f9e37c70 [legacy] clean up legacy code (#4743) 1 year ago
Hongxin Liu ac178ca5c1 [legacy] move builder and registry to legacy (#4603) 1 year ago
digger-yu b7141c36dd [CI] fix some spelling errors (#3707) 2 years ago
digger-yu b9a8dff7e5 [doc] Fix typo under colossalai and doc(#3618) 2 years ago
yuxuan-lou 198a74b9fd [NFC] polish colossalai/context/random/__init__.py code style (#3327) 2 years ago
RichardoLuo 1ce9d0c531 [NFC] polish initializer_data.py code style (#3287) 2 years ago
Kai Wang (Victor Kai) 964a28678f [NFC] polish initializer_3d.py code style (#3279) 2 years ago
Arsmart1 8af977f223 [NFC] polish colossalai/context/parallel_context.py code style (#3276) 2 years ago
Zirui Zhu c9e3ee389e [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) 2 years ago
Ziyue Jiang 4603538ddd [NFC] posh colossalai/context/process_group_initializer/initializer_sequence.py code style (#2712) 2 years ago
アマデウス 534f68c83c [NFC] polish pipeline process group code style (#2694) 2 years ago
LuGY 56ff1921e9 [NFC] polish colossalai/context/moe_context.py code style (#2693) 2 years ago
アマデウス 99d9713b02 Revert "Update parallel_context.py (#2408)" 2 years ago
Haofan Wang 7d5640b9db Update parallel_context.py (#2408) 2 years ago
Tongping Liu 8e22c38b89 [hotfix] Fixing the bug related to ipv6 support 2 years ago
kurisusnowdeng 0b8161fab8 updated tp layers 2 years ago
HELSON 1468e4bcfc [zero] add constant placement policy (#1705) 2 years ago
HELSON 95c35f73bd [moe] initialize MoE groups by ProcessGroup (#1640) 2 years ago
Frank Lee 27fe8af60c [autoparallel] refactored shape consistency to remove redundancy (#1591) 2 years ago
ver217 d068af81a3 [doc] update rst and docstring (#1351) 2 years ago
Frank Lee 2238758c2e [usability] improved error messages in the context module (#856) 3 years ago
Frank Lee 920fe31526 [compatibility] used backward-compatible API for global process group (#758) 3 years ago
Frank Lee 04ff5ea546 [utils] support detection of number of processes on current node (#723) 3 years ago
Cautiousss 055d0270c8 [NFC] polish colossalai/context/process_group_initializer/initializer_sequence.py colossalai/context/process_group_initializer initializer_tensor.py code style (#639) 3 years ago
Jiang Zhuo 0a96338b13 [NFC] polish <colossalai/context/process_group_initializer/initializer_data.py> code stype (#626) 3 years ago
ziyu huang 701bad439b [NFC] polish colossalai/context/process_group_initializer/process_group_initializer.py code stype (#617) 3 years ago
アマデウス 297b8baae2 [model checkpoint] add gloo groups for cpu tensor communication (#589) 3 years ago
Liang Bowen 2c45efc398 html refactor (#555) 3 years ago
Liang Bowen ec5086c49c Refactored docstring to google style 3 years ago
Jiarui Fang a445e118cf [polish] polish singleton and global context (#500) 3 years ago
HELSON f24b5ed201 [MOE] remove old MoE legacy (#493) 3 years ago
Jiarui Fang 65c0f380c2 [format] polish name format for MOE (#481) 3 years ago
HELSON 7544347145 [MOE] add unitest for MOE experts layout, gradient handler and kernel (#469) 3 years ago
HELSON 84fd7c1d4d add moe context, moe utilities and refactor gradient handler (#455) 3 years ago
Frank Lee b72b8445c6 optimized context test time consumption (#446) 3 years ago
Frank Lee 1e4bf85cdb fixed bug in activation checkpointing test (#387) 3 years ago
RichardoLuo 8539898ec6 flake8 style change (#363) 3 years ago
ziyu huang a77d73f22b fix format parallel_context.py (#359) 3 years ago
Maruyama_Aya e83970e3dc fix format ColossalAI\colossalai\context\process_group_initializer 3 years ago
アマデウス 9ee197d0e9 moved env variables to global variables; (#215) 3 years ago
HELSON 0f8c7f9804 Fixed docstring in colossalai (#171) 3 years ago
Frank Lee e2089c5c15 adapted for sequence parallel (#163) 3 years ago
HELSON dceae85195 Added MoE parallel (#127) 3 years ago
ver217 a951bc6089 update default logger (#100) (#101) 3 years ago
ver217 96780e6ee4 Optimize pipeline schedule (#94) 3 years ago
アマデウス 01a80cd86d Hotfix/Colossalai layers (#92) 3 years ago
アマデウス 0fedef4f3c Layer integration (#83) 3 years ago