XYE
e83b2ce853
[NFC] polish colossalai/nn/layer/vanilla/layers.py code style ( #1295 )
2022-07-13 12:08:21 +08:00
Liping233
1000a41fd5
[NFC] polish colossalai/nn/layer/vanilla/__init__.py code style ( #1293 )
2022-07-13 12:08:21 +08:00
Wangbo Zhao(黑色枷锁)
552667825b
[NFC] polish colossalai/nn/layer/parallel_1d/layers.py code style ( #1290 )
2022-07-13 12:08:21 +08:00
Jiatong Han
38e3ccd1e9
[NFC] polish colossalai/nn/layer/parallel_sequence/layers.py code style ( #1280 )
...
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-07-13 12:08:21 +08:00
Boyuan Yao
b414eaa5db
[NFC] polish colossalai/nn/optimizer/lamb.py code style ( #1275 )
2022-07-13 12:08:21 +08:00
Super Daniel
52d145a342
[NFC] polish colossalai/nn/lr_scheduler/onecycle.py code style ( #1269 )
2022-07-13 12:08:21 +08:00
Geng Zhang
0e06f62160
[NFC] polish colossalai/nn/layer/parallel_sequence/_operation.py code style ( #1266 )
2022-07-13 12:08:21 +08:00
superhao1995
f660152c73
[NFC] polish colossalai/nn/layer/parallel_3d/_operation.py code style ( #1258 )
...
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-07-13 12:08:21 +08:00
Thunderbeee
9738fb0f78
[NFC] polish colossalai/nn/lr_scheduler/__init__.py ( #1255 )
...
code style
2022-07-13 12:08:21 +08:00
Ofey Chan
2dd4d556fb
[NFC] polish colossalai/nn/init.py code style ( #1292 )
2022-07-13 10:51:55 +08:00
HELSON
abba4d84e1
[hotfix] fix bert model test in unitests ( #1272 )
2022-07-12 23:26:45 +08:00
oahzxl
0cf8e8e91c
[NFC] polish <colossalai/nn/lr_scheduler/poly.py> code style ( #1267 )
2022-07-12 18:18:14 +08:00
Jiarui Fang
1aad903c15
[tensor] redistribute among different process groups ( #1247 )
...
* make it faster
* [tensor] rename convert_to_dist -> redistribute
* [tensor] ShardSpec and ReplicaSpec
* [tensor] redistribute among diff pgs
* polish code
2022-07-12 10:24:05 +08:00
Jiarui Fang
9bcd2fd4af
[tensor] a shorter shard and replicate spec ( #1245 )
2022-07-11 15:51:48 +08:00
Jiarui Fang
2699dfbbfd
[rename] convert_to_dist -> redistribute ( #1243 )
2022-07-11 13:05:44 +08:00
Jiarui Fang
4a76084dc9
[tensor] add zero_like colo op, important for Optimizer ( #1236 )
2022-07-08 14:55:27 +08:00
Jiarui Fang
3b500984b1
[tensor] fix some unittests ( #1234 )
2022-07-08 14:18:30 +08:00
HELSON
0453776def
[tensor] fix a assertion in colo_tensor cross_entropy ( #1232 )
2022-07-08 11:18:00 +08:00
HELSON
42ab36b762
[tensor] add unitest for colo_tensor 1DTP cross_entropy ( #1230 )
2022-07-07 19:17:23 +08:00
Yi Zhao
04537bf83e
[checkpoint]support generalized scheduler ( #1222 )
2022-07-07 18:16:38 +08:00
Jiarui Fang
a98319f023
[tensor] torch function return colotensor ( #1229 )
2022-07-07 18:09:18 +08:00
Jiarui Fang
ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. ( #1203 )
2022-07-06 16:15:16 +08:00
Jiarui Fang
b5f25eb32a
[Tensor] add cpu group to ddp ( #1200 )
2022-07-05 14:58:28 +08:00
Jiarui Fang
060b917daf
[refactor] remove gpc dependency in colotensor's _ops ( #1189 )
2022-07-04 18:54:37 +08:00
Jiarui Fang
372f791444
[refactor] move chunk and chunkmgr to directory gemini ( #1182 )
2022-06-29 13:31:02 +08:00
ver217
6b2f2ab9bb
[ddp] ColoDDP uses bucket all-reduce ( #1177 )
...
* add reducer
* update colo ddp with reducer
* polish unit test
* polish unit test
2022-06-29 10:34:13 +08:00
Jiarui Fang
1b657f9ce1
[tensor] revert local view back ( #1178 )
2022-06-27 18:38:34 +08:00
Jiarui Fang
0dd4e2bbfb
[Tensor] rename some APIs in TensorSpec and Polish view unittest ( #1176 )
2022-06-27 15:56:11 +08:00
Ziyue Jiang
dd0420909f
[Tensor] rename parallel_action ( #1174 )
...
* rename parallel_action
* polish
2022-06-27 10:04:45 +08:00
Jiarui Fang
aa7bef73d4
[Tensor] distributed view supports inter-process hybrid parallel ( #1169 )
2022-06-27 09:45:26 +08:00
Jiarui Fang
4b9bba8116
[ColoTensor] rename APIs and add output_replicate to ComputeSpec ( #1168 )
2022-06-24 13:08:54 +08:00
Jiarui Fang
f4ef224358
[Tensor] remove ParallelAction, use ComputeSpec instread ( #1166 )
2022-06-23 17:34:59 +08:00
Jiarui Fang
177c374401
remove gather out in parallel action ( #1163 )
2022-06-23 16:35:05 +08:00
Ziyue Jiang
955ac912de
remove log ( #1160 )
2022-06-23 10:32:42 +08:00
Jiarui Fang
07f9c781f9
[graph] improve the graph building. ( #1157 )
2022-06-22 16:47:20 +08:00
ver217
22717a856f
[tensor] add embedding bag op ( #1156 )
2022-06-22 15:54:03 +08:00
ver217
ae86151968
[tensor] add more element-wise ops ( #1155 )
...
* add more element-wise ops
* update test_op
* polish unit test
2022-06-22 15:16:47 +08:00
ver217
54aabb8da4
[gemini] refactor gemini mgr ( #1151 )
...
* refactor gemini mgr
* udpate __init__
2022-06-22 11:54:36 +08:00
ver217
8106d7b8c7
[ddp] refactor ColoDDP and ZeroDDP ( #1146 )
...
* ColoDDP supports overwriting default process group
* rename ColoDDPV2 to ZeroDDP
* add docstr for ZeroDDP
* polish docstr
2022-06-21 16:35:23 +08:00
ver217
ccf3c58c89
embedding op use gather_out ( #1143 )
2022-06-21 13:21:20 +08:00
Frank Lee
15aab1476e
[zero] avoid zero hook spam by changing log to debug level ( #1137 )
2022-06-21 10:44:01 +08:00
ver217
e4f555f29a
[optim] refactor fused sgd ( #1134 )
2022-06-20 11:19:38 +08:00
ver217
d26902645e
[ddp] add save/load state dict for ColoDDP ( #1127 )
...
* add save/load state dict for ColoDDP
* add unit test
* refactor unit test folder
* polish unit test
* rename unit test
2022-06-20 10:51:47 +08:00
ver217
f0a954f16d
[ddp] add set_params_to_ignore for ColoDDP ( #1122 )
...
* add set_params_to_ignore for ColoDDP
* polish code
* fix zero hook v2
* add unit test
* polish docstr
2022-06-16 12:54:46 +08:00
ver217
e127b4375b
cast colo ddp v2 inputs/outputs ( #1120 )
2022-06-15 15:57:04 +08:00
ver217
7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy ( #1118 )
...
* update gemini mgr
* update chunk
* add docstr
* polish placement policy
* update test chunk
* update test zero
* polish unit test
* remove useless unit test
2022-06-15 15:05:19 +08:00
ver217
895c1c5ee7
[tensor] refactor param op hook ( #1097 )
...
* refactor param op hook
* add docstr
* fix bug
2022-06-13 16:11:53 +08:00
Frank Lee
cb18922c47
[doc] added documentation to chunk and chunk manager ( #1094 )
...
* [doc] added documentation to chunk and chunk manager
* polish code
* polish code
* polish code
2022-06-10 15:33:06 +08:00
ver217
1f894e033f
[gemini] zero supports gemini ( #1093 )
...
* add placement policy
* add gemini mgr
* update mem stats collector
* update zero
* update zero optim
* fix bugs
* zero optim monitor os
* polish unit test
* polish unit test
* add assert
2022-06-10 14:48:28 +08:00
Frank Lee
2b2dc1c86b
[pipeline] refactor the pipeline module ( #1087 )
...
* [pipeline] refactor the pipeline module
* polish code
2022-06-10 11:27:38 +08:00