Frank Lee
a60024e77a
[autoparallel] added utils for broadcast operation ( #1665 )
* [autoparallel] added utils for broadcast operation
* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306
3f068d1409
[autoparallel] update CommSpec ( #1667 )
2022-09-29 11:20:59 +08:00
Frank Lee
247a9dbca9
[autoparallel] added bias comm spec to matmul strategy ( #1664 )
2022-09-29 11:08:05 +08:00
YuliangLiu0306
746f8f979d
[autoparallel] add batch norm handler v2 ( #1666 )
2022-09-29 11:02:49 +08:00
Kirigaya Kazuto
9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward ( #1642 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
YuliangLiu0306
c27e701cb2
[autoparallel] remove no strategy nodes ( #1652 )
* [autoparallel] remove no strategy nodes
* fix none object iteration issue
2022-09-29 10:43:25 +08:00
Frank Lee
50f16a2850
[autoparallel] added compute resharding costs for node handler ( #1662 )
2022-09-28 19:55:44 +08:00
Frank Lee
9ec401a722
[autoparallel] added new strategy constructor template ( #1661 )
* [autoparallel] added new strategy constructor template
* polish code
2022-09-28 14:01:36 +08:00
Frank Lee
3a4d6f63a8
[autoparallel] added node handler for bmm ( #1655 )
2022-09-28 11:32:16 +08:00
YuliangLiu0306
095854477f
[autoparallel] add conv handler v2 ( #1663 )
2022-09-28 11:24:59 +08:00
YuliangLiu0306
1e7816a460
[autoparallel] adapt solver with gpt ( #1653 )
2022-09-28 11:17:26 +08:00
Jiarui Fang
c638bec028
[embedding] polish async copy ( #1657 )
2022-09-27 14:37:03 +08:00
Jiarui Fang
988570e4a6
[embedding] add more detail profiling ( #1656 )
2022-09-27 13:43:59 +08:00
Jiarui Fang
e1f97fd2b8
[embedding] print profiling results ( #1654 )
2022-09-27 12:50:33 +08:00
Frank Lee
30e50c8b4a
[autoparallel] implemented all matmul strategy generator ( #1650 )
2022-09-27 12:06:25 +08:00
YuliangLiu0306
03978aad45
[autoparallel] change the following nodes strategies generation logic ( #1636 )
* [autoparallel] change the following nodes strategies generation logic
* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306
59f100510a
[autoparallel] where handler ( #1651 )
* [autoparallel] where handler
* fix unit test
2022-09-27 11:20:43 +08:00
Super Daniel
6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. ( #1646 )
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
* [fx] optimize table repr.
* [fx] refactor code for profiler.
* [fx] add docstring.
* [fx] add docstring.
* [fx] skip test.
* [fx] redo.
* [fx] redo.
* [fx] fix import error for torch11.
* [fx] fix import error for torch11.
2022-09-27 10:26:52 +08:00
Boyuan Yao
5d0fdb9cb4
[fx] fix offload codegen test ( #1648 )
* [fx] fix offload codegen test
* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee
45b39a692a
[autoparallel] implemented linear projection strategy generator ( #1639 )
2022-09-26 16:58:14 +08:00
Frank Lee
154d3ef432
[fix] fixed the collective pattern name for consistency ( #1649 )
* [fix] fixed the collective pattern name for consistency
* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306
b2b2a4af98
[autoparallel] adapt solver with mlp ( #1638 )
2022-09-26 15:26:14 +08:00
Jiarui Fang
04443605a5
[embedding] non-blocking cpu-gpu copy ( #1647 )
2022-09-26 14:57:57 +08:00
CsRic
0767f67a0f
[embedding] isolate cache_op from forward ( #1645 )
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2022-09-24 19:58:18 +08:00
Boyuan Yao
f921733621
[autoparallel] Add pofo sequence annotation ( #1637 )
* [autoparallel] annotate pofo sequence
* [autoparallel] remove unused print
* [autoparallel] fix some code
2022-09-24 01:52:57 +08:00
Super Daniel
04bbabeea8
[fx/profiler] provide a table of summary. ( #1634 )
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
2022-09-23 18:12:43 +08:00
HELSON
95c35f73bd
[moe] initialize MoE groups by ProcessGroup ( #1640 )
2022-09-23 17:20:41 +08:00
Jiarui Fang
e57df80325
[embeddings] cache option ( #1635 )
2022-09-23 16:40:18 +08:00
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2022-09-23 15:33:57 +08:00
YuliangLiu0306
702dbc5288
[tensor] use communication autograd func ( #1617 )
* [tensor] use communication autograd func
* change all to all comm spec info
* rename pattern and distinguish fwd/bwd
* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306
c7ac0f4ab2
[autoparallel] add elementwise handler ( #1622 )
* [autoparallel] add elementwise handler
* polish code
* polish code
* reduce skipped strategies range
* polish code
2022-09-23 13:27:31 +08:00
YuliangLiu0306
3a46215135
[autoparallel] add embedding handler ( #1620 )
2022-09-23 12:34:30 +08:00
YuliangLiu0306
69448f64c4
[autoparallel] protect bcast handler from invalid strategies ( #1631 )
2022-09-23 12:12:49 +08:00
YuliangLiu0306
0c703189b9
[autoparallel] add layernorm handler ( #1629 )
2022-09-23 12:00:25 +08:00
YuliangLiu0306
bf77d3ab65
[autoparallel] recover the merged node strategy index ( #1613 )
2022-09-23 11:52:42 +08:00
Boyuan Yao
d6b01feb66
[fx] Modify offload codegen ( #1618 )
* [fx] modify offload codegen
* [fx] remove repeated hook definitions
* [fx] modify offload test
2022-09-23 11:04:52 +08:00
Super Daniel
d967779a32
[fx/profiler] tuned the calculation of memory estimation ( #1619 )
* [fx] tuned the meta info and rotor solver.
* [fx] remove import.
* [fx] remove import.
* [fx] remove import.
* [fx] tune the meta calculations.
* [fx] polish comments.
* [fx] remove assertions.
* [fx] modify test cases.
* [fx] modify test cases.
* [fx] optimize import.
* [fx
2022-09-23 10:59:47 +08:00
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
* remove forced FP32 modules
* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2022-09-22 11:16:25 +08:00
Frank Lee
d925122020
[autoparallel] added new linear module handler ( #1616 )
2022-09-21 12:23:21 +08:00
Kirigaya Kazuto
170fa81095
[pipeline/chimera] test chimera | fix bug of initializing ( #1615 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2022-09-20 14:33:04 +08:00
YuliangLiu0306
6a8f8cc05e
[hotfix] got sliced types ( #1614 )
2022-09-20 14:32:42 +08:00
Frank Lee
d397842fa8
[autoparallel] added new node handler ( #1612 )
2022-09-20 14:17:21 +08:00
YuliangLiu0306
7d1bb71d5d
[fx] PoC of runtime shape consistency application ( #1607 )
* [fx] PoC of runtime shape consistency application
* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306
47b11c432c
[autoparallel] add bcast matmul strategies ( #1605 )
2022-09-20 11:26:21 +08:00
Frank Lee
edb67cb378
[autoparallel] refactored the data structure for sharding strategy ( #1610 )
2022-09-20 11:20:54 +08:00
Boyuan Yao
933b6c6367
[fx] Add pofo solver ( #1608 )
* [fx] add pofo algorithm
* [fx] Add pofo solver
* [fx] code refactor
* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto
edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera ( #1595 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
ver217
c9e8ce67b8
fix move fp32 shards ( #1604 )
2022-09-16 17:33:16 +08:00
YuliangLiu0306
eac1b79371
[autoparallel] add bcast op handler ( #1600 )
* [autoparallel] add bcast op handler
* polish code
* add more BCAST FUNC OP
* polish code
* add exception handler
* polish
2022-09-16 11:33:01 +08:00
Frank Lee
3abf98a633
[autoparallel] added all non-bcast matmul strategies ( #1603 )
2022-09-16 10:47:32 +08:00
Frank Lee
db98b695b2
[autoparallel] added strategy generator and bmm strategies ( #1602 )
2022-09-15 16:57:07 +08:00
Jiarui Fang
a19eb80998
[embedding] updates some default parameters
2022-09-15 15:45:17 +08:00
Super Daniel
cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. ( #1599 )
2022-09-15 14:46:36 +08:00
Boyuan Yao
a7cda6f57d
[fx] Add offload codegen ( #1598 )
* [fx] add input activation offload to codegen
* [fx] modify unit test
* [fx] remove two skips in torch11
* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel
c8e9b2ad78
[hotfix/rotor] fix variable names ( #1597 )
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add interpretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] fix variable names in ckpt_solvers and unskip tests.
* [fx] commit my changes.
* [fx] restore skips.
* [fx] restore skips.
* [fx] change stage into phase.
* [fx] change stage into phase.
* [fx] change stage into phase.
2022-09-14 14:27:04 +08:00
YuliangLiu0306
faa23b9d9a
[autoparallel] add reshape handler ( #1594 )
* [autoparallel] add reshape handler
* polish code
2022-09-14 10:25:45 +08:00
Super Daniel
5c494d4540
[fx] provide an accurate estimation of memory. ( #1587 )
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add interpretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
2022-09-14 09:36:43 +08:00
Frank Lee
27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy ( #1591 )
* [autoparallel] refactored shape consistency to remove redundancy
* polish code
* polish code
* polish code
2022-09-13 18:30:18 +08:00
YuliangLiu0306
d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost ( #1589 )
2022-09-13 18:05:05 +08:00
Frank Lee
7c18a588c8
[autoparallel] added generate_sharding_spec to utils ( #1590 )
2022-09-13 15:43:22 +08:00
Boyuan Yao
49ccf8b5f8
[fx] Improve linearize and rotor solver ( #1586 )
* [fx] add nested activation_checkpoint codegen
* undo algorithms commits
* solver
* undo some commits
* [fx] torch11 add nested activation checkpoint codegen
* remove some imports
* [fx] add some comments in activation codegen
* [fx] codegen instance error fix
* [fx] improve linearize and rotor solver
* [fx] some comments and format modification
2022-09-13 14:50:04 +08:00
Frank Lee
219f66c571
[autoparallel] added solver option dataclass ( #1588 )
2022-09-13 14:47:09 +08:00
YuliangLiu0306
82d4376c23
[autoparallel] adapt solver with resnet ( #1583 )
* [autoparallel] adapt solver with resnet
* polish code
* polish code
2022-09-13 12:07:09 +08:00
CsRic
f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode ( #1584 )
2022-09-13 10:50:34 +08:00
Boyuan Yao
f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen ( #1585 )
* [fx] add nested activation_checkpoint codegen
* undo algorithms commits
* solver
* undo some commits
* [fx] torch11 add nested activation checkpoint codegen
* remove some imports
* [fx] add some comments in activation codegen
* [fx] codegen instance error fix
2022-09-12 20:00:48 +08:00
Boyuan Yao
20e466527b
[NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style ( #1576 )
2022-09-08 22:11:04 +08:00
Fazzie-Maqianli
06dccdde44
[NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style ( #1554 )
2022-09-08 22:11:04 +08:00
CsRic
2ac46f7be4
[NFC] polish utils/tensor_detector/__init__.py code style ( #1573 )
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-08 22:11:04 +08:00
Sze-qq
2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style ( #1572 )
2022-09-08 22:11:04 +08:00
superhao1995
e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style ( #1571 )
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han
3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style ( #1570 )
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
Zirui Zhu
f566c9b98d
[NFC] polish colossalai/pipeline/utils.py code style ( #1562 )
2022-09-08 22:11:04 +08:00
Xue Fuzhao
e070ca45c6
[NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style ( #1563 )
2022-09-08 22:11:04 +08:00
Zangwei Zheng
9823cbf24b
[NFC] polish colossalai/gemini/update/chunkv2.py code style ( #1565 )
2022-09-08 22:11:04 +08:00
DouJS
f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style ( #1568 )
2022-09-08 22:11:04 +08:00
LuGY
c7d4932956
[NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style ( #1566 )
2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing
0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style ( #1561 )
2022-09-08 22:11:04 +08:00
Ziheng Qin
08815f0e72
[NFC] polish colossalai/builder/__init__.py code style ( #1560 )
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
2022-09-08 22:11:04 +08:00
Super Daniel
8328917348
[NFC] polish colossalai/testing/comparison.py code style. ( #1558 )
2022-09-08 22:11:04 +08:00
Ofey Chan
7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py ( #1556 )
2022-09-08 22:11:04 +08:00
Kai Wang (Victor Kai)
46931e3c32
[NFC] polish code colossalai/gemini/update/search_utils.py ( #1557 )
2022-09-08 22:11:04 +08:00
yuxuan-lou
413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style ( #1555 )
2022-09-08 22:11:04 +08:00
shenggan
8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style ( #1553 )
2022-09-08 22:11:04 +08:00
Maruyama_Aya
bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style ( #1552 )
2022-09-08 22:11:04 +08:00
binmakeswell
73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2022-09-08 22:11:04 +08:00
Kirigaya Kazuto
318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style ( #1559 )
2022-09-08 22:04:34 +08:00
CsRic
a389ac4ec9
[embedding] cache_embedding small improvement ( #1564 )
2022-09-08 16:41:19 +08:00
ver217
10dd8226b1
add gather_output for VocabParallelClassifier1D ( #1569 )
2022-09-08 16:40:56 +08:00
Kirigaya Kazuto
6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost ( #1544 )
2022-09-07 19:01:06 +08:00
Super Daniel
4f59693207
[fx] provide a stable but not accurate enough version of profiler. ( #1547 )
* [fx] compute memory stat and flop count for MetaInfoProp.
* [fx] modify node attribute.
* [fx] modify ckpt_chen.
* [fx] fix compatibility.
* [fx] fix import error.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip if torch 1.11.0.
* [fx] recover MetaInfoProp support for PyTorch 1.11.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix import error.
2022-09-07 11:21:04 +08:00
YuliangLiu0306
0908d0fc61
[autoparallel] add backward cost info into strategies ( #1524 )
2022-09-07 11:19:00 +08:00
YuliangLiu0306
1a3599410d
[autoparallel] support function in operator handler ( #1529 )
2022-09-07 11:18:41 +08:00
YuliangLiu0306
44c866a3e3
[autoparallel] change the merge node logic ( #1533 )
2022-09-07 11:18:19 +08:00
ver217
ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint ( #1548 )
* refactor parallel layer
* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
ver217
2bed096848
[utils] optimize partition_tensor_parallel_state_dict ( #1546 )
2022-09-06 17:45:31 +08:00
Super Daniel
d8a5aded19
[hotfix] change namespace for meta_trace. ( #1541 )
2022-09-06 11:46:12 +08:00