Kirigaya Kazuto
9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward ( #1642 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2 years ago
YuliangLiu0306
c27e701cb2
[autoparallel] remove no strategy nodes ( #1652 )
* [autoparallel] remove no strategy nodes
* fix none object iteration issue
2 years ago
Frank Lee
50f16a2850
[autoparallel] added compute resharding costs for node handler ( #1662 )
2 years ago
Frank Lee
9ec401a722
[autoparallel] added new strategy constructor template ( #1661 )
* [autoparallel] added new strategy constructor template
* polish code
2 years ago
Frank Lee
3a4d6f63a8
[autoparallel] added node handler for bmm ( #1655 )
2 years ago
YuliangLiu0306
095854477f
[autoparallel] add conv handler v2 ( #1663 )
2 years ago
YuliangLiu0306
1e7816a460
[autoparallel] adapt solver with gpt ( #1653 )
2 years ago
Jiarui Fang
c638bec028
[embedding] polish async copy ( #1657 )
2 years ago
Jiarui Fang
988570e4a6
[embedding] add more detail profiling ( #1656 )
2 years ago
Jiarui Fang
e1f97fd2b8
[embedding] print profiling results ( #1654 )
2 years ago
Frank Lee
30e50c8b4a
[autoparallel] implemented all matmul strategy generator ( #1650 )
2 years ago
YuliangLiu0306
03978aad45
[autoparallel] change the following nodes strategies generation logic ( #1636 )
* [autoparallel] change the following nodes strategies generation logic
* fix unit test
2 years ago
YuliangLiu0306
59f100510a
[autoparallel] where handler ( #1651 )
* [autoparallel] where handler
* fix unit test
2 years ago
Super Daniel
6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. ( #1646 )
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
* [fx] optimize table repr.
* [fx] refactor code for profiler.
* [fx] add docstring.
* [fx] add docstring.
* [fx] skip test.
* [fx] redo.
* [fx] redo.
* [fx] fix import error for torch11.
* [fx] fix import error for torch11.
2 years ago
Boyuan Yao
5d0fdb9cb4
[fx] fix offload codegen test ( #1648 )
* [fx] fix offload codegen test
* [fx] modify typing
2 years ago
Frank Lee
45b39a692a
[autoparallel] implemented linear projection strategy generator ( #1639 )
2 years ago
Frank Lee
154d3ef432
[fix] fixed the collective pattern name for consistency ( #1649 )
* [fix] fixed the collective pattern name for consistency
* polish code
2 years ago
YuliangLiu0306
b2b2a4af98
[autoparallel] adapt solver with mlp ( #1638 )
2 years ago
Jiarui Fang
04443605a5
[embedding] non-blocking cpu-gpu copy ( #1647 )
2 years ago
CsRic
0767f67a0f
[embedding] isolate cache_op from forward ( #1645 )
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2 years ago
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
This reverts commit 5be118f405.
2 years ago
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2 years ago
Boyuan Yao
f921733621
[autoparallel] Add pofo sequence annotation ( #1637 )
* [autoparallel] annotate pofo sequence
* [autoparallel] remove unused print
* [autoparallel] fix some code
2 years ago
Super Daniel
04bbabeea8
[fx/profiler] provide a table of summary. ( #1634 )
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
2 years ago
HELSON
95c35f73bd
[moe] initialize MoE groups by ProcessGroup ( #1640 )
2 years ago
Jiarui Fang
e57df80325
[embeddings] cache option ( #1635 )
2 years ago
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2 years ago
YuliangLiu0306
702dbc5288
[tensor] use communication autograd func ( #1617 )
* [tensor] use communication autograd func
* change all to all comm spec info
* rename pattern and distinguish fwd/bwd
* polish code
2 years ago
YuliangLiu0306
c7ac0f4ab2
[autoparallel] add elementwise handler ( #1622 )
* [autoparallel] add elementwise handler
* polish code
* polish code
* reduce skipped strategies range
* polish code
2 years ago
YuliangLiu0306
3a46215135
[autoparallel] add embedding handler ( #1620 )
2 years ago
YuliangLiu0306
69448f64c4
[autoparallel] protect bcast handler from invalid strategies ( #1631 )
2 years ago
YuliangLiu0306
0c703189b9
[autoparallel] add layernorm handler ( #1629 )
2 years ago
YuliangLiu0306
bf77d3ab65
[autoparallel] recover the merged node strategy index ( #1613 )
2 years ago
Boyuan Yao
d6b01feb66
[fx] Modify offload codegen ( #1618 )
* [fx] modify offload codegen
* [fx] remove repeated hook definitions
* [fx] modify offload test
2 years ago
YuliangLiu0306
9eae855408
[hotfix] add recompile after graph manipulation ( #1621 )
2 years ago
Super Daniel
d967779a32
[fx/profiler] tuned the calculation of memory estimation ( #1619 )
* [fx] tuned the meta info and rotor solver.
* [fx] remove import.
* [fx] remove import.
* [fx] remove import.
* [fx] tune the meta calculations.
* [fx] polish comments.
* [fx] remove assertions.
* [fx] modify test cases.
* [fx] modify test cases.
* [fx] optimize import.
* [fx
2 years ago
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
* remove forced FP32 modules
* correct no_shard-contexts' positions
2 years ago
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2 years ago
Frank Lee
d925122020
[autoparallel] added new linear module handler ( #1616 )
2 years ago
Kirigaya Kazuto
170fa81095
[pipeline/chimera] test chimera | fix bug of initializing ( #1615 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
2 years ago
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2 years ago
YuliangLiu0306
6a8f8cc05e
[hotfix] got sliced types ( #1614 )
2 years ago
Frank Lee
d397842fa8
[autoparallel] added new node handler ( #1612 )
2 years ago
YuliangLiu0306
7d1bb71d5d
[fx] PoC of runtime shape consistency application ( #1607 )
* [fx] PoC of runtime shape consistency application
* polish code
2 years ago
YuliangLiu0306
47b11c432c
[autoparallel] add bcast matmul strategies ( #1605 )
2 years ago
Frank Lee
edb67cb378
[autoparallel] refactored the data structure for sharding strategy ( #1610 )
2 years ago
Boyuan Yao
933b6c6367
[fx] Add pofo solver ( #1608 )
* [fx] add pofo algorithm
* [fx] Add pofo solver
* [fx] code refactor
* [fx] fix test_linearize import
2 years ago
github-actions[bot]
d32cf84c46
Automated submodule synchronization ( #1609 )
Co-authored-by: github-actions <github-actions@github.com>
2 years ago
Frank Lee
725666d6a9
[workflow] deactivate conda environment before removing ( #1606 )
2 years ago
Kirigaya Kazuto
edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera ( #1595 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2 years ago