Frank Lee
f9a613d660
[autoparallel] added binary elementwise node handler ( #1758 )
...
* [autoparallel] added binary elementwise node handler
* polish code
2 years ago
YuliangLiu0306
d2fc067231
[autoparallel] fix param hook issue in transform pass ( #1755 )
2 years ago
Frank Lee
262652c8bc
[autoparallel] added addbmm handler ( #1751 )
2 years ago
YuliangLiu0306
980ed21723
[autoparallel] shard param and buffer as expected ( #1753 )
...
* [autoparallel] shard param and buffer as expected
* fix unit test issue
2 years ago
YuliangLiu0306
cdb7d5e7d2
[hotfix] autoparallel unit test ( #1752 )
2 years ago
YuliangLiu0306
a4ce180e85
[autoparallel] add sequential order to communication actions ( #1735 )
2 years ago
Super Daniel
b893342f95
[fx] test tracer on diffuser modules. ( #1750 )
...
* [fx] test tracer on diffuser modules.
* [fx] shorter seq_len.
* Update requirements-test.txt
2 years ago
Frank Lee
b80b6eaa88
[autoparallel] recovered skipped test cases ( #1748 )
2 years ago
Frank Lee
474111ecb5
[autoparallel] fixed wrong sharding strategy in conv handler ( #1747 )
...
* [autoparallel] fixed wrong sharding strategy in conv handler
* polish code
2 years ago
Frank Lee
8b8937d901
[autoparallel] fixed wrong generated strategy for dot op ( #1746 )
...
* [autoparallel] fixed wrong generated strategy for dot op
* polish code
2 years ago
Frank Lee
88a79814fb
[autoparallel] handled illegal strategy in node handler ( #1743 )
...
* [autoparallel] handled illegal strategy in node handler
* polish code
2 years ago
Super Daniel
30874f1692
[fx/profiler] debug the fx.profiler / add an example test script for fx.profiler ( #1730 )
...
* [fx/profiler] add test.
* [fx] fix file names.
* [fx] add docstring and comment.
* [fx] polish profiler.py.
* [fx] fix import errors.
* [fx] fix profiler.
* [fx] fix names.
2 years ago
Frank Lee
eee84908d4
[autoparallel] handled illegal sharding strategy ( #1728 )
...
* [autoparallel] handled illegal sharding strategy
* polish code
2 years ago
Ziheng Qin
cbe9a4cb45
[NFC] polish tests/test_layers/test_3d/test_3d.py code style ( #1740 )
2 years ago
lucasliunju
912eb58ea0
[NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style ( #1733 )
2 years ago
Xue Fuzhao
754aa7c81f
[NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style ( #1731 )
2 years ago
xyupeng
ff373a11eb
[NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style ( #1723 )
2 years ago
Kai Wang (Victor Kai)
b38efe4e8a
[NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style ( #1718 )
2 years ago
binmakeswell
f6389d0813
[NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style ( #1715 )
2 years ago
HELSON
f69f9bf223
[zero] add chunk init function for users ( #1729 )
...
* add chunk manager init function
* fix unit tests
* add comment
* add flush=True
2 years ago
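The chunk-manager init above packs parameters into fixed-size buckets before ZeRO sharding. A minimal sketch of that greedy packing idea, assuming hypothetical names (`Chunk`, `init_chunks`) that are not the ColossalAI API:

```python
# Illustrative only: greedily pack parameters (name -> element count) into
# fixed-capacity chunks, opening a new chunk when the current one is full.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Chunk:
    capacity: int                     # chunk size in number of elements
    used: int = 0                     # elements already occupied
    params: List[str] = field(default_factory=list)

    def can_fit(self, numel: int) -> bool:
        return self.used + numel <= self.capacity

def init_chunks(param_sizes: Dict[str, int], chunk_size: int) -> List[Chunk]:
    """Assign each parameter to the last open chunk, or start a new one."""
    chunks: List[Chunk] = []
    for name, numel in param_sizes.items():
        if not chunks or not chunks[-1].can_fit(numel):
            chunks.append(Chunk(capacity=chunk_size))
        chunks[-1].params.append(name)
        chunks[-1].used += numel
    return chunks
```

Packing adjacent parameters into one chunk lets collective ops (all-gather, reduce-scatter) run on a few large buffers instead of many small tensors.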
Super Daniel
393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug ( #1710 )
...
* [fx] move meta registration
* [fx] fix tests.
* [fx] fix test.
* [fx] fix.
* [meta] refactor meta registration.py.
* [fx] add compatibility descriptions.
* [fx] polish import.
* [fx] add a decorator.
* [fx] fix tests.
* [fx] remove print.
* [fx] edit raise error.
* [fx] edit raise error.
* [fx] add type hint.
* [fx] fix import in experimental.
* [rpc] remove color debug.
* [meta] fix naming.
2 years ago
Frank Lee
e8d8eda5e7
[autoparallel] moved tests to test_tensor_shard ( #1713 )
2 years ago
YuliangLiu0306
845ff4a47a
[autoparallel] resnet block runtime apply ( #1709 )
...
* [autoparallel] resnet block runtime apply
* separate buffer and parameter in MemoryCost
* polish code
* add comments and todos
* fix test issue
2 years ago
Frank Lee
22a115406b
[autoparallel] fixed broken node handler tests ( #1708 )
2 years ago
HELSON
1468e4bcfc
[zero] add constant placement policy ( #1705 )
...
* fixes memory leak when a parameter is in fp16 in ZeroDDP init.
* bans chunk release in CUDA; a chunk may only be released when it is about to be offloaded.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2 years ago
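The constant placement policy above reserves a fixed caching budget on the device. A hedged sketch of that decision rule, with an invented function name (`place_chunks`) standing in for the real policy class:

```python
# Illustrative only: keep chunks on the device while they fit into a
# user-reserved byte budget; everything beyond the budget is offloaded.
from typing import List, Tuple

def place_chunks(chunk_sizes: List[int], reserved_bytes: int) -> Tuple[List[int], List[int]]:
    """Return (on_device, offloaded) chunk indices under a constant budget."""
    on_device, offloaded, used = [], [], 0
    for i, size in enumerate(chunk_sizes):
        if used + size <= reserved_bytes:
            on_device.append(i)
            used += size
        else:
            offloaded.append(i)
    return on_device, offloaded
```

Unlike an adaptive policy, a constant budget makes device memory use predictable, which is the property the commit message highlights.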
Frank Lee
6c331a5a09
[autoparallel] refactored the autoparallel module for organization ( #1706 )
...
* [autoparallel] refactored the autoparallel module for organization
* polish code
2 years ago
Frank Lee
91cd34e6e0
[unittest] added doc for the pytest wrapper ( #1704 )
2 years ago
YuliangLiu0306
451cd72dea
[autoparallel] adapt runtime passes ( #1703 )
...
* [autoparallel] adapt runtime passes v2
* polish code
2 years ago
Jiarui Fang
21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding ( #1699 )
2 years ago
Frank Lee
0e52f3d3d5
[unittest] supported conditional testing based on env var ( #1701 )
...
polish code
2 years ago
Frank Lee
8283e95db3
[autoparallel] collated all deprecated files ( #1700 )
...
* [autoparallel] collated all deprecated files
* polish code
2 years ago
YuliangLiu0306
81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler ( #1695 )
...
* [autoparallel] adapt solver and CostGraph with new handler
* fix test issue
2 years ago
YuliangLiu0306
42b882ef06
[autoparallel] add output handler and placeholder handler ( #1694 )
...
* [autoparallel] add output handler and placeholder handler
* Delete test_solver_with_resnet.py
* fix test bugs
2 years ago
YuliangLiu0306
56088e6d98
[autoparallel] add pooling handler ( #1690 )
...
* [autoparallel] add pooling handler
* polish code
2 years ago
YuliangLiu0306
319d654f79
[autoparallel] where_handler_v2 ( #1688 )
...
* where generator
* [autoparallel] where_handler_v2
2 years ago
Boyuan Yao
31d2f03d27
[autoparallel] fix C version rotor inconsistency ( #1691 )
2 years ago
Frank Lee
4973157ad7
[autoparallel] added sharding spec conversion for linear handler ( #1687 )
2 years ago
YuliangLiu0306
af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug ( #1683 )
2 years ago
Super Daniel
3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 ( #1679 )
...
* [fx/profiler] modify data_ptr into uuid for all tensors.
* [fx] modify uuid.
* [fx/profiler] tune performance on GPT-2.
* [fx] updates.
* [fx] debug.
* [fx] debug.
* [fx] cuda.
2 years ago
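The profiler change above replaces `data_ptr` with a UUID, since memory addresses can be reused after a tensor is freed. A minimal sketch of the tagging idea, with illustrative names (`tag`, `_profiler_uuid`) rather than the actual ColossalAI attributes:

```python
# Illustrative only: attach a stable UUID to an object the first time the
# profiler sees it, and return the same UUID on every later visit. Unlike a
# raw memory address, the UUID is never recycled between objects.
import uuid

def tag(obj):
    """Return a per-object UUID, creating it on first sight."""
    if not hasattr(obj, "_profiler_uuid"):
        obj._profiler_uuid = uuid.uuid4()
    return obj._profiler_uuid
```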
YuliangLiu0306
517b63939a
[autoparallel] add unary element wise handler v2 ( #1674 )
2 years ago
YuliangLiu0306
f6c6a932b8
[autoparallel] add following node generator ( #1673 )
...
* [autoparallel] add following node generator
* polish code
* polish code
* update name of arguments
2 years ago
YuliangLiu0306
52fda88796
[autoparallel] add layer norm handler v2 ( #1671 )
...
* [autoparallel] add layer norm handler v2
* polish code
* polish code
2 years ago
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2 years ago
Boyuan Yao
1df98d5b66
[autoparallel] add rotor C version ( #1658 )
...
* [autoparallel] add rotor c version
* [fx] remove metainfoprop in rotor solver
* [autoparallel] modify C code format
* [autoparallel] remove build.py
* [autoparallel] fix C extension build
* [autoparallel] add C solver consistency test
* [autoparallel] remove some unused imports
* [autoparallel] refactor rotor solver code
* [autoparallel] replace print with colossalai logger
* [autoparallel] ranks fixed
2 years ago
YuliangLiu0306
11ec070e53
[hotfix] unit test ( #1670 )
2 years ago
Frank Lee
a60024e77a
[autoparallel] added utils for broadcast operation ( #1665 )
...
* [autoparallel] added utils for broadcast operation
* polish code
2 years ago
YuliangLiu0306
3f068d1409
[autoparallel] update CommSpec ( #1667 )
2 years ago
YuliangLiu0306
746f8f979d
[autoparallel] add batch norm handler v2 ( #1666 )
2 years ago
Kirigaya Kazuto
9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward ( #1642 )
...
* [pipeline/tuning] improve dispatch performance in both time and space
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2 years ago
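The pytree commit above flattens nested args/kwargs into leaves plus a structure, so the pipeline can ship a flat list of tensors and rebuild the nesting afterwards. PyTorch ships a similar utility in `torch.utils._pytree`; this standalone pure-Python version is only an illustration of the idea:

```python
# Illustrative only: flatten nested lists/tuples/dicts into (leaves, spec),
# then rebuild the original nesting from possibly transformed leaves.
def tree_flatten(tree):
    if isinstance(tree, (list, tuple)):
        leaves, specs = [], []
        for item in tree:
            sub_leaves, sub_spec = tree_flatten(item)
            leaves.extend(sub_leaves)
            specs.append(sub_spec)
        return leaves, (type(tree), specs)
    if isinstance(tree, dict):
        leaves, specs = [], {}
        for key, item in tree.items():
            sub_leaves, sub_spec = tree_flatten(item)
            leaves.extend(sub_leaves)
            specs[key] = sub_spec
        return leaves, (dict, specs)
    return [tree], None  # a leaf has no structure

def tree_unflatten(leaves, spec):
    it = iter(leaves)
    def build(spec):
        if spec is None:
            return next(it)
        container, children = spec
        if container is dict:
            return {k: build(s) for k, s in children.items()}
        return container(build(s) for s in children)
    return build(spec)
```

This is what lets a `data_process_func` run over every tensor in arbitrarily nested forward outputs without knowing their shape in advance.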
Frank Lee
3a4d6f63a8
[autoparallel] added node handler for bmm ( #1655 )
2 years ago
YuliangLiu0306
095854477f
[autoparallel] add conv handler v2 ( #1663 )
2 years ago
YuliangLiu0306
1e7816a460
[autoparallel] adapt solver with gpt ( #1653 )
2 years ago
Frank Lee
30e50c8b4a
[autoparallel] implemented all matmul strategy generator ( #1650 )
2 years ago
YuliangLiu0306
03978aad45
[autoparallel] change the following nodes strategies generation logic ( #1636 )
...
* [autoparallel] change the following nodes strategies generation logic
* fix unit test
2 years ago
YuliangLiu0306
59f100510a
[autoparallel] where handler ( #1651 )
...
* [autoparallel] where handler
* fix unit test
2 years ago
Boyuan Yao
5d0fdb9cb4
[fx] fix offload codegen test ( #1648 )
...
* [fx] fix offload codegen test
* [fx] modify typing
2 years ago
Frank Lee
45b39a692a
[autoparallel] implemented linear projection strategy generator ( #1639 )
2 years ago
Frank Lee
154d3ef432
[fix] fixed the collective pattern name for consistency ( #1649 )
...
* [fix] fixed the collective pattern name for consistency
* polish code
2 years ago
YuliangLiu0306
b2b2a4af98
[autoparallel] adapt solver with mlp ( #1638 )
2 years ago
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
...
This reverts commit 5be118f405.
2 years ago
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2 years ago
HELSON
95c35f73bd
[moe] initialize MoE groups by ProcessGroup ( #1640 )
2 years ago
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2 years ago
YuliangLiu0306
702dbc5288
[tensor] use communication autograd func ( #1617 )
...
* [tensor] use communication autograd func
* change all to all comm spec info
* rename pattern and distinguish fwd/bwd
* polish code
2 years ago
YuliangLiu0306
0c703189b9
[autoparallel] add layernorm handler ( #1629 )
2 years ago
YuliangLiu0306
bf77d3ab65
[autoparallel] recover the merged node strategy index ( #1613 )
2 years ago
Boyuan Yao
d6b01feb66
[fx] Modify offload codegen ( #1618 )
...
* [fx] modify offload codegen
* [fx] remove repeated hook definitions
* [fx] modify offload test
2 years ago
YuliangLiu0306
9eae855408
[hotfix] add recompile after graph manipulation ( #1621 )
2 years ago
Super Daniel
d967779a32
[fx/profiler] tuned the calculation of memory estimation ( #1619 )
...
* [fx] tuned the meta info and rotor solver.
* [fx] remove import.
* [fx] remove import.
* [fx] remove import.
* [fx] tune the meta calculations.
* [fx] polish comments.
* [fx] remove assertions.
* [fx] modify test cases.
* [fx] modify test cases.
* [fx] optimize import.
* [fx
2 years ago
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
...
* remove forced FP32 modules
* correct no_shard-contexts' positions
2 years ago
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2 years ago
Frank Lee
d925122020
[autoparallel] added new linear module handler ( #1616 )
2 years ago
Kirigaya Kazuto
170fa81095
[pipeline/chimera] test chimera | fix bug of initializing ( #1615 )
...
* [pipeline/tuning] improve dispatch performance in both time and space
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
2 years ago
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2 years ago
YuliangLiu0306
7d1bb71d5d
[fx] PoC of runtime shape consistency application ( #1607 )
...
* [fx] PoC of runtime shape consistency application
* polish code
2 years ago
YuliangLiu0306
47b11c432c
[autoparallel] add bcast matmul strategies ( #1605 )
2 years ago
Boyuan Yao
933b6c6367
[fx] Add pofo solver ( #1608 )
...
* [fx] add pofo algorithm
* [fx] Add pofo solver
* [fx] code refactor
* [fx] fix test_linearize import
2 years ago
Kirigaya Kazuto
edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera ( #1595 )
...
* [pipeline/tuning] improve dispatch performance in both time and space
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2 years ago
YuliangLiu0306
eac1b79371
[autoparallel] add bcast op handler ( #1600 )
...
* [autoparallel] add bcast op handler
* polish code
* add more BCAST FUNC OP
* polish code
* add exception handler
* polish
2 years ago
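A bcast op handler has to reason about the standard NumPy/PyTorch broadcasting rule: align shapes from the right, and each dimension pair must be equal or contain a 1. A minimal sketch of that rule (the function name `broadcast_shape` is ours, not the handler's API):

```python
# Illustrative only: compute the broadcast result shape of two shapes, or
# raise if they are not broadcastable under the right-aligned rule.
from itertools import zip_longest

def broadcast_shape(a, b):
    result = []
    # Walk both shapes from the trailing dimension, padding with 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        result.append(max(x, y))
    return tuple(reversed(result))
```

Knowing which dimensions were expanded from size 1 is exactly what the handler needs to decide which sharding specs remain legal for a broadcast operand.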
Boyuan Yao
a7cda6f57d
[fx] Add offload codegen ( #1598 )
...
* [fx] add input activation offload to codegen
* [fx] modify unit test
* [fx] remove two skips in torch11
* [fx] use all_input_nodes instead of _input_nodes
2 years ago
Super Daniel
c8e9b2ad78
[hotfix/rotor] fix variable names ( #1597 )
...
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add interpretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] fix variable names in ckpt_solvers and unskip tests.
* [fx] commit my changes.
* [fx] restore skips.
* [fx] restore skips.
* [fx] change stage into phase.
* [fx] change stage into phase.
* [fx] change stage into phase.
2 years ago
YuliangLiu0306
faa23b9d9a
[autoparallel] add reshape handler ( #1594 )
...
* [autoparallel] add reshape handler
* polish code
2 years ago
Frank Lee
27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy ( #1591 )
...
* [autoparallel] refactored shape consistency to remove redundancy
* polish code
* polish code
* polish code
2 years ago
YuliangLiu0306
d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost ( #1589 )
2 years ago
Frank Lee
219f66c571
[autoparallel] added solver option dataclass ( #1588 )
2 years ago
YuliangLiu0306
82d4376c23
[autoparallel] adapt solver with resnet ( #1583 )
...
* [autoparallel]adapt solver with resnet
* polish code
* polish code
2 years ago
CsRic
f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode ( #1584 )
2 years ago
Boyuan Yao
f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen ( #1585 )
...
* [fx] add nested activation_checkpoint codegen
* undo algorithms commits
* solver
* undo some commits
* [fx] torch11 add nested activation checkpoint codegen
* remove some imports
* [fx] add some comments in activation codegen
* [fx] codegen instance error fix
2 years ago
アマデウス
e615cfc3a8
[NFC] polish test component gpt code style ( #1567 )
2 years ago
Kirigaya Kazuto
6159d45417
[pipeline/tuning] improve dispatch performance in both time and space ( #1544 )
2 years ago
Super Daniel
4f59693207
[fx] provide a stable but not accurate enough version of profiler. ( #1547 )
...
* [fx] compute memory stat and flop count for MetaInfoProp.
* [fx] modify node attribute.
* [fx] modify ckpt_chen.
* [fx] fix compatibility.
* [fx] fix import error.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip if torch 1.11.0.
* [fx] recover MetaInfoProp support for PyTorch 1.11.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix import error.
2 years ago
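The MetaInfoProp-style profiler above estimates memory and FLOPs from shapes alone, without running real computation. A back-of-the-envelope sketch of the flop-count side, using the standard formulas (the function names here are illustrative, not the profiler's API):

```python
# Illustrative only: count FLOPs for a matmul and a linear layer from
# shapes. Each output element of (m x k) @ (k x n) needs k multiply-adds,
# conventionally counted as 2k FLOPs.
def matmul_flops(m: int, k: int, n: int) -> int:
    return 2 * m * k * n

def linear_flops(batch: int, in_features: int, out_features: int, bias: bool = True) -> int:
    flops = matmul_flops(batch, in_features, out_features)
    if bias:
        flops += batch * out_features  # one add per output element
    return flops
```

Propagating such per-node estimates through the fx graph is what gives checkpointing solvers like rotor their cost model.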
YuliangLiu0306
0908d0fc61
[autoparallel] add backward cost info into strategies ( #1524 )
2 years ago
YuliangLiu0306
44c866a3e3
[autoparallel] change the merge node logic ( #1533 )
2 years ago
Jiarui Fang
64169f3e8f
[embedding] polish parallel embedding tablewise ( #1545 )
2 years ago
CsRic
964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application ( #1537 )
2 years ago
Boyuan Yao
56159049e8
[fx] Modify solver linearize and add corresponding test ( #1531 )
...
* [fx] modify solver linearize and add test
* [fx] add torch11 test of linearize but skip it
* [fx] remove some unused imports
2 years ago
Super Daniel
7dc53237c3
[fx] add test for meta tensor. ( #1527 )
...
* [fx] add test for meta tensor.
* [fx] add test for meta tensor.
* [fx] add test for meta tensor.
* [fx] add test for meta tensor.
* [fx] fix error.
2 years ago
YuliangLiu0306
4b3d6caeb3
[fx] patch nn.functional convolution ( #1528 )
2 years ago
CsRic
5156d5b4f8
[embedding] add tablewise sharding for FAW ( #1526 )
2 years ago
Kirigaya Kazuto
f1e1836218
[pipeline/pipeline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP ( #1508 )
...
* support p2p communication with any type of object | pass test
* reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
* [engine/schedule] use p2p_v2 to reconstruct pipeline_schedule
* [pipeline/rpc] implement a demo for PP with cuda rpc framework
* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B
* [pipeline/rpc] implement distributed optimizer | test with assert_close
* [pipeline/rpc] implement distributed optimizer | test with assert_close
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/pipeline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP
* [pipeline/pipeline_process_group] remove comment
* [pipeline/pipeline_process_group] remove comment
* [pipeline/pipeline_process_group] skip process group test
* [pipeline/pipeline_process_group] remove test named function
2 years ago