Commit Graph

878 Commits (0d87c4e20d4fdadde47549802b51f3b165c59eac)

Author SHA1 Message Date
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654) 2022-09-27 12:50:33 +08:00
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650) 2022-09-27 12:06:25 +08:00
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
* [autoparallel] change the following nodes strategies generation logic

* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
* [autoparallel] where handler

* fix unit test
2022-09-27 11:20:43 +08:00
Super Daniel 6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. (#1646)
* [fx/profiling] provide summary for MetaInfoProp.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx] optimize table repr.

* [fx] optimize table repr.

* [fx] refactor code for profiler.

* [fx] add docstring.

* [fx] add docstring.

* [fx] skip test.

* [fx] redo.

* [fx] redo.

* [fx] fix import error for torch11.

* [fx] fix import error for torch11.
2022-09-27 10:26:52 +08:00
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
* [fx] fix offload codegen test

* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639) 2022-09-26 16:58:14 +08:00
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
* [fix] fixed the collective pattern name for consistency

* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638) 2022-09-26 15:26:14 +08:00
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647) 2022-09-26 14:57:57 +08:00
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
Boyuan Yao f921733621
[autoparallel] Add pofo sequence annotation (#1637)
* [autoparallel] annotate pofo sequence

* [autoparallel] remove unused print

* [autoparallel] fix some code
2022-09-24 01:52:57 +08:00
Super Daniel 04bbabeea8
[fx/profiler] provide a table of summary. (#1634)
* [fx/profiling] provide summary for MetaInfoProp.

* [fx/profiler] provide a table of summary.

* [fx] optimize table repr.
2022-09-23 18:12:43 +08:00
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640) 2022-09-23 17:20:41 +08:00
Jiarui Fang e57df80325
[embeddings] cache option (#1635) 2022-09-23 16:40:18 +08:00
HELSON a088022efc
[moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
* [tensor] use communication autograd func

* change all to all comm spec info

* rename pattern and distinguish fwd/bwd

* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306 c7ac0f4ab2
[autoparallel] add elementwise handler (#1622)
* [autoparallel] add elementwise handler

* polish code

* polish code

* reduce skipped strategies range

* polish code
2022-09-23 13:27:31 +08:00
YuliangLiu0306 3a46215135
[autoparallel] add embedding handler (#1620) 2022-09-23 12:34:30 +08:00
YuliangLiu0306 69448f64c4
[autoparallel] protect bcast handler from invalid strategies (#1631) 2022-09-23 12:12:49 +08:00
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629) 2022-09-23 12:00:25 +08:00
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613) 2022-09-23 11:52:42 +08:00
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
* [fx] modify offload codegen

* [fx] remove repeated hook definitions

* [fx] modify offload test
2022-09-23 11:04:52 +08:00
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
* [fx] tuned the meta info and rotor solver.

* [fx] remove import.

* [fx] remove import.

* [fx] remove import.

* [fx] tune the meta calculations.

* [fx] polish comments.

* [fx] remove assertions.

* [fx] modify test cases.

* [fx] modify test cases.

* [fx] optimize import.

* [fx
2022-09-23 10:59:47 +08:00
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
* remove forced FP32 modules

* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Frank Lee d925122020
[autoparallel] added new linear module handler (#1616) 2022-09-21 12:23:21 +08:00
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
YuliangLiu0306 6a8f8cc05e
[hotfix] got sliced types (#1614) 2022-09-20 14:32:42 +08:00
Frank Lee d397842fa8
[autoparallel] added new node handler (#1612) 2022-09-20 14:17:21 +08:00
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
* [fx] PoC of runtime shape consistency application

* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306 47b11c432c
[autoparallel]add bcast matmul strategies (#1605) 2022-09-20 11:26:21 +08:00
Frank Lee edb67cb378
[autoparallel] refactored the data structure for sharding strategy (#1610) 2022-09-20 11:20:54 +08:00
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
* [fx] add pofo algorithm

* [fx] Add pofo solver

* [fx] code refactor

* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
ver217 c9e8ce67b8
fix move fp32 shards (#1604) 2022-09-16 17:33:16 +08:00
YuliangLiu0306 eac1b79371
[autoparallel] add bcast op handler (#1600)
* [autoparallel] add bcast op handler

* polish code

* add more BCAST FUNC OP

* polish code

* add exception handler

* polish
2022-09-16 11:33:01 +08:00
Frank Lee 3abf98a633
[autoparallel] added all non-bcast matmul strategies (#1603) 2022-09-16 10:47:32 +08:00
Frank Lee db98b695b2
[autoparallel] added strategy generator and bmm strategies (#1602) 2022-09-15 16:57:07 +08:00
Jiarui Fang a19eb80998
[embedding] updates some default parameters 2022-09-15 15:45:17 +08:00
Super Daniel cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. (#1599) 2022-09-15 14:46:36 +08:00
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
* [fx] add input activation offload to codegen

* [fx] modify unit test

* [fx] remove two skips in torch11

* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.

* [fx] fix variable names in ckpt_solvers and unskip tests.

* [fx] commit my changes.

* [fx] restore skips.

* [fx] restore skips.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.
2022-09-14 14:27:04 +08:00
YuliangLiu0306 faa23b9d9a
[autoparallel] add reshape handler (#1594)
* [autoparallel] add reshape handler

* polish code
2022-09-14 10:25:45 +08:00
Super Daniel 5c494d4540
[fx] provide an accurate estimation of memory. (#1587)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.
2022-09-14 09:36:43 +08:00
Frank Lee 27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy (#1591)
* [autoparallel] refactored shape consistency to remove redundancy

* polish code

* polish code

* polish code
2022-09-13 18:30:18 +08:00
YuliangLiu0306 d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) 2022-09-13 18:05:05 +08:00
Frank Lee 7c18a588c8
[autoparallel] added generate_sharding_spec to utils (#1590) 2022-09-13 15:43:22 +08:00
Boyuan Yao 49ccf8b5f8
[fx] Improve linearize and rotor solver (#1586)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix

* [fx] imporve linearize and rotor solver

* [fx] some comments and format modification
2022-09-13 14:50:04 +08:00
Frank Lee 219f66c571
[autoparallel] added solver option dataclass (#1588) 2022-09-13 14:47:09 +08:00
YuliangLiu0306 82d4376c23
[autoparallel] adapt solver with resnet (#1583)
* [autoparallel]adapt solver with resnet

* polish code

* polish code
2022-09-13 12:07:09 +08:00
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2022-09-13 10:50:34 +08:00
Boyuan Yao f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen (#1585)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix
2022-09-12 20:00:48 +08:00
Boyuan Yao 20e466527b [NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) 2022-09-08 22:11:04 +08:00
Fazzie-Maqianli 06dccdde44 [NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554) 2022-09-08 22:11:04 +08:00
CsRic 2ac46f7be4 [NFC] polish utils/tensor_detector/__init__.py code style (#1573)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-08 22:11:04 +08:00
Sze-qq 2144cbae8c [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) 2022-09-08 22:11:04 +08:00
superhao1995 e4bf7ae667 [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han 3263cdf57f [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
Zirui Zhu f566c9b98d [NFC] polish colossalai/pipeline/utils.py code style (#1562) 2022-09-08 22:11:04 +08:00
Xue Fuzhao e070ca45c6 [NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563) 2022-09-08 22:11:04 +08:00
Zangwei Zheng 9823cbf24b [NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565) 2022-09-08 22:11:04 +08:00
DouJS f586887a90 [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) 2022-09-08 22:11:04 +08:00
LuGY c7d4932956 [NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566) 2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing 0c4c9aa6e0 [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) 2022-09-08 22:11:04 +08:00
Ziheng Qin 08815f0e72 [NFC] polish colossalai/builder/__init__.py code style (#1560)
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
2022-09-08 22:11:04 +08:00
Super Daniel 8328917348 [NFC] polish colossalai/testing/comparison.py code style. (#1558) 2022-09-08 22:11:04 +08:00
Ofey Chan 7cc052f6c0 [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) 2022-09-08 22:11:04 +08:00
Kai Wang (Victor Kai) 46931e3c32 [NFC] polish code colossalai/gemini/update/search_utils.py (#1557) 2022-09-08 22:11:04 +08:00
yuxuan-lou 413f9c19f4 [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) 2022-09-08 22:11:04 +08:00
shenggan 8edb777cc2 [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) 2022-09-08 22:11:04 +08:00
Maruyama_Aya bd2d789832 [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) 2022-09-08 22:11:04 +08:00
binmakeswell 73e9eb13b7 [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style 2022-09-08 22:11:04 +08:00
Kirigaya Kazuto 318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559) 2022-09-08 22:04:34 +08:00
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564) 2022-09-08 16:41:19 +08:00
ver217 10dd8226b1
add gather_output for VocabParallelClassifier1D (#1569) 2022-09-08 16:40:56 +08:00
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544) 2022-09-07 19:01:06 +08:00
Super Daniel 4f59693207
[fx] provide a stable but not accurate enough version of profiler. (#1547)
* [fx] compute memory stat and flop count for MetaInfoProp.

* [fx] modify node attribute.

* [fx] modify ckpt_chen.

* [fx] fix compatibility.

* [fx] fix import error.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip if torch 1.11.0.

* [fx] recover MetaInfoProp support for PyTorch 1.11.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix import error.
2022-09-07 11:21:04 +08:00
YuliangLiu0306 0908d0fc61
[autoparallel]add backward cost info into strategies (#1524) 2022-09-07 11:19:00 +08:00
YuliangLiu0306 1a3599410d
[autoparallel] support fucntion in operator handler (#1529) 2022-09-07 11:18:41 +08:00
YuliangLiu0306 44c866a3e3
[autoparallel] change the merge node logic (#1533) 2022-09-07 11:18:19 +08:00
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
* refactor parallel layer

* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
ver217 2bed096848
[utils] optimize partition_tensor_parallel_state_dict (#1546) 2022-09-06 17:45:31 +08:00
Super Daniel d8a5aded19
[hotfix] change namespace for meta_trace. (#1541) 2022-09-06 11:46:12 +08:00
ver217 a203b709d5
[hotfix] fix init context (#1543)
* fix init context

* fix lazy init ctx
2022-09-06 11:45:08 +08:00
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545) 2022-09-06 10:41:20 +08:00
Boyuan Yao 46c6cc79a9
[fx] Add common node in model linearize (#1542)
* [fx] Add common node into linearize

* [fx] Add common node to solver
2022-09-05 18:35:05 +08:00
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537) 2022-09-05 15:12:53 +08:00
Super Daniel 70129603aa
[fx] support meta tracing for aten level computation graphs like functorch. (#1536)
* [fx] support meta tracing for aten level computation graphs like functorch.

* [fx] support meta tracing for aten level computation graphs like functorch.

* [fx] remove redundant import.

* [fx] add docstring.
2022-09-05 12:10:09 +08:00
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538) 2022-09-02 15:48:35 +08:00
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535) 2022-09-02 11:09:37 +08:00
Boyuan Yao 56159049e8
[fx] Modify solver linearize and add corresponding test (#1531)
* [fx] modify solver linearize and add test

* [fx] add torch11 test of linearize but skip it

* [fx] remove some unused imports
2022-09-02 10:24:41 +08:00
YuliangLiu0306 4b3d6caeb3
[fx]patch nn.functional convolution (#1528) 2022-09-01 19:05:07 +08:00
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526) 2022-09-01 17:55:41 +08:00
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
* support p2p communication with any type of object | pass test

* reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test

* [engin/schedule] use p2p_v2 to recontruct pipeline_schedule

* [pipeline/rpc] implement a demo for PP with cuda rpc framework

* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] skip process group test

* [pipeline/pipleline_process_group] remove test named function
2022-09-01 17:45:47 +08:00
Super Daniel 112a1f0a8f
[hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530)
* [hotfix] avoid conflict of meta registry with torch 1.13.0.

* [hotfix] avoid conflict of meta registry with torch 1.13.0.
2022-09-01 15:31:21 +08:00
Boyuan Yao b231430bcb
[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521)
* [fx] fix wrong variable name in solver rotor

* [fx] fix wrong variable name in solver rotor

* [fx] fix the discretize bug

* [fx] fix the first op in activation checkpoint codegen

* [fx] fix some bugs of ckpt solver

* [fx] modify test_ckpt_torchvision

* [fx] set sequence to __sequence__ attr of GraphModule

* [fx] docstring modification

* [fx] remove performance test
2022-08-31 18:10:48 +08:00