Commit Graph

965 Commits (7c7921f71bf93e739b1939c724a4cfe9cd405247)

Author SHA1 Message Date
Frank Lee e2355d01b9
[autoparallel] init new folder structure (#1696) 2022-10-13 14:18:55 +08:00
YuliangLiu0306 81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler (#1695)
* [autoparallel] adapt solver and CostGraph with new handler

* fix test issue
2022-10-13 14:04:15 +08:00
YuliangLiu0306 42b882ef06
[autoparallel] add output handler and placeholder handler (#1694)
* [autoparallel] add output handler and placeholder handler

* Delete test_solver_with_resnet.py

* fix test bugs
2022-10-13 13:42:36 +08:00
YuliangLiu0306 56088e6d98
[autoparallel] add pooling handler (#1690)
* [autoparallel] add pooling handler

* polish code
2022-10-13 13:42:13 +08:00
YuliangLiu0306 319d654f79
[autoparallel] where_handler_v2 (#1688)
* where generator

* [autoparallel] where_handler_v2
2022-10-13 11:02:22 +08:00
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691) 2022-10-12 15:21:58 +08:00
Jiarui Fang 363fc2861a
[embeddings] more detailed timer (#1692) 2022-10-12 12:01:21 +08:00
Frank Lee 4973157ad7
[autoparallel] added sharding spec conversion for linear handler (#1687) 2022-10-12 11:16:18 +08:00
YuliangLiu0306 af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) 2022-10-11 18:12:59 +08:00
YuliangLiu0306 6878e42248
[hotfix] solver bug caused by dict type comm cost (#1686) 2022-10-11 17:57:03 +08:00
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
* [fx/profiler] modify data_ptr into uuid for all tensors.

* [fx] modify uuid.

* [fx/profiler] tune performance on GPT-2.

* [fx] updates.

* [fx] debug.

* [fx] debug.

* [fx] cuda.
2022-10-11 11:03:35 +08:00
Kirigaya Kazuto 0df5034a36
[pipeline/fix-bug] num_microbatches support any integrate | stable chimera | launch tool for rpc pp framework (#1684)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward

* [pipeline/fix-bug] num_microbatches support any integrate | stable chimera | launch tool for rpc pp framework
2022-10-10 16:01:02 +08:00
jim e5ab6be72e
[hotfix] fix colotensor.type() raise NotImplementedError (#1682) 2022-10-10 10:13:31 +08:00
Kirigaya Kazuto 3b2a59b0ba
[pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2022-10-09 17:32:57 +08:00
YuliangLiu0306 517b63939a
[autoparallel] add unary element wise handler v2 (#1674) 2022-10-09 17:30:42 +08:00
YuliangLiu0306 f6c6a932b8
[autoparallel] add following node generator (#1673)
* [autoparallel] add following node generator

* polish code

* polish code

* update name of arguments
2022-10-09 14:49:18 +08:00
YuliangLiu0306 52fda88796
[autoparallel] add layer norm handler v2 (#1671)
* [autoparallel] add layer norm handler v2

* polish code

* polish code
2022-10-09 14:23:22 +08:00
Fazzie-Maqianli 87c5ad352a
update version to 0.1.10 (#1676) 2022-10-09 10:43:29 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Boyuan Yao b1be5b88bd
[autoparallel] fix insecure subprocess (#1680)
* [autoparallel] fix insecure subprocess

* [fx] fix insecure subprocess
2022-10-06 15:07:03 +08:00
Boyuan Yao d8420f81a4
[hotfix] fix wrong type name in profiler (#1678) 2022-10-05 21:59:05 +08:00
Boyuan Yao 132b4306b7
[fx] Add concrete info prop (#1677)
* [fx] concreteinfoprop

* [fx] add concreteinfoprop

* [fx] modify docstring of ConcreteInfoProp

* [fx] fix device error

* [fx] modify parameter calculation

* [fx] modify parameters calculation
2022-10-04 16:48:24 +08:00
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
* [autoparallel] add rotor c version

* [fx] remove metainfoprop in rotor solver

* [autoparallel] modify C code format

* [autoparallel] remove build.py

* [autoparallel] fix C extension build

* [autoparallel] add C solver consistency test

* [autoparallel] remove some unused imports

* [autoparallel] refactor rotor solver code

* [autoparallel] replace print with colossalai logger

* [autoparallel] ranks fixed
2022-10-03 17:13:30 +08:00
YuliangLiu0306 11ec070e53
[hotfix]unit test (#1670) 2022-09-29 12:49:28 +08:00
Frank Lee a60024e77a
[autoparallel] added utils for broadcast operation (#1665)
* [autoparallel] added utils for broadcast operation

* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306 3f068d1409
[autoparallel] update CommSpec (#1667) 2022-09-29 11:20:59 +08:00
Frank Lee 247a9dbca9
[autoparallel] added bias comm spec to matmul strategy (#1664) 2022-09-29 11:08:05 +08:00
YuliangLiu0306 746f8f979d
[autoparallel] add batch norm handler v2 (#1666) 2022-09-29 11:02:49 +08:00
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
YuliangLiu0306 c27e701cb2
[autoparallel] remove no strategy nodes (#1652)
* [autoparallel] remove no strategy nodes

* fix none object iteration issue
2022-09-29 10:43:25 +08:00
Frank Lee 50f16a2850
[autoparallel] added compute resharding costs for node handler (#1662) 2022-09-28 19:55:44 +08:00
Frank Lee 9ec401a722
[autoparallel] added new strategy constructor template (#1661)
* [autoparallel] added new strategy constructor template

* polish code
2022-09-28 14:01:36 +08:00
Frank Lee 3a4d6f63a8
[autoparallel] added node handler for bmm (#1655) 2022-09-28 11:32:16 +08:00
YuliangLiu0306 095854477f
[autoparallel] add conv handler v2 (#1663) 2022-09-28 11:24:59 +08:00
YuliangLiu0306 1e7816a460
[autoparallel] adapt solver with gpt (#1653) 2022-09-28 11:17:26 +08:00
Jiarui Fang c638bec028
[embedding] polish async copy (#1657) 2022-09-27 14:37:03 +08:00
Jiarui Fang 988570e4a6
[embedding] add more detail profiling (#1656) 2022-09-27 13:43:59 +08:00
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654) 2022-09-27 12:50:33 +08:00
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650) 2022-09-27 12:06:25 +08:00
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
* [autoparallel] change the following nodes strategies generation logic

* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
* [autoparallel] where handler

* fix unit test
2022-09-27 11:20:43 +08:00
Super Daniel 6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. (#1646)
* [fx/profiling] provide summary for MetaInfoProp.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx/profiler] provide a table of summary.

* [fx] optimize table repr.

* [fx] optimize table repr.

* [fx] refactor code for profiler.

* [fx] add docstring.

* [fx] add docstring.

* [fx] skip test.

* [fx] redo.

* [fx] redo.

* [fx] fix import error for torch11.

* [fx] fix import error for torch11.
2022-09-27 10:26:52 +08:00
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
* [fx] fix offload codegen test

* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639) 2022-09-26 16:58:14 +08:00
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
* [fix] fixed the collective pattern name for consistency

* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638) 2022-09-26 15:26:14 +08:00
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647) 2022-09-26 14:57:57 +08:00
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
Boyuan Yao f921733621
[autoparallel] Add pofo sequence annotation (#1637)
* [autoparallel] annotate pofo sequence

* [autoparallel] remove unused print

* [autoparallel] fix some code
2022-09-24 01:52:57 +08:00
Super Daniel 04bbabeea8
[fx/profiler] provide a table of summary. (#1634)
* [fx/profiling] provide summary for MetaInfoProp.

* [fx/profiler] provide a table of summary.

* [fx] optimize table repr.
2022-09-23 18:12:43 +08:00
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640) 2022-09-23 17:20:41 +08:00
Jiarui Fang e57df80325
[embeddings] cache option (#1635) 2022-09-23 16:40:18 +08:00
HELSON a088022efc
[moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
* [tensor] use communication autograd func

* change all to all comm spec info

* rename pattern and distinguish fwd/bwd

* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306 c7ac0f4ab2
[autoparallel] add elementwise handler (#1622)
* [autoparallel] add elementwise handler

* polish code

* polish code

* reduce skipped strategies range

* polish code
2022-09-23 13:27:31 +08:00
YuliangLiu0306 3a46215135
[autoparallel] add embedding handler (#1620) 2022-09-23 12:34:30 +08:00
YuliangLiu0306 69448f64c4
[autoparallel] protect bcast handler from invalid strategies (#1631) 2022-09-23 12:12:49 +08:00
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629) 2022-09-23 12:00:25 +08:00
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613) 2022-09-23 11:52:42 +08:00
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
* [fx] modify offload codegen

* [fx] remove repeated hook definitions

* [fx] modify offload test
2022-09-23 11:04:52 +08:00
Super Daniel d967779a32
[fx/profiler] tuned the calculation of memory estimation (#1619)
* [fx] tuned the meta info and rotor solver.

* [fx] remove import.

* [fx] remove import.

* [fx] remove import.

* [fx] tune the meta calculations.

* [fx] polish comments.

* [fx] remove assertions.

* [fx] modify test cases.

* [fx] modify test cases.

* [fx] optimize import.

* [fx
2022-09-23 10:59:47 +08:00
HELSON f7f2248771
[moe] fix MoE bugs (#1628)
* remove forced FP32 modules

* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Frank Lee d925122020
[autoparallel] added new linear module handler (#1616) 2022-09-21 12:23:21 +08:00
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
YuliangLiu0306 6a8f8cc05e
[hotfix] got sliced types (#1614) 2022-09-20 14:32:42 +08:00
Frank Lee d397842fa8
[autoparallel] added new node handler (#1612) 2022-09-20 14:17:21 +08:00
YuliangLiu0306 7d1bb71d5d
[fx] PoC of runtime shape consistency application (#1607)
* [fx] PoC of runtime shape consistency application

* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306 47b11c432c
[autoparallel]add bcast matmul strategies (#1605) 2022-09-20 11:26:21 +08:00
Frank Lee edb67cb378
[autoparallel] refactored the data structure for sharding strategy (#1610) 2022-09-20 11:20:54 +08:00
Boyuan Yao 933b6c6367
[fx] Add pofo solver (#1608)
* [fx] add pofo algorithm

* [fx] Add pofo solver

* [fx] code refactor

* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
ver217 c9e8ce67b8
fix move fp32 shards (#1604) 2022-09-16 17:33:16 +08:00
YuliangLiu0306 eac1b79371
[autoparallel] add bcast op handler (#1600)
* [autoparallel] add bcast op handler

* polish code

* add more BCAST FUNC OP

* polish code

* add exception handler

* polish
2022-09-16 11:33:01 +08:00
Frank Lee 3abf98a633
[autoparallel] added all non-bcast matmul strategies (#1603) 2022-09-16 10:47:32 +08:00
Frank Lee db98b695b2
[autoparallel] added strategy generator and bmm strategies (#1602) 2022-09-15 16:57:07 +08:00
Jiarui Fang a19eb80998
[embedding] updates some default parameters 2022-09-15 15:45:17 +08:00
Super Daniel cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. (#1599) 2022-09-15 14:46:36 +08:00
Boyuan Yao a7cda6f57d
[fx] Add offload codegen (#1598)
* [fx] add input activation offload to codegen

* [fx] modify unit test

* [fx] remove two skips in torch11

* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel c8e9b2ad78
[hotfix/rotor] fix variable names (#1597)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.

* [fx] fix variable names in ckpt_solvers and unskip tests.

* [fx] commit my changes.

* [fx] restore skips.

* [fx] restore skips.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.

* [fx] chaange stage into phase.
2022-09-14 14:27:04 +08:00
YuliangLiu0306 faa23b9d9a
[autoparallel] add reshape handler (#1594)
* [autoparallel] add reshape handler

* polish code
2022-09-14 10:25:45 +08:00
Super Daniel 5c494d4540
[fx] provide an accurate estimation of memory. (#1587)
* [fx] add some comment and docstrings.

* [fx] add dataflow analysis for an autograd graph.

* add intepretation for graph analysis.

* [fx] before doing save_tensor_hooks.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] provide an accurate estimation of memory except for GPT-2.

* [fx] a very accurate version on GPT-2.

* [fx] refactor code.

* [fx] remove redundant inplace=True.

* [fx] refactor code.

* [fx] refactor code.

* [fx] refactor code.

* [fx] dive into backward memory.
2022-09-14 09:36:43 +08:00
Frank Lee 27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy (#1591)
* [autoparallel] refactored shape consistency to remove redundancy

* polish code

* polish code

* polish code
2022-09-13 18:30:18 +08:00
YuliangLiu0306 d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) 2022-09-13 18:05:05 +08:00
Frank Lee 7c18a588c8
[autoparallel] added generate_sharding_spec to utils (#1590) 2022-09-13 15:43:22 +08:00
Boyuan Yao 49ccf8b5f8
[fx] Improve linearize and rotor solver (#1586)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix

* [fx] imporve linearize and rotor solver

* [fx] some comments and format modification
2022-09-13 14:50:04 +08:00
Frank Lee 219f66c571
[autoparallel] added solver option dataclass (#1588) 2022-09-13 14:47:09 +08:00
YuliangLiu0306 82d4376c23
[autoparallel] adapt solver with resnet (#1583)
* [autoparallel]adapt solver with resnet

* polish code

* polish code
2022-09-13 12:07:09 +08:00
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2022-09-13 10:50:34 +08:00
Boyuan Yao f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen (#1585)
* [fx] add nested activation_checkpoint codegen

* undo algorithms commits

* solver

* undo some commits

* [fx] torch11 add nested activation checkpoint codegen

* remove some imports

* [fx] add some comments in activation codegen

* [fx] codegen instance error fix
2022-09-12 20:00:48 +08:00
Boyuan Yao 20e466527b [NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) 2022-09-08 22:11:04 +08:00
Fazzie-Maqianli 06dccdde44 [NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554) 2022-09-08 22:11:04 +08:00
CsRic 2ac46f7be4 [NFC] polish utils/tensor_detector/__init__.py code style (#1573)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-08 22:11:04 +08:00
Sze-qq 2144cbae8c [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) 2022-09-08 22:11:04 +08:00
superhao1995 e4bf7ae667 [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han 3263cdf57f [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
Zirui Zhu f566c9b98d [NFC] polish colossalai/pipeline/utils.py code style (#1562) 2022-09-08 22:11:04 +08:00
Xue Fuzhao e070ca45c6 [NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563) 2022-09-08 22:11:04 +08:00
Zangwei Zheng 9823cbf24b [NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565) 2022-09-08 22:11:04 +08:00
DouJS f586887a90 [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) 2022-09-08 22:11:04 +08:00
LuGY c7d4932956 [NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566) 2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing 0c4c9aa6e0 [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) 2022-09-08 22:11:04 +08:00
Ziheng Qin 08815f0e72 [NFC] polish colossalai/builder/__init__.py code style (#1560)
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
2022-09-08 22:11:04 +08:00
Super Daniel 8328917348 [NFC] polish colossalai/testing/comparison.py code style. (#1558) 2022-09-08 22:11:04 +08:00
Ofey Chan 7cc052f6c0 [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) 2022-09-08 22:11:04 +08:00
Kai Wang (Victor Kai) 46931e3c32 [NFC] polish code colossalai/gemini/update/search_utils.py (#1557) 2022-09-08 22:11:04 +08:00
yuxuan-lou 413f9c19f4 [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) 2022-09-08 22:11:04 +08:00
shenggan 8edb777cc2 [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) 2022-09-08 22:11:04 +08:00
Maruyama_Aya bd2d789832 [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) 2022-09-08 22:11:04 +08:00
binmakeswell 73e9eb13b7 [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style 2022-09-08 22:11:04 +08:00
Kirigaya Kazuto 318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559) 2022-09-08 22:04:34 +08:00
CsRic a389ac4ec9
[embedding] cache_embedding small improvement (#1564) 2022-09-08 16:41:19 +08:00
ver217 10dd8226b1
add gather_output for VocabParallelClassifier1D (#1569) 2022-09-08 16:40:56 +08:00
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544) 2022-09-07 19:01:06 +08:00
Super Daniel 4f59693207
[fx] provide a stable but not accurate enough version of profiler. (#1547)
* [fx] compute memory stat and flop count for MetaInfoProp.

* [fx] modify node attribute.

* [fx] modify ckpt_chen.

* [fx] fix compatibility.

* [fx] fix import error.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip test for MetaInfoProp.

* [fx] skip if torch 1.11.0.

* [fx] recover MetaInfoProp support for PyTorch 1.11.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] provide a stable but not accurate enough version of profiler.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix compatibility in tests.

* [fx] fix import error.
2022-09-07 11:21:04 +08:00
YuliangLiu0306 0908d0fc61
[autoparallel]add backward cost info into strategies (#1524) 2022-09-07 11:19:00 +08:00
YuliangLiu0306 1a3599410d
[autoparallel] support fucntion in operator handler (#1529) 2022-09-07 11:18:41 +08:00
YuliangLiu0306 44c866a3e3
[autoparallel] change the merge node logic (#1533) 2022-09-07 11:18:19 +08:00
ver217 ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
* refactor parallel layer

* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
ver217 2bed096848
[utils] optimize partition_tensor_parallel_state_dict (#1546) 2022-09-06 17:45:31 +08:00
Super Daniel d8a5aded19
[hotfix] change namespace for meta_trace. (#1541) 2022-09-06 11:46:12 +08:00
ver217 a203b709d5
[hotfix] fix init context (#1543)
* fix init context

* fix lazy init ctx
2022-09-06 11:45:08 +08:00
Jiarui Fang 64169f3e8f
[embedding] polish parallel embedding tablewise (#1545) 2022-09-06 10:41:20 +08:00
Boyuan Yao 46c6cc79a9
[fx] Add common node in model linearize (#1542)
* [fx] Add common node into linearize

* [fx] Add common node to solver
2022-09-05 18:35:05 +08:00
CsRic 964123ae0f
[embedding] freq_aware_embedding: add small functions for caller application (#1537) 2022-09-05 15:12:53 +08:00
Super Daniel 70129603aa
[fx] support meta tracing for aten level computation graphs like functorch. (#1536)
* [fx] support meta tracing for aten level computation graphs like functorch.

* [fx] support meta tracing for aten level computation graphs like functorch.

* [fx] remove redundant import.

* [fx] add docstring.
2022-09-05 12:10:09 +08:00
Jiarui Fang 521078ffc9
[embedding] fix a bug in table wise sharding (#1538) 2022-09-02 15:48:35 +08:00
Jiarui Fang 87134524fd
[embedding] tablewise sharding polish (#1535) 2022-09-02 11:09:37 +08:00
Boyuan Yao 56159049e8
[fx] Modify solver linearize and add corresponding test (#1531)
* [fx] modify solver linearize and add test

* [fx] add torch11 test of linearize but skip it

* [fx] remove some unused imports
2022-09-02 10:24:41 +08:00
YuliangLiu0306 4b3d6caeb3
[fx]patch nn.functional convolution (#1528) 2022-09-01 19:05:07 +08:00
CsRic 5156d5b4f8
[embedding] add tablewise sharding for FAW (#1526) 2022-09-01 17:55:41 +08:00
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
* support p2p communication with any type of object | pass test

* reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test

* [engin/schedule] use p2p_v2 to recontruct pipeline_schedule

* [pipeline/rpc] implement a demo for PP with cuda rpc framework

* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] skip process group test

* [pipeline/pipleline_process_group] remove test named function
2022-09-01 17:45:47 +08:00
Super Daniel 112a1f0a8f
[hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530)
* [hotfix] avoid conflict of meta registry with torch 1.13.0.

* [hotfix] avoid conflict of meta registry with torch 1.13.0.
2022-09-01 15:31:21 +08:00
Boyuan Yao b231430bcb
[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521)
* [fx] fix wrong variable name in solver rotor

* [fx] fix wrong variable name in solver rotor

* [fx] fix the discretize bug

* [fx] fix the first op in activation checkpoint codegen

* [fx] fix some bugs of ckpt solver

* [fx] modify test_ckpt_torchvision

* [fx] set sequence to __sequence__ attr of GraphModule

* [fx] docstring modification

* [fx] remove performance test
2022-08-31 18:10:48 +08:00
Super Daniel 5cc849f6ce
[fx] hack __torch_dispatch__ for meta tensor and autograd. (#1515)
* [fx] hack __torch_dispatch__ for meta tensor and autograd.

* [fx] hack __torch_dispatch__ for meta tensor and autograd.

* [fx] hack __torch_dispatch__ for meta tensor and autograd.

* [fx] hack __torch_dispatch__ for meta tensor and autograd.

* [fx] hack __torch_dispatch__ for meta tensor and autograd.

* [fx] add bad case detections.

* [fx] add bad case detections.

* [fx] rename MetaTensor attributes.

* [fx] fix unexpected error.

* [fx] fix unexpected error.

* [fx] fix unexpected error.

* [fx] fix unexpected error.

* [fx] fix unexpected error.

* [fx] add register backward for native_batch_norm_backward.

* [fx] add more meta backend support for nn.Modules.

* [fx] add meta backend to support timm and torchvision models.

* [fx] add meta hardswish for timm models.
2022-08-31 16:30:16 +08:00
Jiarui Fang 4537d39df9
[doc] docstring for FreqAwareEmbeddingBag (#1525) 2022-08-31 13:52:30 +08:00
YuliangLiu0306 3345c6d352
[autoparellel]add strategies constructor (#1505)
* [autoparellel]add strategies constructor

* remove duplicated strategies

* polish code

* adapt cost graph with StrategiesConstructor

* polish
2022-08-30 16:32:09 +08:00
Frank Lee a0436a62ee
[autoparallel] added liveness analysis (#1516)
* [autoparallel] added liveness analysis

* remove memory cost
2022-08-30 15:54:37 +08:00
Jiarui Fang 9a9ef65313
[FAW] cpu caching operations (#1520) 2022-08-30 14:50:02 +08:00
Super Daniel ea1a95b8b9
[hotfix] fix coloproxy typos. (#1519) 2022-08-30 11:39:03 +08:00
Jiarui Fang af5438caa2
[FAW] refactor reorder() for CachedParamMgr (#1514) 2022-08-29 14:22:07 +08:00
Jiarui Fang 9feee6d06b
[FAW] LFU initialize with dataset freq (#1513) 2022-08-29 12:52:53 +08:00
CsRic 1b8fee8e9c
[FAW] shrink freq_cnter size (#1509) 2022-08-29 11:44:55 +08:00
Boyuan Yao 4acc58ee20
[fx] Fix activation codegen dealing with checkpointing first op (#1510) 2022-08-27 19:39:21 +08:00
Boyuan Yao ac3a453a50
[fx] fix the discretize bug (#1506)
* [fx] fix wrong variable name in solver rotor

* [fx] fix wrong variable name in solver rotor

* code modification

* [fx] fix the discretize bug
2022-08-26 17:15:52 +08:00
Boyuan Yao 31fffd3fc5
[fx] fix wrong variable name in solver rotor (#1502)
* [fx] fix wrong variable name in solver rotor

* [fx] fix wrong variable name in solver rotor

* code modification
2022-08-26 15:47:08 +08:00
Jiarui Fang ba61109b6c
[FAW] remove code related to chunk (#1501) 2022-08-26 14:23:30 +08:00