Commit Graph

483 Commits (2b49ca80a30aad4decc3180c92cfbab118fd4a6e)

Author SHA1 Message Date
binmakeswell f6389d0813 [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) 2022-10-19 12:20:51 +08:00
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
* add chunk manager init function

* fix unit tests

* add comment

* add flush=True
2022-10-18 16:31:22 +08:00
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
* [fx] move meta registration

* [fx] fix tests.

* [fx] fix test.

* [fx] fix.

* [meta] refactor meta registration.py.

* [fx] add compatibility descriptions.

* [fx] polish import.

* [fx] add a decorator.

* [fx] fix tests.

* [fx] remove print.

* [fx] edit raise error.

* [fx] edit raise error.

* [fx] add type hint.

* [fx] fix import in experimental.

* [rpc] remove color debug.

* [meta] fix naming.
2022-10-18 10:44:23 +08:00
Frank Lee e8d8eda5e7
[autoparallel] moved tests to test_tensor_shard (#1713) 2022-10-17 13:54:20 +08:00
YuliangLiu0306 845ff4a47a
[autoparallel] resnet block runtime apply (#1709)
* [autoparallel] resnet block runtime apply

* seperate buffer and parameter in MemoryCost

* polish code

* add comments and todos

* fix test issue
2022-10-17 13:37:38 +08:00
Frank Lee 22a115406b
[autoparallel] fixed broken node handler tests (#1708) 2022-10-14 18:25:59 +08:00
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
* fixes memory leak when paramter is in fp16 in ZeroDDP init.
* bans chunk releasement in CUDA. Only when a chunk is about to offload, it is allowed to release.
* adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
Frank Lee 6c331a5a09
[autoparallel] refactored the autoparallel module for organization (#1706)
* [autoparallel] refactored the autoparallel module for organization

* polish code
2022-10-14 13:27:00 +08:00
Frank Lee 91cd34e6e0
[unittest] added doc for the pytest wrapper (#1704) 2022-10-14 10:56:17 +08:00
YuliangLiu0306 451cd72dea
[autoparallel] adapt runtime passes (#1703)
* [autoparallel] adapt runtime passes v2

* polish code
2022-10-14 10:14:07 +08:00
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) 2022-10-13 22:22:27 +08:00
Frank Lee 0e52f3d3d5
[unittest] supported condititonal testing based on env var (#1701)
polish code
2022-10-13 19:38:45 +08:00
Frank Lee 8283e95db3
[autoparallel] collated all deprecated files (#1700)
* [autoparallel] collated all deprecated files

* polish code
2022-10-13 18:24:11 +08:00
YuliangLiu0306 81f7530ee7
[autoparallel] adapt solver and CostGraph with new handler (#1695)
* [autoparallel] adapt solver and CostGraph with new handler

* fix test issue
2022-10-13 14:04:15 +08:00
YuliangLiu0306 42b882ef06
[autoparallel] add output handler and placeholder handler (#1694)
* [autoparallel] add output handler and placeholder handler

* Delete test_solver_with_resnet.py

* fix test bugs
2022-10-13 13:42:36 +08:00
YuliangLiu0306 56088e6d98
[autoparallel] add pooling handler (#1690)
* [autoparallel] add pooling handler

* polish code
2022-10-13 13:42:13 +08:00
YuliangLiu0306 319d654f79
[autoparallel] where_handler_v2 (#1688)
* where generator

* [autoparallel] where_handler_v2
2022-10-13 11:02:22 +08:00
Boyuan Yao 31d2f03d27
[autoparallel] fix C version rotor inconsistency (#1691) 2022-10-12 15:21:58 +08:00
Frank Lee 4973157ad7
[autoparallel] added sharding spec conversion for linear handler (#1687) 2022-10-12 11:16:18 +08:00
YuliangLiu0306 af718e83f2
[autoparallel] add reshape handler v2 and fix some previous bug (#1683) 2022-10-11 18:12:59 +08:00
Super Daniel 3dd6994427
[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679)
* [fx/profiler] modify data_ptr into uuid for all tensors.

* [fx] modify uuid.

* [fx/profiler] tune performance on GPT-2.

* [fx] updates.

* [fx] debug.

* [fx] debug.

* [fx] cuda.
2022-10-11 11:03:35 +08:00
YuliangLiu0306 517b63939a
[autoparallel] add unary element wise handler v2 (#1674) 2022-10-09 17:30:42 +08:00
YuliangLiu0306 f6c6a932b8
[autoparallel] add following node generator (#1673)
* [autoparallel] add following node generator

* polish code

* polish code

* update name of arguments
2022-10-09 14:49:18 +08:00
YuliangLiu0306 52fda88796
[autoparallel] add layer norm handler v2 (#1671)
* [autoparallel] add layer norm handler v2

* polish code

* polish code
2022-10-09 14:23:22 +08:00
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Boyuan Yao 1df98d5b66
[autoparallel] add rotor C version (#1658)
* [autoparallel] add rotor c version

* [fx] remove metainfoprop in rotor solver

* [autoparallel] modify C
 code format

* [autoparallel] remove build.py

* [autoparallel] fix C extension build

* [autoparallel] add C solver consistency test

* [autoparallel] remove some unused imports

* [autoparallel] refactor rotor solver code

* [autoparallel] replace print with colossalai logger

* [autoparallel] ranks fixed
2022-10-03 17:13:30 +08:00
YuliangLiu0306 11ec070e53
[hotfix]unit test (#1670) 2022-09-29 12:49:28 +08:00
Frank Lee a60024e77a
[autoparallel] added utils for broadcast operation (#1665)
* [autoparallel] added utils for broadcast operation

* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306 3f068d1409
[autoparallel] update CommSpec (#1667) 2022-09-29 11:20:59 +08:00
YuliangLiu0306 746f8f979d
[autoparallel] add batch norm handler v2 (#1666) 2022-09-29 11:02:49 +08:00
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
* [pipeline/tuning] improve dispatch performance both time and space cost

* [pipeline/converge] add interface for testing convergence

* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style

* Update PipelineBase.py

* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera

* [pipeline/chimera] test chimera | fix bug of initializing

* [pipeline/pytree] add pytree to process args and kwargs | provide  to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
Frank Lee 3a4d6f63a8
[autoparallel] added node handler for bmm (#1655) 2022-09-28 11:32:16 +08:00
YuliangLiu0306 095854477f
[autoparallel] add conv handler v2 (#1663) 2022-09-28 11:24:59 +08:00
YuliangLiu0306 1e7816a460
[autoparallel] adapt solver with gpt (#1653) 2022-09-28 11:17:26 +08:00
Frank Lee 30e50c8b4a
[autoparallel] implemented all matmul strategy generator (#1650) 2022-09-27 12:06:25 +08:00
YuliangLiu0306 03978aad45
[autoparallel] change the following nodes strategies generation logic (#1636)
* [autoparallel] change the following nodes strategies generation logic

* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306 59f100510a
[autoparallel] where handler (#1651)
* [autoparallel] where handler

* fix unit test
2022-09-27 11:20:43 +08:00
Boyuan Yao 5d0fdb9cb4
[fx] fix offload codegen test (#1648)
* [fx] fix offload codegen test

* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee 45b39a692a
[autoparallel] implemented linear projection strategy generator (#1639) 2022-09-26 16:58:14 +08:00
Frank Lee 154d3ef432
[fix] fixed the collective pattern name for consistency (#1649)
* [fix] fixed the collective pattern name for consistency

* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306 b2b2a4af98
[autoparallel] adapt solver with mlp (#1638) 2022-09-26 15:26:14 +08:00
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON 5be118f405
[feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
HELSON 95c35f73bd
[moe] initialize MoE groups by ProcessGroup (#1640) 2022-09-23 17:20:41 +08:00
HELSON a088022efc
[moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
YuliangLiu0306 702dbc5288
[tensor] use communication autograd func (#1617)
* [tensor] use communication autograd func

* change all to all comm spec info

* rename pattern and distinguish fwd/bwd

* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306 0c703189b9
[autoparallel] add layernorm handler (#1629) 2022-09-23 12:00:25 +08:00
YuliangLiu0306 bf77d3ab65
[autoparallel] recover the merged node strategy index (#1613) 2022-09-23 11:52:42 +08:00
Boyuan Yao d6b01feb66
[fx] Modify offload codegen (#1618)
* [fx] modify offload codegen

* [fx] remove repeated hook definitions

* [fx] modify offload test
2022-09-23 11:04:52 +08:00
YuliangLiu0306 9eae855408
[hotfix] add recompile after graph manipulatation (#1621) 2022-09-23 11:00:33 +08:00