Commit Graph

104 Commits (50e5602c2d6c8e25ad544cbecc38649e5257e7b8)

Author SHA1 Message Date
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
* [legacy] move communication to legacy (#4640)

* [legacy] refactor logger and clean up legacy codes (#4654)

* [legacy] make logger independent to gpc

* [legacy] make optim independent to registry

* [legacy] move test engine to legacy

* [legacy] move nn to legacy (#4656)

* [legacy] move nn to legacy

* [checkpointio] fix save hf config

* [test] remove useledd rpc pp test

* [legacy] fix nn init

* [example] skip tutorial hybriad parallel example

* [devops] test doc check

* [devops] test doc check
2023-09-11 16:24:28 +08:00
Hongxin Liu ac178ca5c1 [legacy] move builder and registry to legacy (#4603) 2023-09-05 21:53:10 +08:00
YeAnbang 3883db452c [NFC] polish unary_elementwise_generator.py code style (#4267)
Co-authored-by: aye42 <aye42@gatech.edu>
2023-07-26 14:12:57 +08:00
Frank Lee c4b1b65931 [test] fixed tests failed due to dtensor change (#4082)
* [test] fixed tests failed due to dtensor change

* polish code
2023-07-04 16:05:01 +08:00
digger yu e2d81eba0d
[nfc] fix typo colossalai/ applications/ (#3831)
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.

* fix typo colossalai/auto_parallel autochunk fx/passes  etc.

* fix typo docs/

* change placememt_policy to placement_policy in docs/ and examples/

* fix typo colossalai/ applications/
2023-05-25 16:19:41 +08:00
digger yu 7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808) 2023-05-24 09:01:50 +08:00
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779)
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.
2023-05-23 15:28:20 +08:00
digger yu 1baeb39c72
[NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742)
* fix typo applications/ and colossalai/ date 5.11

* fix typo colossalai/
2023-05-17 11:13:23 +08:00
YuliangLiu0306 ffcdbf0f65
[autoparallel]integrate auto parallel feature with new tracer (#3408)
* [autoparallel] integrate new analyzer in module level

* unify the profiling method

* polish

* fix no codegen bug

* fix pass bug

* fix liveness test

* polish
2023-04-04 17:40:45 +08:00
YuliangLiu0306 fee2af8610
[autoparallel] adapt autoparallel with new analyzer (#3261)
* [autoparallel] adapt autoparallel with new analyzer

* fix all node handler tests

* polish

* polish
2023-03-30 17:47:24 +08:00
YuliangLiu0306 47fb214b3b
[hotfix] add shard dim to aviod backward communication error (#2954) 2023-03-01 11:41:53 +08:00
YuliangLiu0306 197d0bf4ed
[autoparallel] apply repeat block to reduce solving time (#2912) 2023-02-28 11:03:30 +08:00
YuliangLiu0306 819e25d8b1
[hotfix] fix autoparallel compatibility test issues (#2754) 2023-02-23 17:28:36 +08:00
YuliangLiu0306 0f392d7403
[autoparallel] find repeat blocks (#2854)
* [autoparallel] find repeat blocks

* polish

* polish

* polish
2023-02-23 17:28:19 +08:00
YuliangLiu0306 2059fdd6b0
[hotfix] add copyright for solver and device mesh (#2803)
* [hotfix] add copyright for solver and device mesh

* add readme

* add alpa license

* polish
2023-02-18 21:14:38 +08:00
YuliangLiu0306 1dc003c169
[autoparallel] distinguish different parallel strategies (#2699) 2023-02-15 22:28:28 +08:00
YuliangLiu0306 21d6a48f4d
[autoparallel] add shard option (#2696)
* [autoparallel] add shard option

* polish
2023-02-15 13:48:28 +08:00
YuliangLiu0306 0b2a738393
[autoparallel] remove deprecated codes (#2664) 2023-02-15 09:54:32 +08:00
Boyuan Yao 0385b26ebf
[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647)
* [autoparallel] layernorm metainfo patch

* [autoparallel] polish test
2023-02-10 14:29:24 +08:00
YuliangLiu0306 37df666f38
[autoparallel] refactor handlers which reshape input tensors (#2615)
* [autoparallel] refactor handlers which reshape input tensors

* polish
2023-02-08 15:02:49 +08:00
YuliangLiu0306 28398f1c70
add overlap option (#2613) 2023-02-08 15:02:31 +08:00
YuliangLiu0306 cb3d1bef62
[autoparallel] adapt autoparallel tests with latest api (#2626) 2023-02-08 15:02:12 +08:00
Boyuan Yao 90a9fdd91d
[autoparallel] Patch meta information of `torch.matmul` (#2584)
* [autoparallel] matmul metainfo

* [auto_parallel] remove unused print

* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2023-02-08 11:05:31 +08:00
YuliangLiu0306 aa0f6686f9
[autoparallel] accelerate gpt2 training (#2495) 2023-01-29 11:13:15 +08:00
YuliangLiu0306 67e1912b59
[autoparallel] support origin activation ckpt on autoprallel system (#2468) 2023-01-16 16:25:13 +08:00
YuliangLiu0306 8221fd7485
[autoparallel] update binary elementwise handler (#2451)
* [autoparallel] update binary elementwise handler

* polish
2023-01-12 09:35:10 +08:00
YuliangLiu0306 2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize (#2393)
* [autoparallel] integrate device mesh initialization into autoparallelize

* add megatron solution

* update gpt autoparallel examples with latest api

* adapt beta value to fit the current computation cost
2023-01-11 14:03:49 +08:00
YuliangLiu0306 41429b9b28
[autoparallel] add shard option (#2423) 2023-01-11 13:40:33 +08:00
binmakeswell a881d6d000
Revert "[NFC] polish code format" (#2372) 2023-01-06 16:01:09 +08:00
Jiarui Fang 0dcc410f57
[NFC] polish code format 2023-01-06 15:54:06 +08:00
binmakeswell d634eae05b
Revert "[NFC] polish code format (#2367)" (#2371)
This reverts commit 1f8ab6f1f5.
2023-01-06 15:52:16 +08:00
Shawn-Kong d42aecdda1
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/embedding_handler.py code style (#2368) 2023-01-06 15:47:10 +08:00
binmakeswell 1f8ab6f1f5
[NFC] polish code format (#2367) 2023-01-06 15:34:48 +08:00
ExtremeViscent ac0d30fe2e
[NFC] polish batch_norm_handler.py code style (#2359) 2023-01-06 13:41:38 +08:00
ziyuhuang123 7080a8edb0
[workflow]New version: Create workflow files for examples' auto check (#2298)
* [workflows]bug_repair

* [workflow]new_pr_fixing_bugs

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
LuGY e11a005c02
[NFC] polish colossalai/auto_parallel/tensor_shard/utils/factory.py code style (#2349) 2023-01-05 21:17:42 +08:00
yuxuan-lou 28e2d16794
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2340) 2023-01-05 16:53:24 +08:00
YuliangLiu0306 9c9246c0d9
[device] alpha beta profiler (#2311)
* [device] alpha beta profiler

* add usage

* fix variable name
2023-01-05 16:39:55 +08:00
Maruyama_Aya bd12a49e2a
[NFC] polish <colossalai/auto_parallel/tensor_shard/deprecated/constants.py> code style (#2339) 2023-01-05 16:20:54 +08:00
Zihao 35427bcab4
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/unary_elementwise_handler.py code style (#2326) 2023-01-05 12:18:08 +08:00
Ofey Chan 87d2defda6 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/layer_norm_handler.py code style (#2305) 2023-01-04 15:09:57 +08:00
Zangwei Zheng d1e5bafcd4 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/__init__.py code style (#2291) 2023-01-04 15:09:57 +08:00
shenggan 950685873f [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/reshape_handler.py code style (#2292) 2023-01-04 15:09:57 +08:00
Zirui Zhu 1c29b173c9 [NFC] polish colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py code style (#2289) 2023-01-04 15:09:57 +08:00
Boyuan Yao d45695d94e
Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel
[autockpt] provide option for activation checkpoint search in SPMD solver
2023-01-04 11:37:28 +08:00
Boyuan Yao b904748210
[autoparallel] bypass MetaInfo when unavailable and modify BCAST_FUNC_OP metainfo (#2293)
* [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline

* [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop

* [autoparallel] specifycomm nodes' memory cost in construct chain

* [autoparallel] fix wrong runtime apply calculation

* [autoparallel] fix wrong runtime apply calculation

* [autoparallel] fix wrong runtime apply calculation

* [autoparallel] bypass metainfo when available and modify BCAST_FUNC_OP
2023-01-03 20:28:01 +08:00
YuliangLiu0306 fb87322773
[autoparallel] fix spelling error (#2270) 2023-01-03 16:13:00 +08:00
YuliangLiu0306 4b29112ab2
[autoparallel] gpt2 autoparallel examples (#2267)
* [autoparallel] gpt2 autoparallel examples

* polish code

* polish code
2023-01-03 14:23:33 +08:00
Super Daniel 3ccf58aa76
[autockpt] make it work. (#2257) 2023-01-02 23:37:45 +08:00
Boyuan Yao ab38aebace
[autoparallel] Hook all meta information on ResNet nodes for auto activation checkpoint (#2248)
* [autoparallel] hook node meta on graph nodes for checkpoint solver

* [autoparallel] polish code

* [autoparallel] restore some node handlers

* colossalai/auto_parallel/passes/meta_info_prop.py

* [autoparallel] remove some unused import

* [autoparallel] hook bwd_mem_out
2023-01-02 16:25:18 +08:00