1728 Commits (6a3086a5055235e51a1bca8a20c4bd967409a259)

Author SHA1 Message Date
digger yu 1baeb39c72
[NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742)
2 years ago
wukong1992 b37797ed3d
[booster] support torch fsdp plugin in booster (#3697)
2 years ago
digger-yu ad6460cf2c
[NFC] fix typo applications/ and colossalai/ (#3735)
2 years ago
digger-yu b7141c36dd
[CI] fix some spelling errors (#3707)
2 years ago
jiangmingyan 20068ba188
[booster] add tests for ddp and low level zero's checkpointio (#3715)
2 years ago
Hongxin Liu 6552cbf8e1
[booster] fix no_sync method (#3709)
2 years ago
Hongxin Liu 3bf09efe74
[booster] update prepare dataloader method for plugin (#3706)
2 years ago
Hongxin Liu f83ea813f5
[example] add train resnet/vit with booster example (#3694)
2 years ago
YH 2629f9717d
[tensor] Refactor handle_trans_spec in DistSpecManager
2 years ago
Hongxin Liu d0915f54f4
[booster] refactor all dp fashion plugins (#3684)
2 years ago
jiangmingyan 307894f74d
[booster] gemini plugin support shard checkpoint (#3610)
2 years ago
YH a22407cc02
[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173)
2 years ago
Hongxin Liu 50793b35f4
[gemini] accelerate inference (#3641)
2 years ago
Hongxin Liu 4b3240cb59
[booster] add low level zero plugin (#3594)
2 years ago
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618)
2 years ago
Hongxin Liu 12eff9eb4c
[gemini] state dict supports fp16 (#3590)
2 years ago
Hongxin Liu dac127d0ee
[fx] fix meta tensor registration (#3589)
2 years ago
Hongxin Liu f313babd11
[gemini] support save state dict in shards (#3581)
2 years ago
YH d329c294ec
Add docstr for zero3 chunk search utils (#3572)
2 years ago
Hongxin Liu 173dad0562
[misc] add verbose arg for zero and op builder (#3552)
2 years ago
Hongxin Liu 4341f5e8e6
[lazyinit] fix clone and deepcopy (#3553)
2 years ago
Hongxin Liu 152239bbfa
[gemini] gemini supports lazy init (#3379)
2 years ago
jiangmingyan 366a035552
[checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479)
2 years ago
YH bcf0cbcbe7
[doc] Add docs for clip args in zero optim (#3504)
2 years ago
jiangmingyan 52a933e175
[checkpoint] support huggingface style sharded checkpoint (#3461)
2 years ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
2 years ago
Frank Lee 7d8d825681
[booster] fixed the torch ddp plugin with the new checkpoint api (#3442)
2 years ago
YH 8f740deb53
Fix typo (#3448)
2 years ago
Hakjin Lee 46c009dba4
[format] Run lint on colossalai.engine (#3367)
2 years ago
YuliangLiu0306 ffcdbf0f65
[autoparallel]integrate auto parallel feature with new tracer (#3408)
2 years ago
ver217 573af84184
[example] update examples related to zero/gemini (#3431)
2 years ago
Frank Lee 1beb85cc25
[checkpoint] refactored the API and added safetensors support (#3427)
2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
2 years ago
Frank Lee 638a07a7f9
[test] fixed gemini plugin test (#3411)
2 years ago
ver217 5f2e34e6c9
[booster] implement Gemini plugin (#3352)
2 years ago
HELSON 1a1d68b053
[moe] add checkpoint for moe models (#3354)
2 years ago
YuliangLiu0306 fee2af8610
[autoparallel] adapt autoparallel with new analyzer (#3261)
2 years ago
Ofey Chan 8706a8c66c
[NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329)
2 years ago
yuxuan-lou 198a74b9fd
[NFC] polish colossalai/context/random/__init__.py code style (#3327)
2 years ago
YuliangLiu0306 fbd2a9e05b [hotfix] meta_tensor_compatibility_with_torch2
2 years ago
Michelle ad285e1656
[NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323)
2 years ago
Xu Kai 64350029fe [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style
2 years ago
RichardoLuo 1ce9d0c531 [NFC] polish initializer_data.py code style (#3287)
2 years ago
Ziheng Qin 1bed38ef37 [NFC] polish colossalai/cli/benchmark/models.py code style (#3290)
2 years ago
Kai Wang (Victor Kai) 964a28678f [NFC] polish initializer_3d.py code style (#3279)
2 years ago
Sze-qq 94eec1c5ad [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277)
2 years ago
Arsmart1 8af977f223 [NFC] polish colossalai/context/parallel_context.py code style (#3276)
2 years ago
Zirui Zhu 1168b50e33 [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275)
2 years ago
Tong Li 196d4696d0 [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274)
2 years ago
lucasliunju 4b95464994 [NFC] polish colossalai/amp/__init__.py code style (#3272)
2 years ago
Xuanlei Zhao 6b3bb2c249 [NFC] polish code style (#3273)
2 years ago
CZYCW 4cadb25b96 [NFC] policy colossalai/fx/proxy.py code style (#3269)
2 years ago
Yuanchen d58fa705b2 [NFC] polish code style (#3268)
2 years ago
Camille Zhong c4a226b729 [NFC] polish tensor_placement_policy.py code style (#3265)
2 years ago
CsRic 00778abc48 [NFC] polish colossalai/fx/passes/split_module.py code style (#3263)
2 years ago
jiangmingyan 488f37048c [NFC] polish colossalai/global_variables.py code style (#3259)
2 years ago
LuGY 1ff7d5bfa5 [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260)
2 years ago
dayellow 204ca2f09a [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256)
2 years ago
HELSON 02b058032d
[fx] meta registration compatibility (#3253)
2 years ago
Frank Lee 73d3e4d309
[booster] implemented the torch ddd + resnet example (#3232)
2 years ago
YH 1a229045af
Add interface for colo tesnor dp size (#3227)
2 years ago
YuliangLiu0306 4d5d8f98a4
[API] implement device mesh manager (#3221)
2 years ago
Frank Lee cd142fbefa
[api] implemented the checkpoint io module (#3205)
2 years ago
ver217 f8289d4221
[lazyinit] combine lazy tensor with dtensor (#3204)
2 years ago
Frank Lee e3ad88fb48
[booster] implemented the cluster module (#3191)
2 years ago
YuliangLiu0306 f57d34958b
[FX] refactor experimental tracer and adapt it with hf models (#3157)
2 years ago
Frank Lee e7f3bed2d3
[booster] added the plugin base and torch ddp plugin (#3180)
2 years ago
Zihao 18dbe76cae
[auto-parallel] add auto-offload feature (#3154)
2 years ago
YuliangLiu0306 258b43317c
[hotfix] layout converting issue (#3188)
2 years ago
YH 80aed29cd3
[zero] Refactor ZeroContextConfig class using dataclass (#3186)
2 years ago
YH 9d644ff09f
Fix docstr for zero statedict (#3185)
2 years ago
zbian 7bc0afc901 updated flash attention usage
2 years ago
Frank Lee a9b8402d93
[booster] added the accelerator implementation (#3159)
2 years ago
ver217 6ae8ed0407
[lazyinit] add correctness verification (#3147)
2 years ago
Frank Lee ed19290560
[booster] implemented mixed precision class (#3151)
2 years ago
YuliangLiu0306 2eca4cd376
[DTensor] refactor dtensor with new components (#3089)
2 years ago
ver217 ed8f60b93b
[lazyinit] refactor lazy tensor and lazy init ctx (#3131)
2 years ago
Frank Lee 95a36eae63
[kernel] added kernel loader to softmax autograd function (#3093)
2 years ago
Super Daniel fff98f06ed
[analyzer] a minimal implementation of static graph analyzer (#2852)
2 years ago
Xuanlei Zhao 10c61de2f7
[autochunk] support vit (#3084)
2 years ago
YuliangLiu0306 8e4e8601b7
[DTensor] implement layout converter (#3055)
2 years ago
Frank Lee f19b49e164
[booster] init module structure and definition (#3056)
2 years ago
Xuanlei Zhao 2ca9728cbb
[autochunk] refactor chunk memory estimation (#2762)
2 years ago
YuliangLiu0306 29386a54e6
[DTensor] refactor CommSpec (#3034)
2 years ago
YuliangLiu0306 cd2b0eaa8d
[DTensor] refactor sharding spec (#2987)
2 years ago
Ziyue Jiang 400f63012e
[pipeline] Add Simplified Alpa DP Partition (#2507)
2 years ago
Super Daniel b42d3d28ed
[fx] remove depreciated algorithms. (#2312) (#2313)
2 years ago
github-actions[bot] 82503a96f2
[format] applied code formatting on changed files in pull request 2997 (#3008)
2 years ago
binmakeswell 52a5078988
[doc] add ISC tutorial (#2997)
2 years ago
ver217 823f3b9cf4
[doc] add deepspeed citation and copyright (#2996)
2 years ago
YuliangLiu0306 e414e4092b
[DTensor] implementation of dtensor (#2946)
2 years ago
YuliangLiu0306 47fb214b3b
[hotfix] add shard dim to aviod backward communication error (#2954)
2 years ago
ver217 090f14fd6b
[misc] add reference (#2930)
2 years ago
YuliangLiu0306 197d0bf4ed
[autoparallel] apply repeat block to reduce solving time (#2912)
2 years ago
YH a848091141
Fix port exception type (#2925)
2 years ago
zbian 61e687831d fixed using zero with tp cannot access weight correctly
2 years ago
YH 7b13f7db18
[zero] trivial zero optimizer refactoring (#2869)
2 years ago
Jiatong (Julius) Han 8c8a39be95
[hotfix]: Remove math.prod dependency (#2837)
2 years ago
YuliangLiu0306 819e25d8b1
[hotfix] fix autoparallel compatibility test issues (#2754)
2 years ago
YuliangLiu0306 0f392d7403
[autoparallel] find repeat blocks (#2854)
2 years ago
junxu c52edcf0eb
Rename class method of ZeroDDP (#2692)
2 years ago
HELSON 6e4ac08172
[hotfix] fix chunk size can not be divided (#2867)
2 years ago
Boyuan Yao eae77c831d
[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823)
2 years ago
Boyuan Yao c7764d3f22
[autoparallel] Patch meta information of `torch.where` (#2822)
2 years ago
Boyuan Yao fcc4097efa
[autoparallel] Patch meta information of `torch.tanh()` and `torch.nn.Dropout` (#2773)
2 years ago
Frank Lee 935346430f
[cli] handled version check exceptions (#2848)
2 years ago
Frank Lee 918bc94b6b
[triton] added copyright information for flash attention (#2835)
2 years ago
Boyuan Yao 7ea6bc7f69
[autoparallel] Patch tensor related operations meta information (#2789)
2 years ago
Michelle c008d4ad0c
[NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744)
2 years ago
YuliangLiu0306 2059fdd6b0
[hotfix] add copyright for solver and device mesh (#2803)
2 years ago
Boyuan Yao 8593ae1a3f
[autoparallel] rotor solver refactor (#2813)
2 years ago
HELSON 56ddc9ca7a
[hotfix] add correct device for fake_param (#2796)
2 years ago
Boyuan Yao a2b43e393d
[autoparallel] Patch meta information of `torch.nn.Embedding` (#2760)
2 years ago
Boyuan Yao 8e3f66a0d1
[zero] fix wrong import (#2777)
2 years ago
Nikita Shulga 01066152f1
Don't use `torch._six` (#2775)
2 years ago
binmakeswell 93b788b95a Merge branch 'main' into fix/format
2 years ago
xyupeng 2fd528b9f4
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737)
2 years ago
YuliangLiu0306 1dc003c169
[autoparallel] distinguish different parallel strategies (#2699)
2 years ago
YH ae86a29e23
Refact method of grad store (#2687)
2 years ago
Zirui Zhu c9e3ee389e
[NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726)
2 years ago
Zangwei Zheng 1819373e5c
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728)
2 years ago
Wangbo Zhao(黑色枷锁) 8331420520
[NFC] polish colossalai/cli/cli.py code style (#2734)
2 years ago
ziyuhuang123 d344313533
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/embedding_handler.py code style (#2725)
2 years ago
Xue Fuzhao e81caeb4bc
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/cost_graph.py code style (#2720)
2 years ago
yuxuan-lou 51c45c2460
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/where_handler.py code style (#2723)
2 years ago
YuliangLiu0306 21d6a48f4d
[autoparallel] add shard option (#2696)
2 years ago
YuliangLiu0306 5b24987fa7
[autoparallel] fix parameters sharding bug (#2716)
2 years ago
Ziyue Jiang 4603538ddd
[NFC] posh colossalai/context/process_group_initializer/initializer_sequence.py code style (#2712)
2 years ago
YuliangLiu0306 cb2c6a2415
[autoparallel] refactor runtime pass (#2644)
2 years ago
Zihao b3d10db5f1
[NFC] polish colossalai/cli/launcher/__init__.py code style (#2709)
2 years ago
YuliangLiu0306 0b2a738393
[autoparallel] remove deprecated codes (#2664)
2 years ago
YuliangLiu0306 7fa6be49d2
[autoparallel] test compatibility for gemini and auto parallel (#2700)
2 years ago
CZYCW 4ac8bfb072
[NFC] polish colossalai/engine/gradient_handler/utils.py code style (#2708)
2 years ago
Liu Ziming 6427c406cf
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/strategy_generator.py code style (#2695)
2 years ago
アマデウス 534f68c83c
[NFC] polish pipeline process group code style (#2694)
2 years ago
LuGY 56ff1921e9
[NFC] polish colossalai/context/moe_context.py code style (#2693)
2 years ago
Shawn-Kong 1712da2800
[NFC] polish colossalai/gemini/gemini_context.py code style (#2690)
2 years ago
HELSON df4f020ee3
[zero1&2] only append parameters with gradients (#2681)
2 years ago
ver217 f0aa191f51
[gemini] fix colo_init_context (#2683)
2 years ago
Boyuan Yao 40c916b192
[autoparallel] Patch meta information of `torch.nn.functional.softmax` and `torch.nn.Softmax` (#2674)
2 years ago
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671)
2 years ago
binmakeswell 9ab14b20b5
[doc] add CVPR tutorial (#2666)
2 years ago
Boyuan Yao 0385b26ebf
[autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647)
2 years ago
YuliangLiu0306 37df666f38
[autoparallel] refactor handlers which reshape input tensors (#2615)
2 years ago
YuliangLiu0306 28398f1c70
add overlap option (#2613)
2 years ago
YuliangLiu0306 cb3d1bef62
[autoparallel] adapt autoparallel tests with latest api (#2626)
2 years ago
Boyuan Yao 90a9fdd91d
[autoparallel] Patch meta information of `torch.matmul` (#2584)
2 years ago
oahzxl 6ba8364881
[autochunk] support diffusion for autochunk (#2621)
2 years ago
Frank Lee 8518263b80
[test] fixed the triton version for testing (#2608)
2 years ago
HELSON 552183bb74
[polish] polish ColoTensor and its submodules (#2537)
2 years ago
Frank Lee dd14783f75
[kernel] fixed repeated loading of kernels (#2549)
2 years ago
ver217 5b1854309a
[hotfix] fix zero ddp warmup check (#2545)
2 years ago
oahzxl fa3d66feb9
support unet metainfo prop (#2544)
2 years ago
oahzxl 05671fcb42
[autochunk] support multi outputs chunk search (#2538)
2 years ago
oahzxl 63199c6687
[autochunk] support transformer (#2526)
2 years ago
HELSON a4ed9125ac
[hotfix] fix lightning error (#2529)
2 years ago
HELSON 66dfcf5281
[gemini] update the gpt example (#2527)
2 years ago
HELSON b528eea0f0
[zero] add zero wrappers (#2523)
2 years ago
Super Daniel c198c7c0b0
[hotfix] meta tensor default device. (#2510)
2 years ago
HELSON 077a5cdde4
[zero] fix gradient clipping in hybrid parallelism (#2521)
2 years ago
YuliangLiu0306 aa0f6686f9
[autoparallel] accelerate gpt2 training (#2495)
2 years ago
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
2 years ago
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
2 years ago
oahzxl c04f183237
[autochunk] support parsing blocks (#2506)
2 years ago
Super Daniel 35c0c0006e
[utils] lazy init. (#2148)
2 years ago
oahzxl 72341e65f4
[auto-chunk] support extramsa (#3) (#2504)
2 years ago
Ziyue Jiang 0f02b8c6e6
add avg partition (#2483)
2 years ago
アマデウス 99d9713b02 Revert "Update parallel_context.py (#2408)"
2 years ago
oahzxl ecccc91f21
[autochunk] support autochunk on evoformer (#2497)
2 years ago
oahzxl 5db3a5bf42
[fx] allow control of ckpt_codegen init (#2498)
2 years ago
HELSON d565a24849
[zero] add unit testings for hybrid parallelism (#2486)
2 years ago
oahzxl 4953b4ace1
[autochunk] support evoformer tracer (#2485)
2 years ago
YuliangLiu0306 67e1912b59
[autoparallel] support origin activation ckpt on autoprallel system (#2468)
2 years ago
Ziyue Jiang fef5c949c3
polish pp middleware (#2476)
2 years ago
HELSON a5dc4253c6
[zero] polish low level optimizer (#2473)
2 years ago
Frank Lee 8b7495dd54
[example] integrate seq-parallel tutorial with CI (#2463)
2 years ago
Jiarui Fang 867c8c2d3a
[zero] low level optim supports ProcessGroup (#2464)
2 years ago
Frank Lee 14d9299360
[cli] fixed hostname mismatch error (#2465)
2 years ago
Haofan Wang 9358262992
Fix False warning in initialize.py (#2456)
2 years ago
YuliangLiu0306 8221fd7485
[autoparallel] update binary elementwise handler (#2451)
2 years ago
HELSON 2bfeb24308
[zero] add warning for ignored parameters (#2446)
2 years ago
Frank Lee 39163417a1
[example] updated the hybrid parallel tutorial (#2444)
2 years ago
HELSON 5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
2 years ago
YuliangLiu0306 2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize (#2393)
2 years ago
Frank Lee c72c827e95
[cli] provided more details if colossalai run fail (#2442)
2 years ago
Super Daniel c41e59e5ad
[fx] allow native ckpt trace and codegen. (#2438)
2 years ago
YuliangLiu0306 41429b9b28
[autoparallel] add shard option (#2423)
2 years ago
HELSON 7829aa094e
[ddp] add is_ddp_ignored (#2434)
2 years ago
HELSON bb4e9a311a
[zero] add inference mode and its unit test (#2418)
2 years ago
Jiarui Fang 93f62dd152
[autochunk] add autochunk feature
2 years ago
HELSON dddacd2d2c
[hotfix] add norm clearing for the overflow step (#2416)
2 years ago
oahzxl 7ab2db206f adapt new fx
2 years ago
oahzxl e532679c95 Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk
2 years ago
Haofan Wang 7d5640b9db
Update parallel_context.py (#2408)
2 years ago
oahzxl fd818cf144 change imports
2 years ago
oahzxl a591d45b29 add available
2 years ago
oahzxl 615e7e68d9 update doc
2 years ago
oahzxl 7d4abaa525 add doc
2 years ago
oahzxl 1be0ac3cbf add doc for trace indice
2 years ago
oahzxl 0b6af554df remove useless function
2 years ago
oahzxl d914a21d64 rename
2 years ago
oahzxl 865f2e0196 rename
2 years ago
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
2 years ago
oahzxl a4ed5b0d0d rename in doc
2 years ago
oahzxl 1bb1f2ad89 rename
2 years ago
oahzxl cb9817f75d rename function from index to indice
2 years ago
oahzxl 0ea903b94e rename trace_index to trace_indice
2 years ago
Frank Lee 551cafec14
[doc] updated kernel-related optimisers' docstring (#2385)
2 years ago
oahzxl 065f0b4c27 add doc for search
2 years ago
oahzxl a68d240ed5 add doc for search chunk
2 years ago
oahzxl 1951f7fa87 code style
2 years ago
oahzxl 212b5b1b5f add comments
2 years ago
oahzxl 19cc64b1d3 remove autochunk_available
2 years ago
eric8607242 9880fd2cd8
Fix state_dict key missing issue of the ZeroDDP (#2363)
2 years ago
oahzxl 4d223e18a2 fix typo
2 years ago
Frank Lee ce08661eb1
[cli] updated installation check cli for aot/jit build (#2395)
2 years ago
jiaruifang 69d9180c4b [hotfix] issue #2388
2 years ago
Jiarui Fang 4e96039649
[device] find best logical mesh
2 years ago
Jiarui Fang 8f72b6f8fb
[hotfix] fix implement error in diffusers
2 years ago
Frank Lee 40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
2 years ago
1SAA 33f3023e19 [hotfix] fix implement error in diffusers
2 years ago
Jiarui Fang 12c8bf38d7
[Pipeline] Refine GPT PP Example
2 years ago
oahzxl 8a989a0d89 code style
2 years ago
oahzxl c3a2bf48b4 code style
2 years ago
oahzxl a6cdbf9161 seperate trace flow
2 years ago
oahzxl 4748967fb1 ad reorder graph
2 years ago
oahzxl da4076846d rename
2 years ago
oahzxl c3d72f7db9 seperate reorder
2 years ago
binmakeswell a881d6d000
Revert "[NFC] polish code format" (#2372)
2 years ago
Ziyue Jiang 9ae9e74017 fix diff device in some partition
2 years ago
Jiarui Fang 0dcc410f57
[NFC] polish code format
2 years ago
oahzxl 6685a9d022 seperate non chunk input
2 years ago
binmakeswell d634eae05b
Revert "[NFC] polish code format (#2367)" (#2371)
2 years ago
oahzxl f856611d21 seperate prepose_nodes
2 years ago
Shawn-Kong d42aecdda1
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/embedding_handler.py code style (#2368)
2 years ago
Jiarui Fang 1aaeb596c6
[example] gpt, shard init on all processes (#2366)
2 years ago
oahzxl f4a1607e56 seperate input node dim search
2 years ago
binmakeswell 1f8ab6f1f5
[NFC] polish code format (#2367)
2 years ago
oahzxl ae27a8b26d seperate flow tracer
2 years ago
oahzxl fd87d78a28 rename ambiguous variable
2 years ago
oahzxl 2bde9d2b7f code format
2 years ago
oahzxl 8a634af2f5 close mem and code print
2 years ago
oahzxl 1a6d2a740b take apart chunk code gen
2 years ago
ExtremeViscent ac0d30fe2e
[NFC] polish batch_norm_handler.py code style (#2359)
2 years ago
HELSON 48d33b1b17
[gemini] add get static torch model (#2356)
2 years ago
oahzxl efb1c64c30 restruct dir
2 years ago
ziyuhuang123 7080a8edb0
[workflow]New version: Create workflow files for examples' auto check (#2298)
2 years ago
LuGY e11a005c02
[NFC] polish colossalai/auto_parallel/tensor_shard/utils/factory.py code style (#2349)
2 years ago
YuliangLiu0306 b5a3a4a65f [device] find best logical mesh
2 years ago
yuxuan-lou 28e2d16794
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2340)
2 years ago
YuliangLiu0306 9c9246c0d9
[device] alpha beta profiler (#2311)
2 years ago
Maruyama_Aya bd12a49e2a
[NFC] polish <colossalai/auto_parallel/tensor_shard/deprecated/constants.py> code style (#2339)
2 years ago
Zihao 35427bcab4
[NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/unary_elementwise_handler.py code style (#2326)
2 years ago
Jiarui Fang db6eea3583
[builder] reconfig op_builder for pypi install (#2314)
2 years ago
Junming Wu 4a79c10750 [NFC] polish colossalai/cli/benchmark/__init__.py code style (#2308)
2 years ago
Ofey Chan 87d2defda6 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/layer_norm_handler.py code style (#2305)
2 years ago
ver217 116e3d0b8f [NFC] polish communication/p2p_v2.py code style (#2303)
2 years ago
xyupeng b965585d05 [NFC] polish colossalai/amp/torch_amp/torch_amp.py code style (#2290)
2 years ago
Zangwei Zheng d1e5bafcd4 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/__init__.py code style (#2291)
2 years ago
shenggan 950685873f [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/reshape_handler.py code style (#2292)
2 years ago
Ziheng Qin 3041014089 [NFC] polish colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py code style (#2299)
2 years ago
アマデウス 49715a78f0 [NFC] polish colossalai/cli/benchmark/benchmark.py code style (#2287)
2 years ago
Zirui Zhu 1c29b173c9 [NFC] polish colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py code style (#2289)
2 years ago
Zihao 3a02b46447
[auto-parallel] refactoring ColoTracer (#2118)
2 years ago
HELSON 5d3a2be3af
[amp] add gradient clipping for unit tests (#2283)
2 years ago
Boyuan Yao d45695d94e
Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel
2 years ago
Jiarui Fang 16cc8e6aa7
[builder] MOE builder (#2277)
2 years ago
Boyuan Yao b904748210
[autoparallel] bypass MetaInfo when unavailable and modify BCAST_FUNC_OP metainfo (#2293)
2 years ago
Super Daniel 8ea50d999e
[hotfix] pass a parameter. (#2288)
2 years ago
zbian e94c79f15b improved allgather & reducescatter for 3d
2 years ago
HELSON 62c38e3330
[zero] polish low level zero optimizer (#2275)
2 years ago
Ziyue Jiang ac863a01d6
[example] add benchmark (#2276)
2 years ago
Boyuan Yao 22e947f982
[autoparallel] fix runtime apply memory estimation (#2281)
2 years ago
Super Daniel 8e8900ff3f
[autockpt] considering parameter and optimizer weights. (#2279)
2 years ago
YuliangLiu0306 f027ef7913
[hotfix] fix fp16 optimzier bug (#2273)
2 years ago
YuliangLiu0306 fb87322773
[autoparallel] fix spelling error (#2270)
2 years ago
Jiarui Fang af32022f74
[Gemini] fix the convert_to_torch_module bug (#2269)
2 years ago
Super Daniel b0d21d0c4f
[autockpt] linearize / merge shape-consistency nodes. (#2271)
2 years ago
YuliangLiu0306 4b29112ab2
[autoparallel] gpt2 autoparallel examples (#2267)
2 years ago
Ziyue Jiang 8b045b3c1f
[Pipeline Middleware] Reduce comm redundancy by getting accurate output (#2232)
2 years ago
Boyuan Yao 5c2ef9fc76
[autoparallel] modify comm nodes' memory cost in construct chain (#2263)
2 years ago
Boyuan Yao 1ea99b869e
[autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline (#2261)
2 years ago
Super Daniel 3ccf58aa76
[autockpt] make it work. (#2257)
2 years ago
Boyuan Yao ac3739930d
[autoparallel] modify construct chain in rotor solver (#2254)
2 years ago
Boyuan Yao ab38aebace
[autoparallel] Hook all meta information on ResNet nodes for auto activation checkpoint (#2248)
2 years ago
Boyuan Yao c8c79102f0
[autoparallel] patch torch.flatten metainfo for autoparallel (#2247)
2 years ago
YuliangLiu0306 8897b8f753
[autoparallel] autoparallel initialize (#2238)
2 years ago
xcnick 85178a397a
[hotfix] fix error for torch 2.0 (#2243)
2 years ago
Super Daniel b7d0990c61
[autoparallel] fix construct meta info. (#2245)
2 years ago
Ziyue Jiang 57929a6210
fix type of num_worker_threads (#2237)
2 years ago
Jiarui Fang db4cbdc7fb
[builder] builder for scaled_upper_triang_masked_softmax (#2234)
2 years ago
Super Daniel 78483a9fdd
[logger] hotfix, missing _FORMAT (#2231)
2 years ago
Jiarui Fang 54de05da5d
[builder] polish builder with better base class (#2216)
2 years ago
YuliangLiu0306 3b1b91eaf4
[autoparallel] record parameter attribute in colotracer (#2217)
2 years ago
Jiarui Fang 7675792100
[builder] raise Error when CUDA_HOME is not set (#2213)
2 years ago
Jiarui Fang d5e3e3ec01
[example] update gpt example for larger model scale (#2211)
2 years ago
Boyuan Yao 24246f7aa5
[autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162)
2 years ago
Boyuan Yao d0bc5a1b34
[autoparallel] new metainfoprop based on metainfo class (#2179)
2 years ago
YuliangLiu0306 78509124d3
[autoparallel] update getitem handler (#2207)
2 years ago
Jiarui Fang 1cb532ffec
[builder] multihead attn runtime building (#2203)
2 years ago
Tongping Liu 8e22c38b89
[hotfix] Fixing the bug related to ipv6 support
2 years ago
YuliangLiu0306 4851f2d607
[autoparallel] update_getattr_handler (#2193)
2 years ago
Jiarui Fang 5682e6d346
[hotfix] correcnt cpu_optim runtime compilation (#2197)
2 years ago
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
2 years ago
Jiarui Fang 355ffb386e
[builder] unified cpu_optim fused_optim inferface (#2190)
2 years ago
Jiarui Fang 9587b080ba
[builder] use runtime builder for fused_optim (#2189)
2 years ago
Jiarui Fang bc0e271e71
[buider] use builder() for cpu adam and fused optim in setup.py (#2187)
2 years ago
Jiarui Fang d42afd30f8
[builder] runtime adam and fused_optim builder (#2184)
2 years ago
YuliangLiu0306 550f8f8905
[autoparallel] integrate_gpt_related_tests (#2134)
2 years ago
Ziyue Jiang 59e343328d
[Pipeline Middleware ] Fix deadlock when num_microbatch=num_stage (#2156)
2 years ago
Tongping Liu ab54fed292
[hotfix] add kwargs for colo_addmm (#2171)
2 years ago
アマデウス 622f863291
[hotfix] Jit type hint #2161 (#2164)
2 years ago
Zihao 12e7bcd720
register meta func for rnn (#2159)
2 years ago
Boyuan Yao cfe2a9bd90
[autoparallel] memory estimation for shape consistency (#2144)
2 years ago
Jiarui Fang b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 (#2157)
2 years ago
YuliangLiu0306 16335cb537
[hotfix] fix aten default bug (#2158)
2 years ago
HELSON a7d95b7024
[example] add zero1, zero2 example in GPT examples (#2146)
2 years ago
YuliangLiu0306 1cce6e36ca
[autoparallel] use metainfo in handler (#2149)
2 years ago
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151)
2 years ago
Jiarui Fang bdef9dfdbe
[NFC] remove useless graph node code (#2150)
2 years ago
BlueRum b3f73ce1c8
[Gemini] Update coloinit_ctx to support meta_tensor (#2147)
2 years ago
Zihao a128eec9d5
register aten._convolution.default (#2137)
2 years ago
Jiarui Fang ee287620f0
[Gemini] revert ZeROInitCtx related tracer (#2138)
2 years ago
アマデウス 077a66dd81
updated attention kernel (#2133)
2 years ago
YuliangLiu0306 a3c6924deb
[autoparallel] process size nodes in runtime pass (#2130)
2 years ago
YuliangLiu0306 536560ccc0
[autoparallel] implement softmax handler (#2132)
2 years ago
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129)
2 years ago
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128)
2 years ago
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126)
2 years ago
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125)
2 years ago
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124)
2 years ago
YuliangLiu0306 cd0af9f7f6
[autoparallel] gpt2lp runtimee test (#2113)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
HELSON e7d3afc9cc
[optimizer] add div_scale for optimizers (#2117)
2 years ago
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119)
2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116)
2 years ago
Ziyue Jiang 09d69e1c25
[PP Middleware] Add bwd and step for PP middleware (#2111)
2 years ago
Jiarui Fang 8afc001f4f
[Gemini] chunk init use OrderedParamGenerator (#2110)
2 years ago
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
2 years ago
Jiarui Fang 70a8556946
[gemini] get the param visited order during runtime (#2108)
2 years ago
Jiarui Fang 61f31c3cf0
[Gemini] NFC, polish search_chunk_configuration (#2107)
2 years ago
Jiarui Fang 8e14344ec9
[hotfix] fix a type in ColoInitContext (#2106)
2 years ago
Jiarui Fang 05545bfee9
[ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105)
2 years ago
YuliangLiu0306 d87baa85d9
[autoparallel] support linear function bias addition (#2104)
2 years ago
YuliangLiu0306 0fecbb9e20
[autoparallel] support addbmm computation (#2102)
2 years ago
YuliangLiu0306 d3d4630495
[autoparallel] add sum handler (#2101)
2 years ago
Ziyue Jiang e4705ba4e2
[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087)
2 years ago
YuliangLiu0306 b175e6d58e
[autoparallel] add bias addtion function class (#2098)
2 years ago
YuliangLiu0306 3af7e65dea
[autoparallel] complete gpt related module search (#2097)
2 years ago
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099)
2 years ago
Super Daniel 2bf2d1cd3b
[fx] An experimental version of ColoTracer.' (#2002)
2 years ago
Jiarui Fang 4b055351b0
[Gemini] make RuntimeMemTracer work correctly (#2096)
2 years ago
YuliangLiu0306 7f72eb0510
[autoparallel]add embedding handler (#2089)
2 years ago
Jiarui Fang 1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091)
2 years ago
Jiarui Fang 28e55c2530
[Gemini] remove GLOBAL_CUDA_MEM_INFO (#2090)
2 years ago
Jiarui Fang 25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer (#2088)
2 years ago
Jiarui Fang 33f4412102
[Gemini] use MemStats to store the tracing data. Seperate it from Collector. (#2084)
2 years ago
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083)
2 years ago
YuliangLiu0306 0e9db368ef
[autoparallel] add tensor constructor handler (#2082)
2 years ago
YuliangLiu0306 cdf537a648
[autoparallel] add non_split linear strategy (#2078)
2 years ago
Boyuan Yao cf0268da93
[autoparallel] Add F.conv metainfo (#2069)
2 years ago
YuliangLiu0306 f123476666
[autoparallel] complete gpt block searching (#2065)
2 years ago
Ziyue Jiang 597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo (#2066)
2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080)
2 years ago
YuliangLiu0306 677e1e20d4
[device] update flatten device mesh usage (#2079)
2 years ago
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076)
2 years ago
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073)
2 years ago
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072)
2 years ago
Boyuan Yao 616da17fab
[autoparallel] add binary elementwise metainfo for auto parallel (#2058)
2 years ago
Boyuan Yao 4b40fbd743
[autoparallel] fix forward memory calculation (#2062)
2 years ago
Ziyue Jiang 44ea461890
[Pipeline] Add Topo Class (#2059)
2 years ago
YuliangLiu0306 e4293e5077
[hotfix] update test for latest version (#2060)
2 years ago
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052)
2 years ago
YuliangLiu0306 1c1fe44305
[autoparallel] adapt solver with self attention (#2037)
2 years ago
Frank Lee ea74a3b9cc
[cli] updated installation cheheck with more inforamtion (#2050)
2 years ago
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
2 years ago
Frank Lee 81e0da7fa8
[setup] supported conda-installed torch (#2048)
2 years ago
HELSON e37f3db40c
[gemini] add arguments (#2046)
2 years ago
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040)
2 years ago
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043)
2 years ago
HELSON a1ce02d740
[zero] test gradient accumulation (#1964)
2 years ago
Ziyue Jiang b0936e4a44
[rpc] split with dag (#2028)
2 years ago
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035)
2 years ago
YuliangLiu0306 0dbcd4a6f5
[autoparallel] add split handler (#2032)
2 years ago
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034)
2 years ago
YuliangLiu0306 81330b0352
[autoparallel] add experimental permute handler (#2029)
2 years ago
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unitest (#2030)
2 years ago
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003)
2 years ago
Ziyue Jiang 632753abbc
[fx]Split partition with DAG information (#2025)
2 years ago
YuliangLiu0306 ea0f6b8df9
[autoparallel] add runtime pass and numerical test for view handler (#2018)
2 years ago
Zihao a719b89a41
[gemini] param_trace_hook (#2020)
2 years ago
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016)
2 years ago
Zihao aba3db464d
[Gemini] ParamMemHook (#2008)
2 years ago
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009)
2 years ago
YuliangLiu0306 1438993113
[autoparallel] add experimental view handler (#2011)
2 years ago
Genghan Zhang d655eea515
[autoparallel] mix gather (#1977)
2 years ago
Frank Lee 2bab6f512c
[release] release v0.1.11rc4 (#2007)
2 years ago
Boyuan Yao 6cd784ffee
[autoparallel] Add metainfo support for F.linear (#1987)
2 years ago
Super Daniel 2edbef13cc
[fx] add more meta_registry for MetaTensor execution. (#2000)
2 years ago
Jiarui Fang a2d3266648
[hotfix] make Gemini work for conv DNN (#1998)
2 years ago