ColossalAI

Commit Graph

Author	SHA1	Message	Date
hxwang	154720ba6e	[chore] refactor profiler utils	2024-05-28 12:41:42 +00:00
genghaozhe	87665d7922	correct argument help message	2024-05-27 06:03:53 +00:00
genghaozhe	b9269d962d	add args.prefetch_num for benchmark	2024-05-25 14:55:50 +00:00
genghaozhe	fba04e857b	[bugs] fix args.profile=False DummyProfiler errro	2024-05-25 14:55:09 +00:00
hxwang	ca674549e0	[chore] remove unnecessary test & changes	2024-05-24 06:09:36 +00:00
hxwang	ff507b755e	Merge branch 'main' of github.com:hpcaitech/ColossalAI into prefetch	2024-05-24 04:05:07 +00:00
hxwang	63c057cd8e	[example] add profile util for llama	2024-05-24 03:59:36 +00:00
botbw	2fc85abf43	[gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713 ) * [gemini] async grad chunk reduce (all-reduce&reduce-scatter) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [gemini] add test * [gemini] rename func * [gemini] update llama benchmark * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [gemini] use tensor counter * [gemini] change default config in GeminiPlugin and GeminiDDP * [chore] typo * [gemini] fix sync issue & add test cases * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-05-24 10:31:16 +08:00
hxwang	15d21a077a	Merge remote-tracking branch 'origin/main' into prefetch	2024-05-23 15:49:33 +00:00
Yuanheng Zhao	8633c15da9	[sync] Sync feature/colossal-infer with main	2024-05-20 15:50:53 +00:00
genghaozhe	a280517dd9	remove unrelated file	2024-05-20 05:25:35 +00:00
genghaozhe	df63db7e63	remote comments	2024-05-20 05:15:51 +00:00
hxwang	2e68eebdfe	[chore] refactor & sync	2024-05-16 07:22:10 +00:00
Yuanheng Zhao	12e7c28d5e	[hotfix] fix OpenMOE example import path (#5697 )	2024-05-08 15:48:47 +08:00
Yuanheng Zhao	55cc7f3df7	[Fix] Fix Inference Example, Tests, and Requirements (#5688 ) * clean requirements * modify example inference struct * add test ci scripts * mark test_infer as submodule * rm deprecated cls & deps * import of HAS_FLASH_ATTN * prune inference tests to be run * prune triton kernel tests * increment pytest timeout mins * revert import path in openmoe	2024-05-08 11:30:15 +08:00
Edenzzzz	c25f83c85f	fix missing pad token (#5690 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu>	2024-05-06 18:17:26 +08:00
Yuanheng Zhao	56ed09aba5	[sync] resolve conflicts of merging main	2024-05-05 05:14:00 +00:00
Hongxin Liu	7f8b16635b	[misc] refactor launch API and tensor constructor (#5666 ) * [misc] remove config arg from initialize * [misc] remove old tensor contrusctor * [plugin] add npu support for ddp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [devops] fix doc test ci * [test] fix test launch * [doc] update launch doc --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-29 10:40:11 +08:00
Tong Li	68ec99e946	[hotfix] add soft link to support required files (#5661 )	2024-04-26 21:12:04 +08:00
Hongxin Liu	1b387ca9fe	[shardformer] refactor pipeline grad ckpt config (#5646 ) * [shardformer] refactor pipeline grad ckpt config * [shardformer] refactor pipeline grad ckpt config * [pipeline] fix stage manager	2024-04-25 15:19:30 +08:00
傅剑寒	279300dc5f	[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613 ) * refactor compilation mechanism and unified multi hw * fix file path bug * add init.py to make pybind a module to avoid relative path error caused by softlink * delete duplicated micros * fix micros bug in gcc	2024-04-24 14:17:54 +08:00
binmakeswell	f4c5aafe29	[example] llama3 (#5631 ) * release llama3 * [release] llama3 * [release] llama3 * [release] llama3 * [release] llama3	2024-04-23 18:48:07 +08:00
Hongxin Liu	4de4e31818	[exampe] update llama example (#5626 ) * [plugin] support dp inside for hybriad parallel * [example] update llama benchmark * [example] update llama benchmark * [example] update llama readme * [example] update llama readme	2024-04-23 14:12:20 +08:00
Edenzzzz	d83c633ca6	[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606 ) * fix no pad token bug * fixed some auto parallel codegen bug, but might not run on torch 2.1 --------- Co-authored-by: Edenzzzz <wtan45@wisc.edu>	2024-04-18 18:15:50 +08:00
Hongxin Liu	641b1ee71a	[devops] remove post commit ci (#5566 ) * [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-08 15:09:40 +08:00
digger yu	341263df48	[hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548 )	2024-04-07 19:04:58 +08:00
digger yu	a799ca343b	[fix] fix typo s/muiti-node /multi-node etc. (#5448 )	2024-04-07 18:42:15 +08:00
Wenhao Chen	e614aa34f3	[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508 ) * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` * feat: apply `GradientCheckpointConfig` to policy and llama_forward * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager * fix: add optional args for `distribute_layer` and `get_stage_index` * fix: fix changed API calls * test: update llama tests * style: polish `GradientCheckpointConfig` * fix: fix pipeline utils tests	2024-04-01 11:34:58 +08:00
Yuanheng Zhao	36c4bb2893	[Fix] Grok-1 use tokenizer from the same pretrained path (#5532 ) * [fix] use tokenizer from the same pretrained path * trust remote code	2024-03-28 16:30:04 +08:00
Insu Jang	00525f7772	[shardformer] fix pipeline forward error if custom layer distribution is used (#5189 ) * Use self.[distribute_layers\|get_stage_index] to exploit custom layer distribution * Change static methods for t5 layer distribution to member functions * Change static methods for whisper layer distribution to member functions * Replace whisper policy usage with self one * Fix test case to use non-static layer distribution methods * fix: fix typo --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-03-27 13:57:00 +08:00
Yuanheng Zhao	131f32a076	[fix] fix grok-1 example typo (#5506 )	2024-03-26 10:19:42 +08:00
binmakeswell	34e909256c	[release] grok-1 inference benchmark (#5500 ) * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark	2024-03-25 14:42:51 +08:00
Wenhao Chen	bb0a668fee	[hotfix] set return_outputs=False in examples and polish code (#5404 ) * fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value	2024-03-25 12:31:09 +08:00
Yuanheng Zhao	5fcd7795cd	[example] update Grok-1 inference (#5495 ) * revise grok-1 example * remove unused arg in scripts * prevent re-installing torch * update readme * revert modifying colossalai requirements * add perf * trivial * add tokenizer url	2024-03-24 20:24:11 +08:00
binmakeswell	6df844b8c4	[release] grok-1 314b inference (#5490 ) * [release] grok-1 inference * [release] grok-1 inference * [release] grok-1 inference	2024-03-22 15:48:12 +08:00
Hongxin Liu	848a574c26	[example] add grok-1 inference (#5485 ) * [misc] add submodule * remove submodule * [example] support grok-1 tp inference * [example] add grok-1 inference script * [example] refactor code * [example] add grok-1 readme * [exmaple] add test ci * [exmaple] update readme	2024-03-21 18:07:22 +08:00
Luo Yihang	e239cf9060	[hotfix] fix typo of openmoe model source (#5403 )	2024-03-05 21:44:38 +08:00
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	2024-03-05 15:35:54 +08:00
flybird11111	29695cf70c	[example]add gpt2 benchmark example script. (#5295 ) * benchmark gpt2 * fix fix fix fix * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed ddp test (#5254) * [ci] fixed ddp test * polish * fix typo in applications/ColossalEval/README.md (#5250) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [doc] fix doc typo (#5256) * [doc] fix annotation display * [doc] fix llama2 doc * [hotfix]: add pp sanity check and fix mbs arg (#5268) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check * [workflow] fixed incomplete bash command (#5272) * [workflow] fixed oom tests (#5275) * [workflow] fixed oom tests * polish * polish * polish * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [shardformer] hybridparallelplugin support gradients accumulation. (#5246) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) * fix auto loading gpt2 tokenizer (#5279) * [doc] add llama2-13B disyplay (#5285) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com> * fix llama pretrain (#5287) * fix * fix * fix fix * fix fix fix * fix fix * benchmark gpt2 * fix fix fix fix * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * fix fix * fix fix fix * fix * fix fix fix fix fix * fix * Update shardformer.py --------- Co-authored-by: digger yu <digger-yu@outlook.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com> Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com> Co-authored-by: Desperado-Jia <502205863@qq.com>	2024-03-04 16:18:13 +08:00
Hongxin Liu	d882d18c65	[example] reuse flash attn patch (#5400 )	2024-02-27 11:22:07 +08:00
digger yu	71321a07cf	fix typo change dosen't to doesn't (#5308 )	2024-01-30 09:57:38 +08:00
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	2024-01-29 13:49:39 +08:00
flybird11111	f7e3f82a7e	fix llama pretrain (#5287 )	2024-01-19 17:49:02 +08:00
ver217	148469348a	Merge branch 'main' into sync/npu	2024-01-18 12:05:21 +08:00
Wenhao Chen	ef4f0ee854	[hotfix]: add pp sanity check and fix mbs arg (#5268 ) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check	2024-01-15 15:57:40 +08:00
binmakeswell	c174c4fc5f	[doc] fix doc typo (#5256 ) * [doc] fix annotation display * [doc] fix llama2 doc	2024-01-11 21:01:11 +08:00
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239 ) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	2024-01-09 10:20:05 +08:00
Xuanlei Zhao	dd2c28a323	[npu] use extension for op builder (#5172 ) * update extension * update cpu adam * update is * add doc for cpu adam * update kernel * update commit * update flash * update memory efficient * update flash attn * update flash attention loader * update api * fix * update doc * update example time limit * reverse change * fix doc * remove useless kernel * fix * not use warning * update * update	2024-01-08 11:39:16 +08:00
Wenhao Chen	3c0d82b19b	[pipeline]: support arbitrary batch size in forward_only mode (#5201 ) * fix: remove drop last in val & test dataloader * feat: add run_forward_only, support arbitrary bs * chore: modify ci script	2024-01-02 23:41:12 +08:00
Wenhao Chen	4fa689fca1	[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134 ) * test: add more p2p tests * fix: remove send_forward_recv_forward as p2p op list need to use the same group * fix: make send and receive atomic * feat: update P2PComm fn * feat: add metadata cache in 1f1b * feat: add metadata cache in interleaved pp * feat: modify is_xx_stage fn * revert: add _broadcast_object_list * feat: add interleaved pp in llama policy * feat: set NCCL_BUFFSIZE in HybridParallelPlugin	2023-12-22 10:44:00 +08:00

177 Commits (ceba662d22b1f982bf1d56e9699034b4ac97b60e)