ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	7f8b16635b	[misc] refactor launch API and tensor constructor (#5666 ) * [misc] remove config arg from initialize * [misc] remove old tensor contrusctor * [plugin] add npu support for ddp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [devops] fix doc test ci * [test] fix test launch * [doc] update launch doc --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
binmakeswell	b8a711aa2d	[news] llama3 and open-sora v1.1 (#5655 ) * [news] llama3 and open-sora v1.1 * [news] llama3 and open-sora v1.1	7 months ago
Hongxin Liu	bbb2c21f16	[shardformer] fix chatglm implementation (#5644 ) * [shardformer] fix chatglm policy * [shardformer] fix chatglm flash attn * [shardformer] update readme * [shardformer] fix chatglm init * [shardformer] fix chatglm test * [pipeline] fix chatglm merge batch	7 months ago
binmakeswell	f4c5aafe29	[example] llama3 (#5631 ) * release llama3 * [release] llama3 * [release] llama3 * [release] llama3 * [release] llama3	7 months ago
Hongxin Liu	641b1ee71a	[devops] remove post commit ci (#5566 ) * [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	8 months ago
binmakeswell	34e909256c	[release] grok-1 inference benchmark (#5500 ) * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark	8 months ago
Wenhao Chen	bb0a668fee	[hotfix] set return_outputs=False in examples and polish code (#5404 ) * fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value	8 months ago
binmakeswell	6df844b8c4	[release] grok-1 314b inference (#5490 ) * [release] grok-1 inference * [release] grok-1 inference * [release] grok-1 inference	8 months ago
binmakeswell	d158fc0e64	[doc] update open-sora demo (#5479 ) * [doc] update open-sora demo * [doc] update open-sora demo * [doc] update open-sora demo	8 months ago
binmakeswell	bd998ced03	[doc] release Open-Sora 1.0 with model weights (#5468 ) * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights	8 months ago
digger yu	70cce5cbed	[doc] update some translations with README-zh-Hans.md (#5382 )	9 months ago
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	9 months ago
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	9 months ago
binmakeswell	a1c6cdb189	[doc] fix blog link	9 months ago
Frank Lee	705a62a565	[doc] updated installation command (#5389 )	9 months ago
yixiaoer	69e3ad01ed	[doc] Fix typo (#5361 )	9 months ago
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	10 months ago
digger yu	bce9499ed3	fix some typo (#5307 )	10 months ago
ver217	148469348a	Merge branch 'main' into sync/npu	10 months ago
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239 ) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	11 months ago
binmakeswell	7bc6969ce6	[doc] SwiftInfer release (#5236 ) * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release	11 months ago
binmakeswell	b9b32b15e6	[doc] add Colossal-LLaMA-2-13B (#5234 ) * [doc] add Colossal-LLaMA-2-13B * [doc] add Colossal-LLaMA-2-13B * [doc] add Colossal-LLaMA-2-13B	11 months ago
flybird11111	681d9b12ef	[doc] update pytorch version in documents. (#5177 ) * fix aaa fix fix fix * fix * fix * test ci * fix ci fix * update pytorch version in documents	11 months ago
binmakeswell	177c79f2d1	[doc] add moe news (#5128 ) * [doc] add moe news * [doc] add moe news * [doc] add moe news	1 year ago
Wenhao Chen	7172459e74	[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088 ) * [shardformer] implement policy for all GPT-J models and test * [shardformer] support interleaved pipeline parallel for bert finetune * [shardformer] shardformer support falcon (#4883) * [shardformer]: fix interleaved pipeline for bert model (#5048) * [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093) * Add Mistral support for Shardformer (#5103) * [shardformer] add tests to mistral (#5105) --------- Co-authored-by: Pengtai Xu <henryxu880@gmail.com> Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com> Co-authored-by: flybird11111 <1829166702@qq.com> Co-authored-by: eric8607242 <e0928021388@gmail.com>	1 year ago
digger yu	d5661f0f25	[nfc] fix typo change directoty to directory (#5111 )	1 year ago
digger yu	2bdf76f1f2	fix typo change lazy_iniy to lazy_init (#5099 )	1 year ago
digger yu	0d482302a1	[nfc] fix typo and author name (#5089 )	1 year ago
digger yu	fd3567e089	[nfc] fix typo in docs/ (#4972 )	1 year ago
ppt0011	335cb105e2	[doc] add supported feature diagram for hybrid parallel plugin (#4996 )	1 year ago
digger yu	11009103be	[nfc] fix some typo with colossalai/ docs/ etc. (#4920 )	1 year ago
Baizhou Zhang	21ba89cab6	[gemini] support gradient accumulation (#4869 ) * add test * fix no_sync bug in low level zero plugin * fix test * add argument for grad accum * add grad accum in backward hook for gemini * finish implementation, rewrite tests * fix test * skip stuck model in low level zero test * update doc * optimize communication & fix gradient checkpoint * modify doc * cleaning codes * update cpu adam fp16 case	1 year ago
flybird11111	6a21f96a87	[doc] update advanced tutorials, training gpt with hybrid parallelism (#4866 ) * [doc]update advanced tutorials, training gpt with hybrid parallelism * [doc]update advanced tutorials, training gpt with hybrid parallelism * update vit tutorials * update vit tutorials * update vit tutorials * update vit tutorials * update en/train_vit_with_hybrid_parallel.py * fix * resolve comments * fix	1 year ago
Zhongkai Zhao	db40e086c8	[test] modify model supporting part of low_level_zero plugin (including correspoding docs)	1 year ago
binmakeswell	822051d888	[doc] update slack link (#4823 )	1 year ago
Hongxin Liu	da15fdb9ca	[doc] add lazy init docs (#4808 )	1 year ago
Baizhou Zhang	64a08b2dc3	[checkpointio] support unsharded checkpointIO for hybrid parallel (#4774 ) * support unsharded saving/loading for model * support optimizer unsharded saving * update doc * support unsharded loading for optimizer * small fix	1 year ago
Baizhou Zhang	a2db75546d	[doc] polish shardformer doc (#4779 ) * fix example format in docstring * polish shardformer doc	1 year ago
binmakeswell	d512a4d38d	[doc] add llama2 domain-specific solution news (#4789 ) * [doc] add llama2 domain-specific solution news	1 year ago
Baizhou Zhang	493a5efeab	[doc] add shardformer doc to sidebar (#4768 )	1 year ago
Hongxin Liu	66f3926019	[doc] clean up outdated docs (#4765 ) * [doc] clean up outdated docs * [doc] fix linking * [doc] fix linking	1 year ago
Pengtai Xu	4d7537ba25	[doc] put native colossalai plugins first in description section	1 year ago
Pengtai Xu	e10d9f087e	[doc] add model examples for each plugin	1 year ago
Pengtai Xu	a04337bfc3	[doc] put individual plugin explanation in front	1 year ago
Pengtai Xu	10513f203c	[doc] explain suitable use case for each plugin	1 year ago
Hongxin Liu	b5f9e37c70	[legacy] clean up legacy code (#4743 ) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci	1 year ago
Baizhou Zhang	d151dcab74	[doc] explaination of loading large pretrained models (#4741 )	1 year ago
Baizhou Zhang	451c3465fb	[doc] polish shardformer doc (#4735 ) * arrange position of chapters * fix typos in seq parallel doc	1 year ago
Bin Jia	6a03c933a0	[shardformer] update seq parallel document (#4730 ) * update doc of seq parallel * fix typo	1 year ago
flybird11111	46162632e5	[shardformer] update pipeline parallel document (#4725 ) * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document	1 year ago

174 Commits (feat/online-serving)