ColossalAI

Commit Graph

Author	SHA1	Message	Date
Fazzie-Maqianli	fa97a9cab4	[chatgpt] unnify datasets (#3218 )	2 years ago
Fazzie-Maqianli	4fd4bd9d9a	[chatgpt] support instuct training (#3216 )	2 years ago
Frank Lee	cd142fbefa	[api] implemented the checkpoint io module (#3205 ) * [api] implemented the checkpoint io module * polish code * polish code	2 years ago
ver217	f8289d4221	[lazyinit] combine lazy tensor with dtensor (#3204 ) * [lazyinit] lazy tensor add distribute * [lazyinit] refactor distribute * [lazyinit] add test dist lazy init * [lazyinit] add verbose info for dist lazy init * [lazyinit] fix rnn flatten weight op * [lazyinit] polish test * [lazyinit] polish test * [lazyinit] fix lazy tensor data setter * [lazyinit] polish test * [lazyinit] fix clean * [lazyinit] make materialize inplace * [lazyinit] refactor materialize * [lazyinit] refactor test distribute * [lazyinit] fix requires_grad * [lazyinit] fix tolist after materialization * [lazyinit] refactor distribute module * [lazyinit] polish docstr * [lazyinit] polish lazy init context * [lazyinit] temporarily skip test * [lazyinit] polish test * [lazyinit] add docstr	2 years ago
Yan Fang	189347963a	[auto] fix requirements typo for issue #3125 (#3209 )	2 years ago
Yuanchen	9998d5ef64	[chatgpt]add reward model code for deberta (#3199 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2 years ago
Fazzie-Maqianli	1e1b9d2fea	[chatgpt]support llama (#3070 )	2 years ago
Frank Lee	e3ad88fb48	[booster] implemented the cluster module (#3191 ) * [booster] implemented the cluster module * polish code	2 years ago
YuliangLiu0306	019a847432	[Analyzer] fix analyzer tests (#3197 )	2 years ago
YuliangLiu0306	f57d34958b	[FX] refactor experimental tracer and adapt it with hf models (#3157 ) * pass gpt trace and meta_prop * pass t5 trace and meta_prop * [FX] refactor experimental tracer and adapt it with hf models * pass all mainstream model zoo * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * fix CI * skip tests * fix CI * using packaging version * polish	2 years ago
pgzhang	b429529365	[chatgpt] add supervised learning fine-tune code (#3183 ) * [chatgpt] add supervised fine-tune code * [chatgpt] delete unused code and modified comment code * [chatgpt] use pytorch distributed sampler instead --------- Co-authored-by: zhangpengpeng <zhangpengpeng@joyy.com>	2 years ago
Frank Lee	e7f3bed2d3	[booster] added the plugin base and torch ddp plugin (#3180 ) * [booster] added the plugin base and torch ddp plugin * polish code * polish code * polish code	2 years ago
NatalieC323	e5f668f280	[dreambooth] fixing the incompatibity in requirements.txt (#3190 ) * Update requirements.txt * Update environment.yaml * Update README.md * Update environment.yaml * Update README.md * Update README.md * Delete requirements_colossalai.txt * Update requirements.txt * Update README.md	2 years ago
Zihao	18dbe76cae	[auto-parallel] add auto-offload feature (#3154 ) * add auto-offload feature * polish code * fix syn offload runtime pass bug * add offload example * fix offload testing bug * fix example testing bug	2 years ago
YuliangLiu0306	258b43317c	[hotfix] layout converting issue (#3188 )	2 years ago
YH	80aed29cd3	[zero] Refactor ZeroContextConfig class using dataclass (#3186 )	2 years ago
YH	9d644ff09f	Fix docstr for zero statedict (#3185 )	2 years ago
zbian	7bc0afc901	updated flash attention usage	2 years ago
Frank Lee	085e7f4eff	[test] fixed torchrec registration in model zoo (#3177 ) * [test] fixed torchrec registration in model zoo * polish code * polish code * polish code	2 years ago
NatalieC323	4e921cfbd6	[examples] Solving the diffusion issue of incompatibility issue#3169 (#3170 ) * Update requirements.txt * Update environment.yaml * Update README.md * Update environment.yaml	2 years ago
Frank Lee	a9b8402d93	[booster] added the accelerator implementation (#3159 )	2 years ago
Frank Lee	1ad3a636b1	[test] fixed torchrec model test (#3167 ) * [test] fixed torchrec model test * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
Saurav Maheshkar	20d1c99444	[refactor] update docs (#3174 ) * refactor: README-zh-Hans * refactor: REFERENCE * docs: update paths in README	2 years ago
BlueRum	7548ca5a54	[chatgpt]Reward Model Training Process update (#3133 ) * add normalize function to value_head in bloom rm * add normalization to value_function in gpt_rm * add normalization to value_head of opt_rm * add Anthropic/hh-rlhf dataset * Update __init__.py * Add LogExpLoss in RM training * Update __init__.py * update rm trainer to use acc as target * update example/train_rm * Update train_rm.sh * code style * Update README.md * Update README.md * add rm test to ci * fix tokenier * fix typo * change batchsize to avoid oom in ci * Update test_ci.sh	2 years ago
ver217	1e58d31bb7	[chatgpt] fix trainer generate kwargs (#3166 )	2 years ago
ver217	c474fda282	[chatgpt] fix ppo training hanging problem with gemini (#3162 ) * [chatgpt] fix generation early stopping * [chatgpt] fix train prompts example	2 years ago
ver217	6ae8ed0407	[lazyinit] add correctness verification (#3147 ) * [lazyinit] fix shared module * [tests] add lazy init test utils * [tests] add torchvision for lazy init * [lazyinit] fix pre op fn * [lazyinit] handle legacy constructor * [tests] refactor lazy init test models * [tests] refactor lazy init test utils * [lazyinit] fix ops don't support meta * [tests] lazy init test timm models * [lazyinit] fix set data * [lazyinit] handle apex layers * [tests] lazy init test transformers models * [tests] lazy init test torchaudio models * [lazyinit] fix import path * [tests] lazy init test torchrec models * [tests] update torch version in CI * [tests] revert torch version in CI * [tests] skip lazy init test	2 years ago
binmakeswell	3c01280a56	[doc] add community contribution guide (#3153 ) * [doc] update contribution guide * [doc] update contribution guide * [doc] add community contribution guide	2 years ago
Frank Lee	ed19290560	[booster] implemented mixed precision class (#3151 ) * [booster] implemented mixed precision class * polish code	2 years ago
YuliangLiu0306	ecd643f1e4	[test] add torchrec models to test model zoo (#3139 )	2 years ago
ver217	14a115000b	[tests] model zoo add torchaudio models (#3138 ) * [tests] model zoo add torchaudio models * [tests] refactor torchaudio wavernn * [tests] refactor fx torchaudio tests	2 years ago
Frank Lee	6d48eb0560	[test] added transformers models to test model zoo (#3135 )	2 years ago
Frank Lee	a674c63348	[test] added torchvision models to test model zoo (#3132 ) * [test] added torchvision models to test model zoo * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
HELSON	1216d1e7bd	[tests] diffuser models in model zoo (#3136 ) * [tests] diffuser models in model zoo * remove useless code * [tests] add diffusers to requirement-test	2 years ago
Saurav Maheshkar	1a46e71e07	[docker] Add opencontainers image-spec to `Dockerfile` (#3006 ) * feat(docker): Add opencontainers image-spec to `Dockerfile` This PR makes few changes to improve the overall quality of the docker image 🐳 . For reference more annotations can be found [here](https://github.com/opencontainers/image-spec/blob/main/annotations.md) * feat(docker): add inline version declaration * fix(docker): drop `org.opencontainers.image.version` LABEL	2 years ago
YuliangLiu0306	2eca4cd376	[DTensor] refactor dtensor with new components (#3089 ) * [DTensor] refactor dtensor with new components * polish	2 years ago
ver217	ed8f60b93b	[lazyinit] refactor lazy tensor and lazy init ctx (#3131 ) * [lazyinit] refactor lazy tensor and lazy init ctx * [lazyinit] polish docstr * [lazyinit] polish docstr	2 years ago
Frank Lee	86ac782d7c	[test] added timm models to test model zoo (#3129 ) * [test] added timm models to test model zoo * polish code * polish code * polish code * polish code * polish code	2 years ago
BlueRum	23cd5e2ccf	[chatgpt]update ci (#3087 ) * [chatgpt]update ci * Update test_ci.sh * Update test_ci.sh * Update test_ci.sh * test * Update train_prompts.py * Update train_dummy.py * add save_path * polish * add save path * polish * add save path * polish * delete bloom-560m test delete bloom-560m test because of oom * add ddp test	2 years ago
Frank Lee	169ed4d24e	[workflow] purged extension cache before GPT test (#3128 )	2 years ago
Xuanlei Zhao	30dd13c450	[autochunk] support complete benchmark (#3121 ) * refact memory code * dont log free var memory * add memory align * update chunk target * update setting for new memory * finish test * update tracer * update typo * update test * add unet test * add bench * update bench * update bench * init * support vit * move to cpu * add cpu benchmark	2 years ago
BlueRum	68577fbc43	[chatgpt]Fix examples (#3116 ) * fix train_dummy * fix train-prompts	2 years ago
BlueRum	0672b5afac	[chatgpt] fix lora support for gpt (#3113 ) * fix gpt-actor * fix gpt-critic * fix opt-critic	2 years ago
github-actions[bot]	0aa92c0409	Automated submodule synchronization (#3105 ) Co-authored-by: github-actions <github-actions@github.com>	2 years ago
Jeff Rasley	453f7ae5a0	prevent op_builder being installed in site-pkgs (#3104 )	2 years ago
hiko2MSP	191daf7411	[chatgpt] type miss of kwargs (#3107 )	2 years ago
binmakeswell	145ccfd7d1	[doc] add Intel cooperation for biomedicine (#3108 ) * [doc] add Intel cooperation for biomedicine	2 years ago
BlueRum	c9dd036592	[chatgpt] fix lora save bug (#3099 ) * fix colo-stratergy * polish * fix lora * fix ddp * polish * polish	2 years ago
binmakeswell	018936a3f3	[tutorial] update notes for TransformerEngine (#3098 )	2 years ago
Kirthi Shankar Sivamani	65a4dbda6c	[NVIDIA] Add FP8 example using TE (#3080 ) Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2 years ago

2350 Commits (f5c425c89874f2500600be71b3c9aadad2da822f)