Hongxin Liu | 19e1a5cf16 | 2024-03-27 11:19:32 +08:00
[shardformer] update colo attention to support custom mask (#5510)
* [feature] refactor colo attention (#5462)
* [extension] update api
* [feature] add colo attention
* [feature] update sdpa
* [feature] update npu attention
* [feature] update flash-attn
* [test] add flash attn test
* [test] update flash attn test
* [shardformer] update modeling to fit colo attention (#5465)
* [misc] refactor folder structure
* [shardformer] update llama flash-attn
* [shardformer] fix llama policy
* [devops] update tensornvme install
* [test] update llama test
* [shardformer] update colo attn kernel dispatch
* [shardformer] update blip2
* [shardformer] update chatglm
* [shardformer] update gpt2
* [shardformer] update gptj
* [shardformer] update opt
* [shardformer] update vit
* [shardformer] update colo attention mask prep
* [shardformer] update whisper
* [test] fix shardformer tests (#5514)

Hongxin Liu | df63564184 | 2023-10-12 10:39:08 +08:00
[gemini] support amp o3 for gemini (#4872)
* [gemini] support no reuse fp16 chunk
* [gemini] support no master weight for optim
* [gemini] support no master weight for gemini ddp
* [test] update gemini tests
* [plugin] update gemini plugin
* [test] fix gemini checkpointio test
* [test] fix gemini checkpoint io

Hongxin Liu | 079bf3cb26 | 2023-09-19 14:20:26 +08:00
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format

Baizhou Zhang | 58913441a1 | 2023-07-07 16:33:06 +08:00
[checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141)
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather

Frank Lee | 58df720570 | 2023-07-04 16:05:01 +08:00
[shardformer] adapted T5 and LLaMa test to use kit (#4049)
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code

Frank Lee | d857f3dbba | 2023-07-04 16:05:01 +08:00
[shardformer] supported T5 and its variants (#4045)

jiangmingyan | 20068ba188 | 2023-05-10 12:17:02 +08:00
[booster] add tests for ddp and low level zero's checkpointio (#3715)
* [booster] update tests for booster
* [booster] update booster tutorials (#3717), fix recursive check

HELSON | 5d3a2be3af | 2023-01-04 11:59:56 +08:00
[amp] add gradient clipping for unit tests (#2283)
* [amp] add gradient clipping in unit tests
* fix bugs

Super Daniel | 8328917348 | 2022-09-08 22:11:04 +08:00
[NFC] polish colossalai/testing/comparison.py code style. (#1558)

Frank Lee | b72b8445c6 | 2022-03-17 14:40:52 +08:00
optimized context test time consumption (#446)

Frank Lee | bffd85bf34 | 2022-03-16 17:20:05 +08:00
added testing module (#435)