Commit Graph

31 Commits (8ff7d0c78048b4231e1f772a8227282ae7d5822a)

Guangyao Zhang f20b066c59
[fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059)
* all_gather only internode, fix pytest

* fix cuda arch <89 compile pytest error

* fix pytest failure

* disable all_gather_into_tensor_flat_fp8

* fix fp8 format

* fix pytest

* fix conversations

* fix chunk tuple to list
2024-09-14 10:40:01 +08:00
Hanks 5ce6dd75bf
[fp8] disable all_to_all_fp8 in intranode (#6045)
* enhance all_to_all_fp8 with internode comm control

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disable some fp8 ops due to performance issue

* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-09 13:47:17 +08:00
Hongxin Liu c3b5caff0e
[fp8] optimize all-gather (#6043)
* [fp8] optimize all-gather

* [fp8] fix all gather fp8 ring

* [fp8] enable compile

* [fp8] fix all gather fp8 ring
2024-09-03 15:45:17 +08:00
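The all-gather commits above compress tensors to fp8 before the collective and rescale on receipt. As a rough illustration of that compress-gather-decompress idea (a pure-Python sketch only; the actual implementation uses torch.distributed collectives and float8 dtypes, and the function names here are hypothetical):

```python
# Hypothetical sketch of fp8-compressed all-gather: each rank scales its
# shard into the fp8 dynamic range before "sending", and every receiver
# rescales. Assumes the E4M3 format with max representable magnitude 448.
FP8_E4M3_MAX = 448.0

def compress(shard):
    """Scale a shard so its amax maps onto the fp8 range; return payload + scale."""
    amax = max((abs(v) for v in shard), default=0.0)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [v / scale for v in shard], scale

def all_gather_fp8(shards_per_rank):
    # Each rank compresses its local shard; the (payload, scale) pairs are
    # what would travel over the wire instead of full-precision tensors.
    compressed = [compress(s) for s in shards_per_rank]
    # Every rank decompresses all received shards into the full tensor.
    return [v * scale for payload, scale in compressed for v in payload]
```

The per-shard scale is the extra metadata fp8 communication has to ship alongside the payload; the later "disable intranode" commits reflect that this overhead only pays off when the wire is the bottleneck (internode links), not over fast intranode NVLink.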
Guangyao Zhang e96a0761ea
[FP8] unsqueeze scale to make it compatible with torch.compile (#6040) 2024-08-29 14:49:23 +08:00
botbw 1a2e90dcc1 [fp8] linear perf enhancement 2024-08-15 13:43:08 +08:00
botbw 88fa096d78
[fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) 2024-08-15 10:14:42 +08:00
flybird11111 597b206001
[fp8] support asynchronous FP8 communication (#5997)
* fix

* fix

* fix

* support async all2all

* support async op for all gather

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-14 14:08:19 +08:00
Hongxin Liu 0978080a69
[fp8] refactor fp8 linear with compile (#5993)
* [fp8] refactor fp8 linear with compile

* [fp8] fix linear test

* [fp8] fix linear test
2024-08-13 16:07:26 +08:00
flybird11111 f1a3a326c4
[fp8]Moe support fp8 communication (#5977)
* fix

* support moe fp8

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* fix

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* fix

* fix

* fix

* fix

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-09 18:26:02 +08:00
botbw e4aadeee20
[fp8] use torch compile (torch >= 2.3.0) (#5979)
* [fp8] use torch compile (torch >= 2.4.0)

* [fp8] set use_fast_accum in linear

* [chore] formal version check

* [chore] fix sig
2024-08-09 15:51:06 +08:00
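This commit's "[chore] formal version check" bullet gates the torch.compile path on the installed PyTorch version. A minimal sketch of such a gate (hypothetical function names; real PyTorch version strings can carry local build metadata like "+cu121", which must be dropped before comparing):

```python
# Hypothetical sketch of a formal version check gating a torch.compile
# code path. The minimum "2.4.0" matches the later update in commit
# 88fa096d78; pure Python, no torch required.
def parse_version(v):
    # Drop local build metadata such as "+cu121" before comparing.
    core = v.split("+")[0]
    return tuple(int(p) for p in core.split("."))

def compile_supported(torch_version, minimum="2.4.0"):
    """True if torch_version is at least the minimum for torch.compile use."""
    return parse_version(torch_version) >= parse_version(minimum)
```

Comparing parsed tuples rather than raw strings avoids the classic bug where `"2.10.0" < "2.4.0"` lexicographically.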
Hongxin Liu 8241c0c054
[fp8] support gemini plugin (#5978)
* [fp8] refactor hook

* [fp8] support gemini plugin

* [example] add fp8 option for llama benchmark
2024-08-09 14:09:48 +08:00
Hanks b480eec738
[Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928)
* support fp8_communication in the Torch DDP grad comm, FSDP grad comm, and FSDP params comm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* implement communication hook for FSDP params all-gather

* added unit test for fp8 operators

* support fp8 communication in GeminiPlugin

* update training scripts to support fsdp and fp8 communication

* fixed some minor bugs observed in unit test

* add all_gather_into_tensor_flat_fp8

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* add skip the test if torch < 2.2.0

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* add skip the test if torch < 2.2.0

* add skip the test if torch < 2.2.0

* add fp8_comm flag

* rebase latest fp8 operators

* rebase latest fp8 operators

* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-08 15:55:01 +08:00
flybird11111 7739629b9d
fix (#5976) 2024-08-07 18:58:39 +08:00
Hongxin Liu ccabcf6485
[fp8] support fp8 amp for hybrid parallel plugin (#5975)
* [fp8] support fp8 amp for hybrid parallel plugin

* [test] add fp8 hook test

* [fp8] fix fp8 linear compatibility
2024-08-07 18:21:08 +08:00
Hongxin Liu 76ea16466f
[fp8] add fp8 linear (#5967)
* [fp8] add fp8 linear

* [test] fix fp8 linear test condition

* [test] fix fp8 linear test condition

* [test] fix fp8 linear test condition
2024-08-07 15:41:49 +08:00
flybird11111 afb26de873
[fp8]support all2all fp8 (#5953)
* support all2all fp8

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* fix

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-06 16:58:23 +08:00
Guangyao Zhang 53cb9606bd
[Feature] llama shardformer fp8 support (#5938)
* add llama shardformer fp8

* Llama Shardformer Parity

* fix typo

* fix all reduce

* fix pytest failure

* fix reduce op and move function to fp8.py

* fix typo
2024-08-05 10:05:47 +08:00
Hongxin Liu 5fd0592767
[fp8] support all-gather flat tensor (#5932) 2024-07-24 16:55:20 +08:00
GuangyaoZhang 6a20f07b80 remove all to all 2024-07-17 07:14:55 +00:00
GuangyaoZhang 5a310b9ee1 fix rebase 2024-07-17 03:43:23 +00:00
GuangyaoZhang 457a0de79f shardformer fp8 2024-07-16 06:56:51 +00:00
pre-commit-ci[bot] 51f916b11d [pre-commit.ci] auto fixes from pre-commit.com hooks
2024-07-12 07:33:45 +00:00
BurkeHulk 1f1b856354 Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm
# Conflicts:
#	colossalai/quantization/fp8.py
2024-07-12 15:29:41 +08:00
BurkeHulk e88190184a support fp8 communication in pipeline parallelism 2024-07-12 15:25:25 +08:00
BurkeHulk 1e1959467e fix scaling algorithm in FP8 casting 2024-07-12 15:23:37 +08:00
GuangyaoZhang dbfa7d39fc fix typo 2024-07-10 08:13:26 +00:00
pre-commit-ci[bot] e17f835df7 [pre-commit.ci] auto fixes from pre-commit.com hooks
2024-07-04 12:47:17 +00:00
Hanks 6991819a97
Merge branch 'hpcaitech:main' into feature/fp8_comm 2024-07-04 20:34:41 +08:00
Hongxin Liu 7afbc81d62
[quant] fix bitsandbytes version check (#5882)
* [quant] fix bitsandbytes version check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-04 11:33:23 +08:00
HangXu f5a52e1600
fp8 operators for compressed communication
cast_to_fp8, cast_from_fp8, all_reduce_fp8
2024-07-01 13:44:21 +08:00
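This commit introduces the base operators `cast_to_fp8`, `cast_from_fp8`, and `all_reduce_fp8`. A pure-Python sketch of the per-tensor scaling idea behind the casts (only the scaling arithmetic; the real operators quantize to torch float8 dtypes, and the later commit 1e1959467e fixes this scaling algorithm):

```python
# Hypothetical sketch of per-tensor scaled fp8 casting, assuming the
# E4M3 format whose max representable magnitude is 448. Real code would
# additionally round values into actual 8-bit floats.
FP8_E4M3_MAX = 448.0

def compute_scale(values):
    """Scale factor mapping the tensor's amax onto the fp8 dynamic range."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0

def cast_to_fp8(values):
    scale = compute_scale(values)
    # Quantization to float8 would happen here; we only apply the scaling.
    scaled = [v / scale for v in values]
    return scaled, scale

def cast_from_fp8(scaled, scale):
    """Recover (approximately) the original values from payload + scale."""
    return [v * scale for v in scaled]
```

For `all_reduce_fp8`, each rank ships its scaled payload plus the scale, and receivers call the inverse cast before summing, trading precision for halved communication volume versus fp16.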
linsj20 91fa553775 [Feature] qlora support (#5586)
* [feature] qlora support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* qlora follow commit

* migrate qutization folder to colossalai/

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* minor fixes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-28 10:51:27 +08:00