ColossalAI/colossalai/booster/plugin
Latest commit: 1810b9100f by Wenhao Chen, 2024-01-05 13:58:53 +08:00
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)

* test: add more p2p tests
* fix: remove send_forward_recv_forward, as a p2p op list needs to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin (see the sketch after this list)
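
The last bullet refers to an environment-level NCCL knob rather than a Python API change. Below is a minimal sketch of how such a setting can be applied; the 128 MiB value and the exact place where hybrid_parallel_plugin.py sets it are assumptions for illustration, not taken from the commit.

```python
# Minimal sketch (assumption): raising NCCL_BUFFSIZE before the process group is
# created. NCCL reads this variable (in bytes) at communicator setup, so it must be
# set before torch.distributed initialization. The value below is illustrative only.
import os

if "NCCL_BUFFSIZE" not in os.environ:
    os.environ["NCCL_BUFFSIZE"] = str(128 * 1024 * 1024)  # 128 MiB communication buffer
```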
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| dp_plugin_base.py | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| gemini_plugin.py | [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) | 2023-12-08 11:10:51 +08:00 |
| hybrid_parallel_plugin.py | [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) | 2024-01-05 13:58:53 +08:00 |
| low_level_zero_plugin.py | [npu] add npu support for gemini and zero (#5067) | 2023-11-20 16:12:41 +08:00 |
| moe_hybrid_parallel_plugin.py | update | 2023-12-18 10:37:07 +08:00 |
| plugin_base.py | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| pp_plugin_base.py | [misc] update pre-commit and run all files (#4752) | 2023-09-19 14:20:26 +08:00 |
| torch_ddp_plugin.py | [doc] polish shardformer doc (#4779) | 2023-09-26 10:57:47 +08:00 |
| torch_fsdp_plugin.py | [doc] polish shardformer doc (#4779) | 2023-09-26 10:57:47 +08:00 |
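
For orientation, the plugins listed here are consumed through Colossal-AI's Booster API. The snippet below is a minimal sketch of that flow under stated assumptions: the HybridParallelPlugin constructor arguments and the launch_from_torch(config={}) signature reflect the API around the date of this commit and may differ in other versions; it is meant to be started with torchrun so that process groups can be created.

```python
# Minimal sketch (assumptions, not the plugin's own code): handing a plugin from this
# directory to the Booster API. Signatures reflect the API around early 2024 and may
# differ in other Colossal-AI versions. Run under torchrun.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})  # set up distributed state from torchrun env vars

# A tiny Llama model: the headline commit adds interleaved pipeline support for Llama.
model = LlamaForCausalLM(
    LlamaConfig(hidden_size=128, intermediate_size=256, num_hidden_layers=4, num_attention_heads=4)
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# tp_size/pp_size split ranks into tensor- and pipeline-parallel groups; with pp_size > 1
# a microbatch setting is required. The interleaved schedule added in #5134 is exposed
# through additional constructor knobs (names assumed, e.g. pp_style="interleaved").
plugin = HybridParallelPlugin(tp_size=2, pp_size=2, num_microbatches=4, precision="bf16")
booster = Booster(plugin=plugin)

# boost() wraps the model/optimizer (and optionally criterion, dataloader, lr_scheduler)
# so that the chosen parallel strategy is applied before training.
model, optimizer, *_ = booster.boost(model, optimizer)
```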