Commit Graph

  • 953d3a341c
    Merge b9ac0a6808 into dac0e07b13 #6081 flybird11111 2024-10-14 11:50:10 +0800
  • e1e86f9f1f fix wangbluo 2024-10-14 11:45:35 +0800
  • 6fef58bf50 improve method arg names, add more perf. comments and TODOs Edenzzzz 2024-10-12 21:23:34 +0000
  • 3273eb52ab update Edenzzzz 2024-10-12 19:42:15 +0000
  • 4a6d73ef67 update Edenzzzz 2024-10-12 19:29:47 +0000
  • cba8b33faa update Edenzzzz 2024-10-11 16:49:09 +0000
  • 4c8e85ee0d
    [Coati] Train DPO using PP (#6054) Tong Li 2024-10-11 19:32:00 +0800
  • 703bb5c18d fix the test wangbluo 2024-10-11 17:34:20 +0800
  • 4e0e99bb6a fix the test wangbluo 2024-10-11 17:31:40 +0800
  • 0ca16d5cbe [fix] fix llama, mixtral benchmark zbv loss none bug; update mixtral & llama policy and modeling; duanjunwen 2024-10-11 07:32:43 +0000
  • 1507a7528f fix wangbluo 2024-10-11 06:20:34 +0000
  • 0002ae5956 fix wangbluo 2024-10-11 14:16:21 +0800
  • dac0e07b13
    [zero bubble] support zero (#6080) flybird11111 2024-10-11 14:14:05 +0800
  • d71660217d polish code #6054 Tong Li 2024-10-11 05:47:48 +0000
  • dc2cdaf3e8
    [shardformer] optimize seq parallelism (#6086) Hongxin Liu 2024-10-11 13:44:40 +0800
  • 4bd3d5fc53 fix #6080 flybird11111 2024-10-11 03:17:43 +0000
  • 205d364af2 [shardformer] fix gpt2 fused linear reduce #6086 ver217 2024-10-10 19:12:39 +0800
  • efe3042bb2 fix wangbluo 2024-10-10 18:38:47 +0800
  • 64a43beab9 [test] update gpt2 fused linear test ver217 2024-10-10 17:34:16 +0800
  • 4ede27432c [plugin] update moe hybrid plugin ver217 2024-10-10 17:32:58 +0800
  • f2e9a1ad86 [plugin] update gemini plugin ver217 2024-10-10 17:31:55 +0800
  • b50f0ad158 [shardformer] fix gpt2 fused linear col ver217 2024-10-10 17:30:51 +0800
  • 6b2c506fc5
    Update README.md (#6087) 梁爽 2024-10-10 17:02:49 +0800
  • bcbd311bc3
    Update README.md #6087 supercooledith-patch-1 梁爽 2024-10-10 16:52:55 +0800
  • 60d3ff11a0 [shardformer] optimize seq parallelism ver217 2024-10-10 15:44:32 +0800
  • 5ecc27e150 fix wangbluo 2024-10-10 15:35:52 +0800
  • f98384aef6 fix wangbluo 2024-10-10 15:17:06 +0800
  • e234dfa236 [feat] support MixtralPipelineForwards--> mixtral_for_causal_lm_forward for zbv duanjunwen 2024-10-10 06:57:35 +0000
  • 646b3c5a90
    [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) Hongxin Liu 2024-10-10 14:34:45 +0800
  • 72b507a7be [feat] update MixtralPipelineForwards --> mixtral_model_forward; support zbv; duanjunwen 2024-10-10 06:19:51 +0000
  • 9ee80fc828 [fix] MixtralForCausalLMPolicy get_held_layer support zbv; duanjunwen 2024-10-10 05:40:22 +0000
  • 1d726399b3 [tp] fix gather fused and add fused linear row #6084 ver217 2024-10-10 10:45:50 +0800
  • 9557d28fe0 update Edenzzzz 2024-10-09 17:07:36 +0000
  • 12bac5e039 improve comments Edenzzzz 2024-10-09 15:25:38 +0000
  • c823dc82e9 improve comments Edenzzzz 2024-10-09 15:24:08 +0000
  • b6e5f9da2e [tp] fix gpt2 mlp policy ver217 2024-10-09 16:41:56 +0800
  • 71f6c14cfd [tp] support sp for fused linear ver217 2024-10-09 15:51:21 +0800
  • e78689faa2 [tp] support uneven split for fused linear ver217 2024-10-09 15:43:22 +0800
  • d46c8b2b06 [tp] hotfix linear row ver217 2024-10-09 15:43:00 +0800
  • b635dd0669 fix wangbluo 2024-10-09 14:05:26 +0800
  • 3f5bec8dc4 [feat] support zbv in mixtral benchmark; #6082 duanjunwen 2024-10-09 03:58:01 +0000
  • 3532f77b90 fix wangbluo 2024-10-09 10:57:19 +0800
  • 531773ff54
    Merge pull request #6077 from duanjunwen/dev/zero_bubble duanjunwen 2024-10-09 10:22:14 +0800
  • b9ac0a6808 fix #6081 flybird11111 2024-10-08 13:08:13 +0000
  • d50c0a1e0a example support zbv flybird11111 2024-10-08 11:25:08 +0000
  • 42f2d0b1ff suport zbv all flybird11111 2024-10-08 11:14:23 +0000
  • 97b146a1b3 fix flybird11111 2024-10-08 10:31:53 +0000
  • 035f87ca69 fix flybird11111 2024-10-08 10:30:55 +0000
  • cc500b3e25 [fix] fix mixtral policy; #6077 duanjunwen 2024-10-08 09:34:09 +0000
  • 292a504bea [fix] fix mixtral policy; duanjunwen 2024-10-08 09:25:11 +0000
  • a5f06706cb zbv support zero flybird11111 2024-10-08 09:02:18 +0000
  • 1637c14af8 Merge branch 'feature/zerobubble' of github.com:flybird11111/ColossalAI into feature/zerobubble flybird11111 2024-10-08 08:56:56 +0000
  • f4d023ca6e Merge branch 'feature/zerobubble' of github.com:hpcaitech/ColossalAI into dev/zero_bubble duanjunwen 2024-10-08 08:13:17 +0000
  • 295dd2d9fe
    [zerobubble] rebase main (#6075) flybird11111 2024-10-08 15:58:00 +0800
  • ceadef35d5 fix #6073 wangbluo 2024-10-07 10:29:58 +0800
  • 6975c50f78 [fix] fix build ci; duanjunwen 2024-09-30 02:34:54 +0000
  • 5c8bbf63a8 [feat] update optimizer bwd; duanjunwen 2024-09-29 09:59:41 +0000
  • d63479553c [feat] zerobubble support moehybridplugin; duanjunwen 2024-09-29 08:33:55 +0000
  • 797d1ed6c9 [pre-commit.ci] auto fixes from pre-commit.com hooks #6075 pre-commit-ci[bot] 2024-09-29 07:57:38 +0000
  • 3251e68049
    Merge branch 'feature/zerobubble' into feature/zerobubble flybird11111 2024-09-29 15:56:54 +0800
  • 076794800d [plugin] hybrid support zero bubble pipeline (#6060) flybird11111 2024-09-27 14:48:55 +0800
  • 993f3db3dc [fix] fix fwd branch, fwd pass both micro_batch & internal_inputs' duanjunwen 2024-09-20 07:34:43 +0000
  • 9e90356175 [fix] fix mem assert; duanjunwen 2024-09-19 08:27:47 +0000
  • df12ae7324 [fix] fix model zoo import; duanjunwen 2024-09-09 06:39:33 +0000
  • 78ed432ba4 [fix] fix mem; use a new model shape; only assert mem less and equal than theo; duanjunwen 2024-09-09 06:38:31 +0000
  • 93b3604f85 [fix] fix mem assertation duanjunwen 2024-09-09 05:41:39 +0000
  • e666f5c6db [fix] fix mem check; duanjunwen 2024-09-04 10:57:38 +0000
  • 7ba031dae0 [fix] fix bwd step if condition; remove useless comments and format info; duanjunwen 2024-09-03 08:56:08 +0000
  • 4420dc130a [fix] rm output.data after send fwd; duanjunwen 2024-09-03 14:12:17 +0800
  • 355a3afd02 [fix] fix optim bwd; duanjunwen 2024-09-03 02:40:26 +0000
  • fe99ca329f [fix] fix optim bwd; duanjunwen 2024-09-02 11:19:42 +0000
  • 262b27e37e [feat] update test; rm comments; duanjunwen 2024-09-02 09:50:47 +0000
  • 4ac0d6ef70 [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict; duanjunwen 2024-08-30 05:42:43 +0000
  • 93ede6b5c6 [feat] fix optimizer bwd b & w; support return accum loss & output duanjunwen 2024-08-29 08:54:45 +0000
  • 21bf510862 [feat] add optim backward_b_by_grad duanjunwen 2024-08-29 03:16:59 +0000
  • 0055c473e5 [fix] fix poc test; add comments in poc; duanjunwen 2024-08-28 05:47:53 +0000
  • 49d68ebd01 [feat] fix poc format duanjunwen 2024-08-28 03:08:35 +0000
  • d44e7e698d [feat] add test run_fwd_bwd automatic scheduling; duanjunwen 2024-08-26 11:21:56 +0000
  • 28ee5a761c [update] update text; duanjunwen 2024-08-26 04:00:51 +0000
  • 21c62b6e64 [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble; duanjunwen 2024-08-22 10:25:34 +0000
  • 2fd9d3e26a [plugin] hybrid support zero bubble pipeline (#6060) flybird11111 2024-09-27 14:48:55 +0800
  • 4d3eaee48c [fix] fix test_pipeline_utils ci; duanjunwen 2024-09-26 07:18:16 +0000
  • a3a797da20 [fix] fix zerobubble; support shardformer model type; duanjunwen 2024-09-26 06:11:56 +0000
  • 8bc8bb0b14 [fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch; duanjunwen 2024-09-20 09:48:35 +0000
  • 78a439ba5e [fix] fix fwd branch, fwd pass both micro_batch & internal_inputs' duanjunwen 2024-09-20 07:34:43 +0000
  • f8d6f9853e [fix] fix mem assert; duanjunwen 2024-09-19 08:27:47 +0000
  • 8ce22ae955 [fix] fix require_grad & deallocate call; duanjunwen 2024-09-19 05:53:03 +0000
  • 3e2f2601de [fix] fix zerobubble pp for shardformer type input; duanjunwen 2024-09-18 07:14:34 +0000
  • 9094cc35dd [feat] moehybrid support zerobubble; duanjunwen 2024-09-12 02:51:46 +0000
  • 2683d26b71 [fix] fix model zoo import; duanjunwen 2024-09-09 06:39:33 +0000
  • e80179c401 [fix] fix mem; use a new model shape; only assert mem less and equal than theo; duanjunwen 2024-09-09 06:38:31 +0000
  • ae4cf5b883 [fix] fix mem assertation duanjunwen 2024-09-09 05:41:39 +0000
  • 0825700e0f [fix] fix mem check; duanjunwen 2024-09-04 10:57:38 +0000
  • 497d545525 [fix] fix bwd step if condition; remove useless comments and format info; duanjunwen 2024-09-03 08:56:08 +0000
  • 4249a368d9 [fix] rm output.data after send fwd; duanjunwen 2024-09-03 14:12:17 +0800
  • f347591b06 [fix] fix optim bwd; duanjunwen 2024-09-03 02:40:26 +0000
  • ad8ad64b4a [fix] fix optim bwd; duanjunwen 2024-09-02 11:19:42 +0000
  • cc5e7dcfa2 [fix] rm zbv in hybridplugin duanjunwen 2024-09-02 10:00:43 +0000
  • 94a12f6ece [feat] update test; rm comments; duanjunwen 2024-09-02 09:50:47 +0000
  • 5df5965b2e [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict; duanjunwen 2024-08-30 05:42:43 +0000