Commit Graph

67 Commits (1739df423c79b0c52ff5957b7992c14081d5dd24)

Author SHA1 Message Date
duanjunwen 1739df423c [fix] fix fwd branch, fwd pass both micro_batch & internal_inputs; 2024-09-20 07:34:43 +00:00
duanjunwen b6616f544e [fix] rm comments; 2024-09-20 07:29:41 +00:00
duanjunwen c6d6ee39bd [fix] use tree_flatten replace dict traverse; 2024-09-20 07:18:49 +00:00
duanjunwen 26783776f1 [fix] fix input_tensors buffer append input_obj(dict) --> Tuple (microbatch, input_obj) , and all bwd b related cal logic; 2024-09-20 06:41:19 +00:00
duanjunwen a115106f8d [fix] fix bwd w input; 2024-09-19 08:10:05 +00:00
duanjunwen 349272c71f [fix] update bwd b&w input; dict --> list[torch.Tensor] 2024-09-19 07:47:01 +00:00
duanjunwen 6ee9584b9a [fix] fix require_grad & deallocate call; 2024-09-19 05:53:03 +00:00
duanjunwen 3dbad102cf [fix] fix zerobubble pp for shardformer type input; 2024-09-18 07:14:34 +00:00
duanjunwen ce58d8e8bf [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap; 2024-09-09 08:19:58 +00:00
duanjunwen 7568b34626 [fix] fix redundant detach & clone; add buffer assertion at the end; 2024-09-09 08:04:28 +00:00
duanjunwen e6e1a97a6d [fix] fix require_grad position, detach position, and input & output local buffer append position; 2024-09-04 03:31:08 +00:00
duanjunwen 20503cdfdf [fix] rm require_grad for output; 2024-09-03 09:24:40 +00:00
duanjunwen b4103f125c [fix] fix detach output & release output; 2024-09-03 09:09:41 +00:00
duanjunwen 4c1f81c683 [fix] fix bwd step if condition; remove useless comments and format info; 2024-09-03 08:56:08 +00:00
duanjunwen ab643c9af7 [fix] rm output.data after send fwd; 2024-09-03 14:12:17 +08:00
duanjunwen 591a13bf7e [fix] fix optim bwd; 2024-09-02 11:19:42 +00:00
duanjunwen 6d18d38d5c [feat] update test; rm comments; 2024-09-02 09:50:47 +00:00
duanjunwen a7b767b071 [fix] fix communication_map; 2024-08-30 05:56:02 +00:00
duanjunwen 8eb6eac225 [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict; 2024-08-30 05:42:43 +00:00
duanjunwen 6af81d8c0d [feat] add fwd_bwd_step, run_fwd_only; 2024-08-30 02:47:52 +00:00
duanjunwen 48ba22dbfd [feat] fix optimizer bwd b & w; support return accum loss & output 2024-08-29 08:54:45 +00:00
duanjunwen 4c4b01b859 [feat] add optim backward_b_by_grad 2024-08-29 03:16:59 +00:00
duanjunwen fe209164f1 [feat] add apply v_schedule graph; p & p.grad assert err exist; 2024-08-27 10:29:39 +00:00
duanjunwen 8b37323f16 [feat] add run_fwd_bwd_with_microbatch (replace input) & test; add p&p.grad assert close test & all pass; 2024-08-27 09:31:38 +00:00
duanjunwen 9e0bd1af00 [fix] fix ci test; add pytest; 2024-08-27 08:00:23 +00:00
duanjunwen 283c9ff5d2 [fix] rm useless assign and comments; 2024-08-27 07:31:58 +00:00
duanjunwen 1b4bb2beeb [feat] add comments for ZBV func; 2024-08-27 07:11:50 +00:00
duanjunwen 5e09c8b4e1 [feat] split communication and calculation; fix pop empty send_bwd_buffer error; 2024-08-27 06:29:13 +00:00
duanjunwen 1d75045c37 [feat] add test run_fwd_bwd automatic scheduling; 2024-08-26 11:21:56 +00:00
duanjunwen fd5526b76e Merge branch 'main' into dev/zero_bubble 2024-08-26 04:03:20 +00:00
duanjunwen c18ef060cf [feat] add dw test; 2024-08-23 06:04:12 +00:00
duanjunwen ee9baedadf [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble; 2024-08-22 10:25:34 +00:00
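The dx_dw split that these commits test (e.g. "POC test for dx_dw", "add optim backward_b_by_grad", "fix optimizer bwd b & w") is the core idea of zero-bubble pipeline parallelism: backward is decomposed into a B step that produces input gradients and a W step that produces weight gradients. The sketch below is a hypothetical pure-Python illustration for a linear layer, not the Colossal-AI implementation:

```python
# Hedged sketch of the zero-bubble "B/W split" for a linear layer y = x @ W:
#   backward_b: dL/dx = dL/dy @ W^T  (the previous stage blocks on this, so send it first)
#   backward_w: dL/dW = x^T @ dL/dy  (nothing downstream waits on it; defer it into a bubble)
# Matrices are plain nested lists; names are illustrative only.

def backward_b(grad_output, weight):
    # Input gradient dL/dx = dL/dy @ W^T.
    rows, cols = len(grad_output), len(weight)
    return [[sum(grad_output[i][k] * weight[j][k] for k in range(len(weight[0])))
             for j in range(cols)] for i in range(rows)]

def backward_w(saved_input, grad_output):
    # Weight gradient dL/dW = x^T @ dL/dy, computed later by the scheduler
    # in an otherwise-idle pipeline slot.
    in_f, out_f = len(saved_input[0]), len(grad_output[0])
    return [[sum(saved_input[i][j] * grad_output[i][k] for i in range(len(saved_input)))
             for k in range(out_f)] for j in range(in_f)]
```

Because `backward_w` has no consumer on the critical path, the v_schedule referenced above is free to reorder it, which is what shrinks the pipeline bubble.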
Edenzzzz f5c84af0b0
[Feature] Zigzag Ring attention (#5905)
* halfway

* fix cross-PP-stage position id length diff bug

* fix typo

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unified cross entropy func for all shardformer models

* remove redundant lines

* add basic ring attn; debug cross entropy

* fwd bwd logic complete

* fwd bwd logic complete; add experimental triton rescale

* precision tests passed

* precision tests passed

* fix typos and remove misc files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sp_mode to benchmark; fix varlen interface

* update softmax_lse shape by new interface

* change tester name

* remove buffer clone; support packed seq layout

* add varlen tests

* fix typo

* all tests passed

* add dkv_group; fix mask

* remove debug statements

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-16 13:56:38 +08:00
Edenzzzz 2a25a2aff7
[Feature] optimize PP overlap (#5735)
* update to fully overlap, still debugging

* improve interface

* fixed deadlock bug

* debug NaN loss

* (experimental) use one comm group for send_fw_recv_fw to fix NaN

* cleaned up interfaces; use one batch p2p for all

* clean up; removed the double p2p batch case

* p2p test passed

* improve overlap: send fwd before backward

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tentatively use 2 p2p batches

* remove two p2p batches

* fix typos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pp.sh

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: root <root@notebook-c55824c0-7742-45e8-9591-c855bb77ad29-0.notebook-c55824c0-7742-45e8-9591-c855bb77ad29.colossal-ai.svc.cluster.local>
2024-06-26 14:48:02 +08:00
Hongxin Liu bbb2c21f16
[shardformer] fix chatglm implementation (#5644)
* [shardformer] fix chatglm policy

* [shardformer] fix chatglm flash attn

* [shardformer] update readme

* [shardformer] fix chatglm init

* [shardformer] fix chatglm test

* [pipeline] fix chatglm merge batch
2024-04-25 14:41:17 +08:00
Wenhao Chen bb0a668fee
[hotfix] set return_outputs=False in examples and polish code (#5404)
* fix: simplify merge_batch

* fix: use return_outputs=False to eliminate extra memory consumption

* feat: add return_outputs warning

* style: remove `return_outputs=False` as it is the default value
2024-03-25 12:31:09 +08:00
digger yu 385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. (#5429) 2024-03-12 11:25:16 +08:00
digger yu 16c96d4d8c
[hotfix] fix typo change _descrption to _description (#5331) 2024-03-05 21:47:48 +08:00
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
* [feat] refactored extension module

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish
2024-01-25 17:01:48 +08:00
ver217 148469348a Merge branch 'main' into sync/npu 2024-01-18 12:05:21 +08:00
Wenhao Chen ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg (#5268)
* fix: fix misleading mbs arg

* feat: add pp sanity check

* fix: fix 1f1b sanity check
2024-01-15 15:57:40 +08:00
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239)
* update accelerator

* fix timer

* fix amp

* update

* fix

* update bug

* add error raise

* fix autocast

* fix set device

* remove doc accelerator

* update doc

* update doc

* update doc

* use nullcontext

* update cpu

* update null context

* change time limit for example

* update

* update

* update

* update

* [npu] polish accelerator code

---------

Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Elsa Granger d565df3821
[pipeline] A more general _communicate in p2p (#5062)
* A more general _communicate

* feat: finish tree_flatten version p2p

* fix: update p2p api calls

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-08 15:37:27 +08:00
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214)
* fix: add fallback order option and update 1f1b

* fix: fix deadlock comm in interleaved pp

* test: modify p2p test
2024-01-03 11:34:49 +08:00
Wenhao Chen 3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode (#5201)
* fix: remove drop last in val & test dataloader

* feat: add run_forward_only, support arbitrary bs

* chore: modify ci script
2024-01-02 23:41:12 +08:00
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
* test: add more p2p tests

* fix: remove send_forward_recv_forward as p2p op list need to use the same group

* fix: make send and receive atomic

* feat: update P2PComm fn

* feat: add metadata cache in 1f1b

* feat: add metadata cache in interleaved pp

* feat: modify is_xx_stage fn

* revert: add _broadcast_object_list

* feat: add interleaved pp in llama policy

* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
2023-12-22 10:44:00 +08:00
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
* [shardformer] implement policy for all GPT-J models and test

* [shardformer] support interleaved pipeline parallel for bert finetune

* [shardformer] shardformer support falcon (#4883)

* [shardformer]: fix interleaved pipeline for bert model (#5048)

* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093)

* Add Mistral support for Shardformer (#5103)

* [shardformer] add tests to mistral (#5105)

---------

Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
2023-11-28 16:54:42 +08:00
Hongxin Liu 1cd7efc520
[inference] refactor examples and fix schedule (#5077)
* [setup] refactor infer setup

* [hotfix] fix inference behavior on 1 1 gpu

* [example] refactor inference examples
2023-11-21 10:46:03 +08:00
Hongxin Liu e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
* [npu] setup device utils (#5047)

* [npu] add npu device support

* [npu] support low level zero

* [test] update npu zero plugin test

* [hotfix] fix import

* [test] recover tests

* [npu] gemini support npu (#5052)

* [npu] refactor device utils

* [gemini] support npu

* [example] llama2+gemini support npu

* [kernel] add arm cpu adam kernel (#5065)

* [kernel] add arm cpu adam

* [optim] update adam optimizer

* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
Xu Kai fd6482ad8c
[inference] Refactor inference architecture (#5057)
* [inference] support only TP (#4998)

* support only tp

* enable tp

* add support for bloom (#5008)

* [refactor] refactor gptq and smoothquant llama (#5012)

* refactor gptq and smoothquant llama

* fix import error

* fix linear import torch-int

* fix smoothquant llama import error

* fix import accelerate error

* fix bug

* fix import smooth cuda

* fix smoothcuda

* [Inference Refactor] Merge chatglm2 with pp and tp (#5023)

merge chatglm with pp and tp

* [Refactor] remove useless inference code (#5022)

* remove useless code

* fix quant model

* fix test import bug

* mv original inference legacy

* fix chatglm2

* [Refactor] refactor policy search and quant type controlling in inference (#5035)

* [Refactor] refactor policy search and quant type controlling in inference

* [inference] update readme (#5051)

* update readme

* update readme

* fix architecture

* fix table

* fix table

* [inference] update example (#5053)

* update example

* fix run.sh

* fix rebase bug

* fix some errors

* update readme

* add some features

* update interface

* update readme

* update benchmark

* add requirements-infer

---------

Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
2023-11-19 21:05:05 +08:00