Frank Lee
a60024e77a
[autoparallel] added utils for broadcast operation ( #1665 )
* [autoparallel] added utils for broadcast operation
* polish code
2022-09-29 11:22:29 +08:00
YuliangLiu0306
3f068d1409
[autoparallel] update CommSpec ( #1667 )
2022-09-29 11:20:59 +08:00
Frank Lee
247a9dbca9
[autoparallel] added bias comm spec to matmul strategy ( #1664 )
2022-09-29 11:08:05 +08:00
YuliangLiu0306
746f8f979d
[autoparallel] add batch norm handler v2 ( #1666 )
2022-09-29 11:02:49 +08:00
Kirigaya Kazuto
9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward ( #1642 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
* [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward
2022-09-29 10:58:58 +08:00
YuliangLiu0306
c27e701cb2
[autoparallel] remove no strategy nodes ( #1652 )
* [autoparallel] remove no strategy nodes
* fix none object iteration issue
2022-09-29 10:43:25 +08:00
Frank Lee
50f16a2850
[autoparallel] added compute resharding costs for node handler ( #1662 )
2022-09-28 19:55:44 +08:00
Frank Lee
9ec401a722
[autoparallel] added new strategy constructor template ( #1661 )
* [autoparallel] added new strategy constructor template
* polish code
2022-09-28 14:01:36 +08:00
Frank Lee
3a4d6f63a8
[autoparallel] added node handler for bmm ( #1655 )
2022-09-28 11:32:16 +08:00
YuliangLiu0306
095854477f
[autoparallel] add conv handler v2 ( #1663 )
2022-09-28 11:24:59 +08:00
YuliangLiu0306
1e7816a460
[autoparallel] adapt solver with gpt ( #1653 )
2022-09-28 11:17:26 +08:00
Jiarui Fang
c638bec028
[embedding] polish async copy ( #1657 )
2022-09-27 14:37:03 +08:00
Jiarui Fang
988570e4a6
[embedding] add more detail profiling ( #1656 )
2022-09-27 13:43:59 +08:00
Jiarui Fang
e1f97fd2b8
[embedding] print profiling results ( #1654 )
2022-09-27 12:50:33 +08:00
Frank Lee
30e50c8b4a
[autoparallel] implemented all matmul strategy generator ( #1650 )
2022-09-27 12:06:25 +08:00
YuliangLiu0306
03978aad45
[autoparallel] change the following nodes strategies generation logic ( #1636 )
* [autoparallel] change the following nodes strategies generation logic
* fix unit test
2022-09-27 11:20:52 +08:00
YuliangLiu0306
59f100510a
[autoparallel] where handler ( #1651 )
* [autoparallel] where handler
* fix unit test
2022-09-27 11:20:43 +08:00
Super Daniel
6135e178b3
[fx] refactor code for profiler / enable fake tensor movement. ( #1646 )
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
* [fx] optimize table repr.
* [fx] refactor code for profiler.
* [fx] add docstring.
* [fx] add docstring.
* [fx] skip test.
* [fx] redo.
* [fx] redo.
* [fx] fix import error for torch11.
* [fx] fix import error for torch11.
2022-09-27 10:26:52 +08:00
Boyuan Yao
5d0fdb9cb4
[fx] fix offload codegen test ( #1648 )
* [fx] fix offload codegen test
* [fx] modify typing
2022-09-27 10:25:27 +08:00
Frank Lee
45b39a692a
[autoparallel] implemented linear projection strategy generator ( #1639 )
2022-09-26 16:58:14 +08:00
Frank Lee
154d3ef432
[fix] fixed the collective pattern name for consistency ( #1649 )
* [fix] fixed the collective pattern name for consistency
* polish code
2022-09-26 16:39:37 +08:00
YuliangLiu0306
b2b2a4af98
[autoparallel] adapt solver with mlp ( #1638 )
2022-09-26 15:26:14 +08:00
Jiarui Fang
04443605a5
[embedding] non-blocking cpu-gpu copy ( #1647 )
2022-09-26 14:57:57 +08:00
CsRic
0767f67a0f
[embedding] isolate cache_op from forward ( #1645 )
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2022-09-24 19:58:18 +08:00
Boyuan Yao
f921733621
[autoparallel] Add pofo sequence annotation ( #1637 )
* [autoparallel] annotate pofo sequence
* [autoparallel] remove unused print
* [autoparallel] fix some code
2022-09-24 01:52:57 +08:00
Super Daniel
04bbabeea8
[fx/profiler] provide a table of summary. ( #1634 )
* [fx/profiling] provide summary for MetaInfoProp.
* [fx/profiler] provide a table of summary.
* [fx] optimize table repr.
2022-09-23 18:12:43 +08:00
HELSON
95c35f73bd
[moe] initialize MoE groups by ProcessGroup ( #1640 )
2022-09-23 17:20:41 +08:00
Jiarui Fang
e57df80325
[embeddings] cache option ( #1635 )
2022-09-23 16:40:18 +08:00
HELSON
a088022efc
[moe] fix moe bugs ( #1633 )
2022-09-23 15:33:57 +08:00
YuliangLiu0306
702dbc5288
[tensor] use communication autograd func ( #1617 )
* [tensor] use communication autograd func
* change all to all comm spec info
* rename pattern and distinguish fwd/bwd
* polish code
2022-09-23 13:31:15 +08:00
YuliangLiu0306
c7ac0f4ab2
[autoparallel] add elementwise handler ( #1622 )
* [autoparallel] add elementwise handler
* polish code
* polish code
* reduce skipped strategies range
* polish code
2022-09-23 13:27:31 +08:00
YuliangLiu0306
3a46215135
[autoparallel] add embedding handler ( #1620 )
2022-09-23 12:34:30 +08:00
YuliangLiu0306
69448f64c4
[autoparallel] protect bcast handler from invalid strategies ( #1631 )
2022-09-23 12:12:49 +08:00
YuliangLiu0306
0c703189b9
[autoparallel] add layernorm handler ( #1629 )
2022-09-23 12:00:25 +08:00
YuliangLiu0306
bf77d3ab65
[autoparallel] recover the merged node strategy index ( #1613 )
2022-09-23 11:52:42 +08:00
Boyuan Yao
d6b01feb66
[fx] Modify offload codegen ( #1618 )
* [fx] modify offload codegen
* [fx] remove repeated hook definitions
* [fx] modify offload test
2022-09-23 11:04:52 +08:00
Super Daniel
d967779a32
[fx/profiler] tuned the calculation of memory estimation ( #1619 )
* [fx] tuned the meta info and rotor solver.
* [fx] remove import.
* [fx] remove import.
* [fx] remove import.
* [fx] tune the meta calculations.
* [fx] polish comments.
* [fx] remove assertions.
* [fx] modify test cases.
* [fx] modify test cases.
* [fx] optimize import.
* [fx
2022-09-23 10:59:47 +08:00
HELSON
f7f2248771
[moe] fix MoE bugs ( #1628 )
* remove forced FP32 modules
* correct no_shard-contexts' positions
2022-09-22 13:56:30 +08:00
Jiarui Fang
38c68b5b9a
[embedding] rollback for better FAW performance ( #1625 )
2022-09-22 11:16:25 +08:00
Frank Lee
d925122020
[autoparallel] added new linear module handler ( #1616 )
2022-09-21 12:23:21 +08:00
Kirigaya Kazuto
170fa81095
[pipeline/chimera] test chimera | fix bug of initializing ( #1615 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
* [pipeline/chimera] test chimera | fix bug of initializing
2022-09-20 18:00:39 +08:00
Jiarui Fang
504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num ( #1611 )
2022-09-20 14:33:04 +08:00
YuliangLiu0306
6a8f8cc05e
[hotfix] got sliced types ( #1614 )
2022-09-20 14:32:42 +08:00
Frank Lee
d397842fa8
[autoparallel] added new node handler ( #1612 )
2022-09-20 14:17:21 +08:00
YuliangLiu0306
7d1bb71d5d
[fx] PoC of runtime shape consistency application ( #1607 )
* [fx] PoC of runtime shape consistency application
* polish code
2022-09-20 14:00:04 +08:00
YuliangLiu0306
47b11c432c
[autoparallel] add bcast matmul strategies ( #1605 )
2022-09-20 11:26:21 +08:00
Frank Lee
edb67cb378
[autoparallel] refactored the data structure for sharding strategy ( #1610 )
2022-09-20 11:20:54 +08:00
Boyuan Yao
933b6c6367
[fx] Add pofo solver ( #1608 )
* [fx] add pofo algorithm
* [fx] Add pofo solver
* [fx] code refactor
* [fx] fix test_linearize import
2022-09-20 11:20:48 +08:00
Kirigaya Kazuto
edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera ( #1595 )
* [pipeline/tuning] improve dispatch performance both time and space cost
* [pipeline/converge] add interface for testing convergence
* [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style
* Update PipelineBase.py
* [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera
2022-09-19 11:44:18 +08:00
ver217
c9e8ce67b8
fix move fp32 shards ( #1604 )
2022-09-16 17:33:16 +08:00
YuliangLiu0306
eac1b79371
[autoparallel] add bcast op handler ( #1600 )
* [autoparallel] add bcast op handler
* polish code
* add more BCAST FUNC OP
* polish code
* add exception handler
* polish
2022-09-16 11:33:01 +08:00
Frank Lee
3abf98a633
[autoparallel] added all non-bcast matmul strategies ( #1603 )
2022-09-16 10:47:32 +08:00
Frank Lee
db98b695b2
[autoparallel] added strategy generator and bmm strategies ( #1602 )
2022-09-15 16:57:07 +08:00
Jiarui Fang
a19eb80998
[embedding] updates some default parameters
2022-09-15 15:45:17 +08:00
Super Daniel
cd5cf2bcc9
[fx/tuning] tune performance on rotor with meta info. ( #1599 )
2022-09-15 14:46:36 +08:00
Boyuan Yao
a7cda6f57d
[fx] Add offload codegen ( #1598 )
* [fx] add input activation offload to codegen
* [fx] modify unit test
* [fx] remove two skips in torch11
* [fx] use all_input_nodes instead of _input_nodes
2022-09-14 15:49:06 +08:00
Super Daniel
c8e9b2ad78
[hotfix/rotor] fix variable names ( #1597 )
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add interpretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
* [fx] fix variable names in ckpt_solvers and unskip tests.
* [fx] commit my changes.
* [fx] restore skips.
* [fx] restore skips.
* [fx] change stage into phase.
* [fx] change stage into phase.
* [fx] change stage into phase.
2022-09-14 14:27:04 +08:00
YuliangLiu0306
faa23b9d9a
[autoparallel] add reshape handler ( #1594 )
* [autoparallel] add reshape handler
* polish code
2022-09-14 10:25:45 +08:00
Super Daniel
5c494d4540
[fx] provide an accurate estimation of memory. ( #1587 )
* [fx] add some comment and docstrings.
* [fx] add dataflow analysis for an autograd graph.
* add interpretation for graph analysis.
* [fx] before doing save_tensor_hooks.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] provide an accurate estimation of memory except for GPT-2.
* [fx] a very accurate version on GPT-2.
* [fx] refactor code.
* [fx] remove redundant inplace=True.
* [fx] refactor code.
* [fx] refactor code.
* [fx] refactor code.
* [fx] dive into backward memory.
2022-09-14 09:36:43 +08:00
Frank Lee
27fe8af60c
[autoparallel] refactored shape consistency to remove redundancy ( #1591 )
* [autoparallel] refactored shape consistency to remove redundancy
* polish code
* polish code
* polish code
2022-09-13 18:30:18 +08:00
YuliangLiu0306
d164449d00
[autoparallel] add resnet autoparallel unit test and add backward weight communication cost ( #1589 )
2022-09-13 18:05:05 +08:00
Frank Lee
7c18a588c8
[autoparallel] added generate_sharding_spec to utils ( #1590 )
2022-09-13 15:43:22 +08:00
Boyuan Yao
49ccf8b5f8
[fx] Improve linearize and rotor solver ( #1586 )
* [fx] add nested activation_checkpoint codegen
* undo algorithms commits
* solver
* undo some commits
* [fx] torch11 add nested activation checkpoint codegen
* remove some imports
* [fx] add some comments in activation codegen
* [fx] codegen instance error fix
* [fx] improve linearize and rotor solver
* [fx] some comments and format modification
2022-09-13 14:50:04 +08:00
Frank Lee
219f66c571
[autoparallel] added solver option dataclass ( #1588 )
2022-09-13 14:47:09 +08:00
YuliangLiu0306
82d4376c23
[autoparallel] adapt solver with resnet ( #1583 )
* [autoparallel] adapt solver with resnet
* polish code
* polish code
2022-09-13 12:07:09 +08:00
CsRic
f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode ( #1584 )
2022-09-13 10:50:34 +08:00
Boyuan Yao
f3687e4ee2
[fx] Add nested checkpoint in activation checkpoint codegen ( #1585 )
* [fx] add nested activation_checkpoint codegen
* undo algorithms commits
* solver
* undo some commits
* [fx] torch11 add nested activation checkpoint codegen
* remove some imports
* [fx] add some comments in activation codegen
* [fx] codegen instance error fix
2022-09-12 20:00:48 +08:00
Boyuan Yao
20e466527b
[NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style ( #1576 )
2022-09-08 22:11:04 +08:00
Fazzie-Maqianli
06dccdde44
[NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style ( #1554 )
2022-09-08 22:11:04 +08:00
CsRic
2ac46f7be4
[NFC] polish utils/tensor_detector/__init__.py code style ( #1573 )
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-08 22:11:04 +08:00
Sze-qq
2144cbae8c
[NFC] polish colossalai/nn/lr_scheduler/multistep.py code style ( #1572 )
2022-09-08 22:11:04 +08:00
superhao1995
e4bf7ae667
[NFC] polish colossalai/nn/lr_scheduler/torch.py code style ( #1571 )
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han
3263cdf57f
[NFC] polish colossalai/nn/parallel/data_parallel.py code style ( #1570 )
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
Zirui Zhu
f566c9b98d
[NFC] polish colossalai/pipeline/utils.py code style ( #1562 )
2022-09-08 22:11:04 +08:00
Xue Fuzhao
e070ca45c6
[NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style ( #1563 )
2022-09-08 22:11:04 +08:00
Zangwei Zheng
9823cbf24b
[NFC] polish colossalai/gemini/update/chunkv2.py code style ( #1565 )
2022-09-08 22:11:04 +08:00
DouJS
f586887a90
[NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style ( #1568 )
2022-09-08 22:11:04 +08:00
LuGY
c7d4932956
[NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style ( #1566 )
2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing
0c4c9aa6e0
[NFC] polish colossalai/nn/_ops/embedding.py code style ( #1561 )
2022-09-08 22:11:04 +08:00
Ziheng Qin
08815f0e72
[NFC] polish colossalai/builder/__init__.py code style ( #1560 )
Co-authored-by: henryqin1997 <henryqin1997@gamil.com>
2022-09-08 22:11:04 +08:00
Super Daniel
8328917348
[NFC] polish colossalai/testing/comparison.py code style. ( #1558 )
2022-09-08 22:11:04 +08:00
Ofey Chan
7cc052f6c0
[NFC] polish colossalai/nn/layer/colossalai_layer/linear.py ( #1556 )
2022-09-08 22:11:04 +08:00
Kai Wang (Victor Kai)
46931e3c32
[NFC] polish code colossalai/gemini/update/search_utils.py ( #1557 )
2022-09-08 22:11:04 +08:00
yuxuan-lou
413f9c19f4
[NFC] polish colossalai/nn/_ops/layernorm.py code style ( #1555 )
2022-09-08 22:11:04 +08:00
shenggan
8edb777cc2
[NFC] polish colossalai/nn/loss/loss_2p5d.py code style ( #1553 )
2022-09-08 22:11:04 +08:00
Maruyama_Aya
bd2d789832
[NFC] polish colossalai/nn/_ops/embedding_bag.py code style ( #1552 )
2022-09-08 22:11:04 +08:00
binmakeswell
73e9eb13b7
[NFC] polish colossalai/nn/lr_scheduler/cosine.py code style
2022-09-08 22:11:04 +08:00
Kirigaya Kazuto
318fbf1145
[NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style ( #1559 )
2022-09-08 22:04:34 +08:00
CsRic
a389ac4ec9
[embedding] cache_embedding small improvement ( #1564 )
2022-09-08 16:41:19 +08:00
ver217
10dd8226b1
add gather_output for VocabParallelClassifier1D ( #1569 )
2022-09-08 16:40:56 +08:00
Kirigaya Kazuto
6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost ( #1544 )
2022-09-07 19:01:06 +08:00
Super Daniel
4f59693207
[fx] provide a stable but not accurate enough version of profiler. ( #1547 )
* [fx] compute memory stat and flop count for MetaInfoProp.
* [fx] modify node attribute.
* [fx] modify ckpt_chen.
* [fx] fix compatibility.
* [fx] fix import error.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip test for MetaInfoProp.
* [fx] skip if torch 1.11.0.
* [fx] recover MetaInfoProp support for PyTorch 1.11.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] provide a stable but not accurate enough version of profiler.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix compatibility in tests.
* [fx] fix import error.
2022-09-07 11:21:04 +08:00
YuliangLiu0306
0908d0fc61
[autoparallel] add backward cost info into strategies ( #1524 )
2022-09-07 11:19:00 +08:00
YuliangLiu0306
1a3599410d
[autoparallel] support function in operator handler ( #1529 )
2022-09-07 11:18:41 +08:00
YuliangLiu0306
44c866a3e3
[autoparallel] change the merge node logic ( #1533 )
2022-09-07 11:18:19 +08:00
ver217
ae71036cd2
[utils] refactor parallel layers checkpoint and bcast model on loading checkpoint ( #1548 )
* refactor parallel layer
* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
ver217
2bed096848
[utils] optimize partition_tensor_parallel_state_dict ( #1546 )
2022-09-06 17:45:31 +08:00
Super Daniel
d8a5aded19
[hotfix] change namespace for meta_trace. ( #1541 )
2022-09-06 11:46:12 +08:00