ColossalAI

Commit Graph

Author	SHA1	Message	Date
YuliangLiu0306	59f100510a	[autoparallel] where handler (#1651 ) * [autoparallel] where handler * fix unit test	2 years ago
Super Daniel	6135e178b3	[fx] refactor code for profiler / enable fake tensor movement. (#1646 ) * [fx/profiling] provide summary for MetaInfoProp. * [fx/profiler] provide a table of summary. * [fx/profiler] provide a table of summary. * [fx/profiler] provide a table of summary. * [fx/profiler] provide a table of summary. * [fx] optimize table repr. * [fx] optimize table repr. * [fx] refactor code for profiler. * [fx] add docstring. * [fx] add docstring. * [fx] skip test. * [fx] redo. * [fx] redo. * [fx] fix import error for torch11. * [fx] fix import error for torch11.	2 years ago
Boyuan Yao	5d0fdb9cb4	[fx] fix offload codegen test (#1648 ) * [fx] fix offload codegen test * [fx] modify typing	2 years ago
Frank Lee	45b39a692a	[autoparallel] implemented linear projection strategy generator (#1639 )	2 years ago
Frank Lee	154d3ef432	[fix] fixed the collective pattern name for consistency (#1649 ) * [fix] fixed the collective pattern name for consistency * polish code	2 years ago
YuliangLiu0306	b2b2a4af98	[autoparallel] adapt solver with mlp (#1638 )	2 years ago
Jiarui Fang	04443605a5	[embedding] non-blocking cpu-gpu copy (#1647 )	2 years ago
CsRic	0767f67a0f	[embedding] isolate cache_op from forward (#1645 ) Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>	2 years ago
Jiarui Fang	c5d39215f6	Revert "[feature] new zero implementation (#1623 )" (#1643 ) This reverts commit `5be118f405`.	2 years ago
HELSON	5be118f405	[feature] new zero implementation (#1623 )	2 years ago
Boyuan Yao	f921733621	[autoparallel] Add pofo sequence annotation (#1637 ) * [autoparallel] annotate pofo sequence * [autoparallel] remove unused print * [autoparallel] fix some code	2 years ago
Super Daniel	04bbabeea8	[fx/profiler] provide a table of summary. (#1634 ) * [fx/profiling] provide summary for MetaInfoProp. * [fx/profiler] provide a table of summary. * [fx] optimize table repr.	2 years ago
HELSON	95c35f73bd	[moe] initialize MoE groups by ProcessGroup (#1640 )	2 years ago
Jiarui Fang	e57df80325	[embeddings] cache option (#1635 )	2 years ago
HELSON	a088022efc	[moe] fix moe bugs (#1633 )	2 years ago
YuliangLiu0306	702dbc5288	[tensor] use communication autograd func (#1617 ) * [tensor] use communication autograd func * change all to all comm spec info * rename pattern and distinguish fwd/bwd * polish code	2 years ago
YuliangLiu0306	c7ac0f4ab2	[autoparallel] add elementwise handler (#1622 ) * [autoparallel] add elementwise handler * polish code * polish code * reduce skipped strategies range * polish code	2 years ago
YuliangLiu0306	3a46215135	[autoparallel] add embedding handler (#1620 )	2 years ago
YuliangLiu0306	69448f64c4	[autoparallel] protect bcast handler from invalid strategies (#1631 )	2 years ago
YuliangLiu0306	0c703189b9	[autoparallel] add layernorm handler (#1629 )	2 years ago
YuliangLiu0306	bf77d3ab65	[autoparallel] recover the merged node strategy index (#1613 )	2 years ago
Boyuan Yao	d6b01feb66	[fx] Modify offload codegen (#1618 ) * [fx] modify offload codegen * [fx] remove repeated hook definitions * [fx] modify offload test	2 years ago
YuliangLiu0306	9eae855408	[hotfix] add recompile after graph manipulatation (#1621 )	2 years ago
Super Daniel	d967779a32	[fx/profiler] tuned the calculation of memory estimation (#1619 ) * [fx] tuned the meta info and rotor solver. * [fx] remove import. * [fx] remove import. * [fx] remove import. * [fx] tune the meta calculations. * [fx] polish comments. * [fx] remove assertions. * [fx] modify test cases. * [fx] modify test cases. * [fx] optimize import. * [fx	2 years ago
HELSON	f7f2248771	[moe] fix MoE bugs (#1628 ) * remove forced FP32 modules * correct no_shard-contexts' positions	2 years ago
Jiarui Fang	38c68b5b9a	[embedding] rollback for better FAW performance (#1625 )	2 years ago
Frank Lee	d925122020	[autoparallel] added new linear module handler (#1616 )	2 years ago
Kirigaya Kazuto	170fa81095	[pipeline/chimera] test chimera \| fix bug of initializing (#1615 ) * [pipeline/tuning] improve dispatch performance both time and space cost * [pipeline/converge] add interface for testing convergence * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style * Update PipelineBase.py * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule \| finish Chimera * [pipeline/chimera] test chimera \| fix bug of initializing	2 years ago
Jiarui Fang	504ff1d101	[embeddings] use cache_ratio instead of cuda_row_num (#1611 )	2 years ago
YuliangLiu0306	6a8f8cc05e	[hotfix] got sliced types (#1614 )	2 years ago
Frank Lee	d397842fa8	[autoparallel] added new node handler (#1612 )	2 years ago
YuliangLiu0306	7d1bb71d5d	[fx] PoC of runtime shape consistency application (#1607 ) * [fx] PoC of runtime shape consistency application * polish code	2 years ago
YuliangLiu0306	47b11c432c	[autoparallel]add bcast matmul strategies (#1605 )	2 years ago
Frank Lee	edb67cb378	[autoparallel] refactored the data structure for sharding strategy (#1610 )	2 years ago
Boyuan Yao	933b6c6367	[fx] Add pofo solver (#1608 ) * [fx] add pofo algorithm * [fx] Add pofo solver * [fx] code refactor * [fx] fix test_linearize import	2 years ago
github-actions[bot]	d32cf84c46	Automated submodule synchronization (#1609 ) Co-authored-by: github-actions <github-actions@github.com>	2 years ago
Frank Lee	725666d6a9	[workflow] deactivate conda environment before removing (#1606 )	2 years ago
Kirigaya Kazuto	edc9e419ad	[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule \| finish Chimera (#1595 ) * [pipeline/tuning] improve dispatch performance both time and space cost * [pipeline/converge] add interface for testing convergence * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style * Update PipelineBase.py * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule \| finish Chimera	2 years ago
ver217	c9e8ce67b8	fix move fp32 shards (#1604 )	2 years ago
YuliangLiu0306	eac1b79371	[autoparallel] add bcast op handler (#1600 ) * [autoparallel] add bcast op handler * polish code * add more BCAST FUNC OP * polish code * add exception handler * polish	2 years ago
Frank Lee	3abf98a633	[autoparallel] added all non-bcast matmul strategies (#1603 )	2 years ago
Frank Lee	db98b695b2	[autoparallel] added strategy generator and bmm strategies (#1602 )	2 years ago
Jiarui Fang	a19eb80998	[embedding] updates some default parameters	2 years ago
Super Daniel	cd5cf2bcc9	[fx/tuning] tune performance on rotor with meta info. (#1599 )	2 years ago
Boyuan Yao	a7cda6f57d	[fx] Add offload codegen (#1598 ) * [fx] add input activation offload to codegen * [fx] modify unit test * [fx] remove two skips in torch11 * [fx] use all_input_nodes instead of _input_nodes	2 years ago
Super Daniel	c8e9b2ad78	[hotfix/rotor] fix variable names (#1597 ) * [fx] add some comment and docstrings. * [fx] add dataflow analysis for an autograd graph. * add intepretation for graph analysis. * [fx] before doing save_tensor_hooks. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] a very accurate version on GPT-2. * [fx] refactor code. * [fx] remove redundant inplace=True. * [fx] refactor code. * [fx] refactor code. * [fx] refactor code. * [fx] dive into backward memory. * [fx] fix variable names in ckpt_solvers and unskip tests. * [fx] commit my changes. * [fx] restore skips. * [fx] restore skips. * [fx] chaange stage into phase. * [fx] chaange stage into phase. * [fx] chaange stage into phase.	2 years ago
YuliangLiu0306	faa23b9d9a	[autoparallel] add reshape handler (#1594 ) * [autoparallel] add reshape handler * polish code	2 years ago
github-actions[bot]	c938dda028	Automated submodule synchronization (#1596 ) Co-authored-by: github-actions <github-actions@github.com>	2 years ago
Super Daniel	5c494d4540	[fx] provide an accurate estimation of memory. (#1587 ) * [fx] add some comment and docstrings. * [fx] add dataflow analysis for an autograd graph. * add intepretation for graph analysis. * [fx] before doing save_tensor_hooks. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] a very accurate version on GPT-2. * [fx] refactor code. * [fx] remove redundant inplace=True. * [fx] refactor code. * [fx] refactor code. * [fx] refactor code. * [fx] dive into backward memory.	2 years ago
Frank Lee	27fe8af60c	[autoparallel] refactored shape consistency to remove redundancy (#1591 ) * [autoparallel] refactored shape consistency to remove redundancy * polish code * polish code * polish code	2 years ago

1373 Commits (d00d905b8601b4e163d81f5d71c4254f462f847e)