ColossalAI

Commit Graph

Author	SHA1	Message	Date
Kai Wang (Victor Kai)	b38efe4e8a	[NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718 )	2 years ago
binmakeswell	f6389d0813	[NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715 )	2 years ago
HELSON	f69f9bf223	[zero] add chunk init function for users (#1729 ) * add chunk manager init function * fix unit tests * add comment * add flush=True	2 years ago
Super Daniel	393f594051	[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710 ) * [fx] move meta registration * [fx] fix tests. * [fx] fix test. * [fx] fix. * [meta] refactor meta registration.py. * [fx] add compatibility descriptions. * [fx] polish import. * [fx] add a decorator. * [fx] fix tests. * [fx] remove print. * [fx] edit raise error. * [fx] edit raise error. * [fx] add type hint. * [fx] fix import in experimental. * [rpc] remove color debug. * [meta] fix naming.	2 years ago
Frank Lee	e8d8eda5e7	[autoparallel] moved tests to test_tensor_shard (#1713 )	2 years ago
YuliangLiu0306	845ff4a47a	[autoparallel] resnet block runtime apply (#1709 ) * [autoparallel] resnet block runtime apply * seperate buffer and parameter in MemoryCost * polish code * add comments and todos * fix test issue	2 years ago
Frank Lee	22a115406b	[autoparallel] fixed broken node handler tests (#1708 )	2 years ago
HELSON	1468e4bcfc	[zero] add constant placement policy (#1705 ) * fixes memory leak when paramter is in fp16 in ZeroDDP init. * bans chunk releasement in CUDA. Only when a chunk is about to offload, it is allowed to release. * adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.	2 years ago
Frank Lee	6c331a5a09	[autoparallel] refactored the autoparallel module for organization (#1706 ) * [autoparallel] refactored the autoparallel module for organization * polish code	2 years ago
Frank Lee	91cd34e6e0	[unittest] added doc for the pytest wrapper (#1704 )	2 years ago
YuliangLiu0306	451cd72dea	[autoparallel] adapt runtime passes (#1703 ) * [autoparallel] adapt runtime passes v2 * polish code	2 years ago
Jiarui Fang	21962e1593	[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699 )	2 years ago
Frank Lee	0e52f3d3d5	[unittest] supported condititonal testing based on env var (#1701 ) polish code	2 years ago
Frank Lee	8283e95db3	[autoparallel] collated all deprecated files (#1700 ) * [autoparallel] collated all deprecated files * polish code	2 years ago
YuliangLiu0306	81f7530ee7	[autoparallel] adapt solver and CostGraph with new handler (#1695 ) * [autoparallel] adapt solver and CostGraph with new handler * fix test issue	2 years ago
YuliangLiu0306	42b882ef06	[autoparallel] add output handler and placeholder handler (#1694 ) * [autoparallel] add output handler and placeholder handler * Delete test_solver_with_resnet.py * fix test bugs	2 years ago
YuliangLiu0306	56088e6d98	[autoparallel] add pooling handler (#1690 ) * [autoparallel] add pooling handler * polish code	2 years ago
YuliangLiu0306	319d654f79	[autoparallel] where_handler_v2 (#1688 ) * where generator * [autoparallel] where_handler_v2	2 years ago
Boyuan Yao	31d2f03d27	[autoparallel] fix C version rotor inconsistency (#1691 )	2 years ago
Frank Lee	4973157ad7	[autoparallel] added sharding spec conversion for linear handler (#1687 )	2 years ago
YuliangLiu0306	af718e83f2	[autoparallel] add reshape handler v2 and fix some previous bug (#1683 )	2 years ago
Super Daniel	3dd6994427	[fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679 ) * [fx/profiler] modify data_ptr into uuid for all tensors. * [fx] modify uuid. * [fx/profiler] tune performance on GPT-2. * [fx] updates. * [fx] debug. * [fx] debug. * [fx] cuda.	2 years ago
YuliangLiu0306	517b63939a	[autoparallel] add unary element wise handler v2 (#1674 )	2 years ago
YuliangLiu0306	f6c6a932b8	[autoparallel] add following node generator (#1673 ) * [autoparallel] add following node generator * polish code * polish code * update name of arguments	2 years ago
YuliangLiu0306	52fda88796	[autoparallel] add layer norm handler v2 (#1671 ) * [autoparallel] add layer norm handler v2 * polish code * polish code	2 years ago
HELSON	b28991dd0a	[feature] A new ZeRO implementation (#1644 )	2 years ago
Boyuan Yao	1df98d5b66	[autoparallel] add rotor C version (#1658 ) * [autoparallel] add rotor c version * [fx] remove metainfoprop in rotor solver * [autoparallel] modify C code format * [autoparallel] remove build.py * [autoparallel] fix C extension build * [autoparallel] add C solver consistency test * [autoparallel] remove some unused imports * [autoparallel] refactor rotor solver code * [autoparallel] replace print with colossalai logger * [autoparallel] ranks fixed	2 years ago
YuliangLiu0306	11ec070e53	[hotfix]unit test (#1670 )	2 years ago
Frank Lee	a60024e77a	[autoparallel] added utils for broadcast operation (#1665 ) * [autoparallel] added utils for broadcast operation * polish code	2 years ago
YuliangLiu0306	3f068d1409	[autoparallel] update CommSpec (#1667 )	2 years ago
YuliangLiu0306	746f8f979d	[autoparallel] add batch norm handler v2 (#1666 )	2 years ago
Kirigaya Kazuto	9708638ded	[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642 ) * [pipeline/tuning] improve dispatch performance both time and space cost * [pipeline/converge] add interface for testing convergence * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style * Update PipelineBase.py * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera * [pipeline/chimera] test chimera | fix bug of initializing * [pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward	2 years ago
Frank Lee	3a4d6f63a8	[autoparallel] added node handler for bmm (#1655 )	2 years ago
YuliangLiu0306	095854477f	[autoparallel] add conv handler v2 (#1663 )	2 years ago
YuliangLiu0306	1e7816a460	[autoparallel] adapt solver with gpt (#1653 )	2 years ago
Frank Lee	30e50c8b4a	[autoparallel] implemented all matmul strategy generator (#1650 )	2 years ago
YuliangLiu0306	03978aad45	[autoparallel] change the following nodes strategies generation logic (#1636 ) * [autoparallel] change the following nodes strategies generation logic * fix unit test	2 years ago
YuliangLiu0306	59f100510a	[autoparallel] where handler (#1651 ) * [autoparallel] where handler * fix unit test	2 years ago
Boyuan Yao	5d0fdb9cb4	[fx] fix offload codegen test (#1648 ) * [fx] fix offload codegen test * [fx] modify typing	2 years ago
Frank Lee	45b39a692a	[autoparallel] implemented linear projection strategy generator (#1639 )	2 years ago
Frank Lee	154d3ef432	[fix] fixed the collective pattern name for consistency (#1649 ) * [fix] fixed the collective pattern name for consistency * polish code	2 years ago
YuliangLiu0306	b2b2a4af98	[autoparallel] adapt solver with mlp (#1638 )	2 years ago
Jiarui Fang	c5d39215f6	Revert "[feature] new zero implementation (#1623 )" (#1643 ) This reverts commit `5be118f405`.	2 years ago
HELSON	5be118f405	[feature] new zero implementation (#1623 )	2 years ago
HELSON	95c35f73bd	[moe] initialize MoE groups by ProcessGroup (#1640 )	2 years ago
HELSON	a088022efc	[moe] fix moe bugs (#1633 )	2 years ago
YuliangLiu0306	702dbc5288	[tensor] use communication autograd func (#1617 ) * [tensor] use communication autograd func * change all to all comm spec info * rename pattern and distinguish fwd/bwd * polish code	2 years ago
YuliangLiu0306	0c703189b9	[autoparallel] add layernorm handler (#1629 )	2 years ago
YuliangLiu0306	bf77d3ab65	[autoparallel] recover the merged node strategy index (#1613 )	2 years ago
Boyuan Yao	d6b01feb66	[fx] Modify offload codegen (#1618 ) * [fx] modify offload codegen * [fx] remove repeated hook definitions * [fx] modify offload test	2 years ago
YuliangLiu0306	9eae855408	[hotfix] add recompile after graph manipulatation (#1621 )	2 years ago
Super Daniel	d967779a32	[fx/profiler] tuned the calculation of memory estimation (#1619 ) * [fx] tuned the meta info and rotor solver. * [fx] remove import. * [fx] remove import. * [fx] remove import. * [fx] tune the meta calculations. * [fx] polish comments. * [fx] remove assertions. * [fx] modify test cases. * [fx] modify test cases. * [fx] optimize import. * [fx	2 years ago
HELSON	f7f2248771	[moe] fix MoE bugs (#1628 ) * remove forced FP32 modules * correct no_shard-contexts' positions	2 years ago
Jiarui Fang	38c68b5b9a	[embedding] rollback for better FAW performance (#1625 )	2 years ago
Frank Lee	d925122020	[autoparallel] added new linear module handler (#1616 )	2 years ago
Kirigaya Kazuto	170fa81095	[pipeline/chimera] test chimera | fix bug of initializing (#1615 ) * [pipeline/tuning] improve dispatch performance both time and space cost * [pipeline/converge] add interface for testing convergence * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style * Update PipelineBase.py * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera * [pipeline/chimera] test chimera | fix bug of initializing	2 years ago
Jiarui Fang	504ff1d101	[embeddings] use cache_ratio instead of cuda_row_num (#1611 )	2 years ago
YuliangLiu0306	7d1bb71d5d	[fx] PoC of runtime shape consistency application (#1607 ) * [fx] PoC of runtime shape consistency application * polish code	2 years ago
YuliangLiu0306	47b11c432c	[autoparallel]add bcast matmul strategies (#1605 )	2 years ago
Boyuan Yao	933b6c6367	[fx] Add pofo solver (#1608 ) * [fx] add pofo algorithm * [fx] Add pofo solver * [fx] code refactor * [fx] fix test_linearize import	2 years ago
Kirigaya Kazuto	edc9e419ad	[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595 ) * [pipeline/tuning] improve dispatch performance both time and space cost * [pipeline/converge] add interface for testing convergence * [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style * Update PipelineBase.py * [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera	2 years ago
YuliangLiu0306	eac1b79371	[autoparallel] add bcast op handler (#1600 ) * [autoparallel] add bcast op handler * polish code * add more BCAST FUNC OP * polish code * add exception handler * polish	2 years ago
Boyuan Yao	a7cda6f57d	[fx] Add offload codegen (#1598 ) * [fx] add input activation offload to codegen * [fx] modify unit test * [fx] remove two skips in torch11 * [fx] use all_input_nodes instead of _input_nodes	2 years ago
Super Daniel	c8e9b2ad78	[hotfix/rotor] fix variable names (#1597 ) * [fx] add some comment and docstrings. * [fx] add dataflow analysis for an autograd graph. * add intepretation for graph analysis. * [fx] before doing save_tensor_hooks. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] provide an accurate estimation of memory except for GPT-2. * [fx] a very accurate version on GPT-2. * [fx] refactor code. * [fx] remove redundant inplace=True. * [fx] refactor code. * [fx] refactor code. * [fx] refactor code. * [fx] dive into backward memory. * [fx] fix variable names in ckpt_solvers and unskip tests. * [fx] commit my changes. * [fx] restore skips. * [fx] restore skips. * [fx] chaange stage into phase. * [fx] chaange stage into phase. * [fx] chaange stage into phase.	2 years ago
YuliangLiu0306	faa23b9d9a	[autoparallel] add reshape handler (#1594 ) * [autoparallel] add reshape handler * polish code	2 years ago
Frank Lee	27fe8af60c	[autoparallel] refactored shape consistency to remove redundancy (#1591 ) * [autoparallel] refactored shape consistency to remove redundancy * polish code * polish code * polish code	2 years ago
YuliangLiu0306	d164449d00	[autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589 )	2 years ago
Frank Lee	219f66c571	[autoparallel] added solver option dataclass (#1588 )	2 years ago
YuliangLiu0306	82d4376c23	[autoparallel] adapt solver with resnet (#1583 ) * [autoparallel]adapt solver with resnet * polish code * polish code	2 years ago
CsRic	f3403ff98e	[embeddings] add already_split_along_rank flag for tablewise mode (#1584 )	2 years ago
Boyuan Yao	f3687e4ee2	[fx] Add nested checkpoint in activation checkpoint codegen (#1585 ) * [fx] add nested activation_checkpoint codegen * undo algorithms commits * solver * undo some commits * [fx] torch11 add nested activation checkpoint codegen * remove some imports * [fx] add some comments in activation codegen * [fx] codegen instance error fix	2 years ago
アマデウス	e615cfc3a8	[NFC] polish test component gpt code style (#1567 )	2 years ago
Kirigaya Kazuto	6159d45417	[pipeline/tuning] improve dispatch performance both time and space cost (#1544 )	2 years ago
Super Daniel	4f59693207	[fx] provide a stable but not accurate enough version of profiler. (#1547 ) * [fx] compute memory stat and flop count for MetaInfoProp. * [fx] modify node attribute. * [fx] modify ckpt_chen. * [fx] fix compatibility. * [fx] fix import error. * [fx] skip test for MetaInfoProp. * [fx] skip test for MetaInfoProp. * [fx] skip test for MetaInfoProp. * [fx] skip test for MetaInfoProp. * [fx] skip if torch 1.11.0. * [fx] recover MetaInfoProp support for PyTorch 1.11. * [fx] provide a stable but not accurate enough version of profiler. * [fx] provide a stable but not accurate enough version of profiler. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix compatibility in tests. * [fx] fix import error.	2 years ago
YuliangLiu0306	0908d0fc61	[autoparallel]add backward cost info into strategies (#1524 )	2 years ago
YuliangLiu0306	44c866a3e3	[autoparallel] change the merge node logic (#1533 )	2 years ago
Jiarui Fang	64169f3e8f	[embedding] polish parallel embedding tablewise (#1545 )	2 years ago
CsRic	964123ae0f	[embedding] freq_aware_embedding: add small functions for caller application (#1537 )	2 years ago
Boyuan Yao	56159049e8	[fx] Modify solver linearize and add corresponding test (#1531 ) * [fx] modify solver linearize and add test * [fx] add torch11 test of linearize but skip it * [fx] remove some unused imports	2 years ago
Super Daniel	7dc53237c3	[fx] add test for meta tensor. (#1527 ) * [fx] add test for meta tensor. * [fx] add test for meta tensor. * [fx] add test for meta tensor. * [fx] add test for meta tensor. * [fx] fix error.	2 years ago
YuliangLiu0306	4b3d6caeb3	[fx]patch nn.functional convolution (#1528 )	2 years ago
CsRic	5156d5b4f8	[embedding] add tablewise sharding for FAW (#1526 )	2 years ago
Kirigaya Kazuto	f1e1836218	[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508 ) * support p2p communication with any type of object | pass test * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule * [pipeline/rpc] implement a demo for PP with cuda rpc framework * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B * [pipeline/rpc] implement distributed optimizer | test with assert_close * [pipeline/rpc] implement distributed optimizer | test with assert_close * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy * [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP * [pipeline/pipleline_process_group] remove comment * [pipeline/pipleline_process_group] remove comment * [pipeline/pipleline_process_group] skip process group test * [pipeline/pipleline_process_group] remove test named function	2 years ago
Boyuan Yao	b231430bcb	[fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521 ) * [fx] fix wrong variable name in solver rotor * [fx] fix wrong variable name in solver rotor * [fx] fix the discretize bug * [fx] fix the first op in activation checkpoint codegen * [fx] fix some bugs of ckpt solver * [fx] modify test_ckpt_torchvision * [fx] set sequence to __sequence__ attr of GraphModule * [fx] docstring modification * [fx] remove performance test	2 years ago
YuliangLiu0306	3345c6d352	[autoparellel]add strategies constructor (#1505 ) * [autoparellel]add strategies constructor * remove duplicated strategies * polish code * adapt cost graph with StrategiesConstructor * polish	2 years ago
Frank Lee	a0436a62ee	[autoparallel] added liveness analysis (#1516 ) * [autoparallel] added liveness analysis * remove memory cost	2 years ago
Jiarui Fang	9a9ef65313	[FAW] cpu caching operations (#1520 )	2 years ago
Jiarui Fang	af5438caa2	[FAW] refactor reorder() for CachedParamMgr (#1514 )	2 years ago
CsRic	1b8fee8e9c	[FAW] shrink freq_cnter size (#1509 )	2 years ago
Boyuan Yao	4acc58ee20	[fx] Fix activation codegen dealing with checkpointing first op (#1510 )	2 years ago
Kirigaya Kazuto	5a6fd71f90	[pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497 ) * support p2p communication with any type of object | pass test * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule * [pipeline/rpc] implement a demo for PP with cuda rpc framework * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B * [pipeline/rpc] implement distributed optimizer | test with assert_close * [pipeline/rpc] implement distributed optimizer | test with assert_close * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy * [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy	2 years ago
CsRic	0ed2f46131	[FAW] FAW embedding use LRU as eviction strategy intialized with dataset stats (#1494 )	2 years ago
YuliangLiu0306	8b7d6bd5be	[autoparallel] add more sharding strategies to conv (#1487 )	2 years ago
Boyuan Yao	de1e716dc4	[fx] Add activation checkpoint solver rotor (#1496 ) * [fx] fix defining ckpt functions inside forward * [fx] Modify activation checkpoint codegen and add ColoGraphModule * [fx] some modification * some modifications * some modifications * some modifications * some modifications * some code modifications * [automatic_parallel] ckpt solver rotor * [fx] add ckpt_solver_rotor * [fx] modification * code refactor * code refactor	2 years ago
YuliangLiu0306	413c053453	[autoparallel] add cost graph class (#1481 ) * [autoparallel] add cost graph class * polish code	2 years ago
YuliangLiu0306	4b03c25f85	[tensor]add 1D device mesh (#1492 )	2 years ago
CsRic	b8d0e39eaf	[FAW] LFU cache for the FAW	2 years ago
Kirigaya Kazuto	9145aef2b4	[pipeline/rpc] implement distributed optimizer | test with assert_close (#1486 ) * support p2p communication with any type of object | pass test * reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test * [engin/schedule] use p2p_v2 to recontruct pipeline_schedule * [pipeline/rpc] implement a demo for PP with cuda rpc framework * [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B * [pipeline/rpc] implement distributed optimizer | test with assert_close * [pipeline/rpc] implement distributed optimizer | test with assert_close	2 years ago
Frank Lee	3da68d6b1b	[fx] fixed adapative pooling size concatenation error (#1489 )	2 years ago
Jiarui Fang	cde7b8a5b8	[FAW] init an LFU implementation for FAW (#1488 )	2 years ago

534 Commits (11ee8ae478cb2d6e4adcb9668b2abe0d3eba7aca)