Commit Graph

1554 Commits (f027ef7913bd2f5043937f6715b956a81cb07323)

Author SHA1 Message Date
Boyuan Yao 7c7921f71b
[autoparallel] add torch.nn.ReLU metainfo (#1868)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input
2022-11-16 23:12:31 +08:00
Jiarui Fang 8c66a1d0aa
[polish] remove useless file _mem_tracer_hook.py (#1963) 2022-11-16 15:55:10 +08:00
Jiarui Fang c4739a725a
[Gemini] polish memstats collector (#1962) 2022-11-16 15:45:57 +08:00
YuliangLiu0306 fea3cb661c
[autoparallel] support addmm in tracer and solver (#1961)
* [fx] patch addmm

* [autoparallel] support addmm in tracer and solver
2022-11-16 14:59:18 +08:00
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960) 2022-11-16 14:44:28 +08:00
HELSON 7066dfbf82
[zero] fix memory leak for zero2 (#1955) 2022-11-16 11:43:24 +08:00
Jiarui Fang 60abd86d6a
[example] enhance GPT demo (#1959)
* [example] enhance GPT demo

* Update README.md

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2022-11-16 11:36:27 +08:00
Fazzie-Maqianli acba142929
Merge pull request #1958 from Fazziekey/lightning
update model download in README
2022-11-16 11:29:21 +08:00
Fazzie a09f88ab07 update model download in README 2022-11-16 11:17:30 +08:00
Fazzie-Maqianli 6bdd0a90ca
update lightning version (#1954) 2022-11-15 16:57:48 +08:00
Jiarui Fang 52c6ad26e0
[ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) 2022-11-15 16:24:16 +08:00
zbian 598d456d0e fixed logger 2022-11-15 16:00:07 +08:00
zbian 6877121377 updated flash attention api 2022-11-15 15:25:39 +08:00
YuliangLiu0306 36c0f3ea5b
[autoparallel] remove redundancy comm node (#1893) 2022-11-15 10:53:41 +08:00
binmakeswell 9183e0dec5
[tutorial] polish all README (#1946) 2022-11-14 19:49:32 +08:00
Frank Lee de56b563b9
[tutorial] added missing dummy dataloader (#1944) 2022-11-14 04:09:03 -06:00
Frank Lee c6ea65011f
[tutorial] fixed pipeline bug for sequence parallel (#1943) 2022-11-14 04:06:57 -06:00
アマデウス e52f9d9109
[tensorparallel] fixed tp layers (#1938) 2022-11-14 17:34:03 +08:00
Jiarui Fang cf68cc92ac
[example] add vit (#1942)
* [ColoTensor] ColoInitContext initialize parameters in shard mode.

* polish

* [example] add vit
2022-11-14 17:28:03 +08:00
YuliangLiu0306 c7925c5d08
[sc demo] add requirements to spmd README (#1941) 2022-11-14 17:22:45 +08:00
Boyuan Yao d5f5e06d82
[SC] remove redundant hands on (#1939)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information

* [sc] modify auto checkpoint benchmark

* [sc] remove imgs

* [sc] remove redundant handson
2022-11-14 03:05:21 -06:00
Jiarui Fang 9f4fb3f28a
[ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) 2022-11-14 16:05:09 +08:00
ver217 b42b672842
[release] update version (#1931) 2022-11-13 15:34:08 +08:00
binmakeswell 41868f7605
[tutorial] polish README and OPT files (#1930)
* [tutorial] polish README and OPT files

* [tutorial] polish README and OPT files

* [tutorial] polish README and OPT files
2022-11-13 13:09:58 +08:00
ver217 b0b7a786b7
[tutorial] add synthetic dataset for opt (#1924) 2022-11-13 03:26:11 +08:00
Frank Lee 0486048453
[tutorial] updated hybrid parallel readme (#1928)
* [tutorial] updated hybrid parallel readme

* polish code
2022-11-13 03:25:01 +08:00
Frank Lee 807cbdb87d
[tutorial] added synthetic data for sequence parallel (#1927)
* [tutorial] added synthetic data for sequence parallel

* polish code
2022-11-13 03:24:02 +08:00
Frank Lee abf4c27f6a
[tutorial] removed huggingface model warning (#1925) 2022-11-12 23:12:18 +08:00
Frank Lee d43a671ad6
Hotfix/tutorial readme index (#1922)
* [tutorial] removed tutorial index in readme

* [tutorial] removed tutorial index in readme
2022-11-12 18:24:52 +08:00
Boyuan Yao 24cbee0ebe
[tutorial] modify hands-on of auto activation checkpoint (#1920)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information

* [sc] modify auto checkpoint benchmark

* [sc] remove imgs
2022-11-12 18:21:03 +08:00
Frank Lee ff16773ded
[tutorial] added synthetic data for hybrid parallel (#1921)
* [tutorial] added synthetic data for hybrid parallel

* polish code
2022-11-12 18:18:55 +08:00
Frank Lee 3c42fdbedc
[tutorial] added synthetic data for hybrid parallel (#1919) 2022-11-12 17:49:48 +08:00
Frank Lee 1b0dd05940
[tutorial] added synthetic dataset for auto parallel demo (#1918) 2022-11-12 17:14:32 +08:00
Frank Lee acd9abc5ca
[tutorial] updated auto parallel demo with latest data path (#1917) 2022-11-12 16:55:19 +08:00
Frank Lee d53415bc10
[tutorial] added data script and updated readme (#1916) 2022-11-12 16:38:41 +08:00
binmakeswell 155e202318
[example] update auto_parallel img path (#1910) 2022-11-11 23:43:22 +08:00
Boyuan Yao d5c5bc219e
[SC] add GPT example for auto checkpoint (#1889)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information
2022-11-11 23:17:25 +08:00
binmakeswell 11ee8ae478
[tutorial] add cifar10 for diffusion (#1907) 2022-11-11 19:03:50 +08:00
Junming Wu 14a0b18305
[NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) 2022-11-11 17:49:18 +08:00
binmakeswell c13c22c481
[doc] add news (#1901) 2022-11-11 17:26:49 +08:00
Frank Lee cb7ec714c8
[tutorial] removed duplicated tutorials (#1904) 2022-11-11 17:23:40 +08:00
Fazzie-Maqianli 351f0f64e6
[example] add cifar10 dataset for diffusion (#1902)
* add cifar10 datasets

* Update README.md

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2022-11-11 17:22:54 +08:00
BoxiangW ca6e75bc28
[tutorial] edited hands-on practices (#1899)
* Add handson to ColossalAI.

* Change names of handsons and edit sequence parallel example.

* Edit wrong folder name

* resolve conflict

* delete readme
2022-11-11 17:08:17 +08:00
BoxiangW d9bf83e084
Add handson to ColossalAI. (#1896)
Co-authored-by: Boxiang Wang <boxiang.wang1@gmail.com>
2022-11-11 16:13:22 +08:00
wozeparrot 0ef8154bfa
Delete .DS_Store (#1894) 2022-11-11 14:24:03 +08:00
github-actions[bot] abadd6e8f7
Automated submodule synchronization (#1797)
Co-authored-by: github-actions <github-actions@github.com>
2022-11-11 09:34:45 +08:00
HELSON 6e51d296f0
[zero] migrate zero1&2 (#1878)
* add zero1&2 optimizer

* rename test directory

* rename test files

* change tolerance in test
2022-11-11 09:26:40 +08:00
Super Daniel cc55ff0aa4
[autoparallel] user-friendly API for CheckpointSolver. (#1879)
Merge for SC tutorial
2022-11-10 20:59:28 +08:00
Super Daniel 448248b27c
[fx] metainfo_trace as an API. (#1873)
* [fx] metainfo_trace as an API.

* [fx] add return.
2022-11-10 20:58:37 +08:00
Super Daniel 6d559ea614
[sc] add examples for auto checkpoint. (#1880) 2022-11-10 20:50:15 +08:00