2509 Commits (74d176c8d84235e1b68f537eb9022c2d0a4e09ca)

Author SHA1 Message Date
ver217 150b1a7453
update local version format (#909) 2022-05-05 14:59:12 +08:00
github-actions[bot] 3b1f5f07ce
Automated submodule synchronization (#907)
Co-authored-by: github-actions <github-actions@github.com>
2022-05-03 13:14:48 +08:00
Ziyue Jiang f593a5637e
[Tensor] add embedding tp1d row (#904) 2022-04-29 14:10:05 +08:00
ver217 16122d5fac
update release bdist CI (#902) 2022-04-28 17:52:57 +08:00
Ziyue Jiang 2c0d19d755
[Tensor] add ColoTensor TP1Dcol Embedding (#899) 2022-04-28 17:45:06 +08:00
ver217 e46e423c00
add CI for releasing bdist wheel (#901) 2022-04-28 17:40:53 +08:00
Jiarui Fang e1108caf7d
change version to 0.1.4 (#900) 2022-04-28 15:51:25 +08:00
Jiarui Fang d16671da75
[Tensor] initialize the ColoOptimizer (#898)
* [Tensor] activation is an attr of ColoTensor

* [Tensor] add optimizer

* only detach parameters in context

* polish code
2022-04-28 15:23:40 +08:00
Jiarui Fang 676f191532
[Tensor] activation is an attr of ColoTensor (#897) 2022-04-28 14:43:22 +08:00
Jiarui Fang e76f76c08b
[Tensor] test parameters() as member function (#896) 2022-04-28 10:57:14 +08:00
Ziyue Jiang cb182da7c5
[tensor] refine linear and add gather for laynorm (#893)
* refine linear and add function to ColoTensor

* add gather for layernorm

* polish

* polish
2022-04-28 10:55:40 +08:00
Jiarui Fang 26c49639d8
[Tensor] overriding paramters() for Module using ColoTensor (#889) 2022-04-27 15:28:59 +08:00
ver217 daf59ff72e
[setup] add local version label (#890) 2022-04-27 15:26:12 +08:00
Ziyue Jiang 1d0aba4153
[tensor] add ColoTensor 1Dcol (#888) 2022-04-27 14:13:55 +08:00
Jiarui Fang a0e5971692
[Tensor] test model check results for a simple net (#887) 2022-04-27 12:00:18 +08:00
Jiarui Fang 72cdc06875
[Tensor] make ColoTensor more robust for getattr (#886)
* [Tensor] make ColoTensor more robust for getattr

* polish

* polish
2022-04-27 10:57:49 +08:00
Ziyue Jiang 9bc5a77c31
[tensor] wrap function in the torch_tensor to ColoTensor (#881) 2022-04-26 20:13:56 +08:00
ver217 4df6471f5d
fix import error (#880) 2022-04-26 19:28:40 +08:00
Jiarui Fang 7f76517a85
[Tensor] make a simple net works with 1D row TP (#879) 2022-04-26 18:11:47 +08:00
ver217 c4d903e64a
[gemini] accelerate adjust_layout() (#878)
* add lru cache

* polish code

* update unit test

* fix sharded optim
2022-04-26 18:08:31 +08:00
Jiarui Fang 909211453b
[Tensor] Add some attributes to ColoTensor (#877)
* [Tensor] add some function to ColoTensor

* torch.allclose

* rm torch.add
2022-04-26 15:10:47 +08:00
HELSON 425b4a96b8
[gemini] polish stateful_tensor_mgr (#876) 2022-04-26 15:05:03 +08:00
Jiarui Fang e43f83aa5c
[Tensor] get named parameters for model using ColoTensors (#874) 2022-04-26 14:08:01 +08:00
LuGY 2883040286
[example] change qkv processing (#870) 2022-04-26 13:33:27 +08:00
Jiarui Fang 96211c2cc8
[tensor] customized op returns ColoTensor (#875)
* [tensor] customized op returns ColoTensor

* polish

* polish code
2022-04-26 13:23:59 +08:00
Ziyue Jiang 26d4ab8b03
[Tensor] Add function to spec and update linear 1Drow and unit tests (#869) 2022-04-26 10:15:26 +08:00
Frank Lee 11f54c7b6b
[doc] improved docstring and assertion messages for the engine module (#871) 2022-04-26 10:00:18 +08:00
Frank Lee 1c34382678
[doc] improved assertion messages in trainer (#873) 2022-04-26 10:00:12 +08:00
Frank Lee 7a64fae33a
[doc] improved error messages in initialize (#872) 2022-04-26 10:00:03 +08:00
Jiarui Fang 1190b2c4a4
[tensor] add cross_entrophy_loss (#868) 2022-04-25 16:01:52 +08:00
HELSON 3107817172
[gemini] add stateful tensor container (#867) 2022-04-25 14:58:16 +08:00
Jiarui Fang d01d3b8cb0
colo init context add device attr. (#866) 2022-04-25 14:24:26 +08:00
Frank Lee 2238758c2e
[usability] improved error messages in the context module (#856) 2022-04-25 13:42:31 +08:00
Frank Lee 9fdebadd69
[doc] improved docstring in the amp module (#857) 2022-04-25 13:42:17 +08:00
Frank Lee b862d89d00
[doc] improved docstring in the logging module (#861) 2022-04-25 13:42:00 +08:00
Frank Lee 8004c8e938
[doc] improved docstring in the communication module (#863) 2022-04-25 13:41:43 +08:00
Jiarui Fang 8af5f7423d
[tensor] an initial dea of tensor spec (#865)
* a initial dea of tensor spec

* polish

* polish
2022-04-25 13:33:52 +08:00
Jiarui Fang 126ba573a8
[Tensor] add layer norm Op (#852) 2022-04-25 11:49:20 +08:00
Frank Lee a82da26f7e
[cli] refactored micro-benchmarking cli and added more metrics (#858) 2022-04-25 11:48:07 +08:00
Frank Lee ee222dfbf3
[usability] added assertion message in registry (#864) 2022-04-25 11:45:15 +08:00
HELSON f0e654558f
[gemini] polish code (#855) 2022-04-25 10:40:14 +08:00
Jiarui Fang 29159d9b5b
hotfix tensor unittest bugs (#862) 2022-04-25 10:06:53 +08:00
Frank Lee 1258af71cc
[ci] cache cuda extension (#860) 2022-04-25 10:03:47 +08:00
YuliangLiu0306 c6930d8ddf
[pipelinable]use ColoTensor to replace dummy tensor. (#853) 2022-04-24 18:31:22 +08:00
Ziyue Jiang bcc8655021
[Tensor ] Add 1Drow weight reshard by spec (#854) 2022-04-24 18:30:20 +08:00
ver217 d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data (#850) 2022-04-24 17:17:22 +08:00
ver217 232142f402
[utils] refactor profiler (#837)
* add model data profiler

* add a subclass of torch.profiler.profile

* refactor folder structure

* remove redundant codes

* polish code

* use GeminiMemoryManager

* fix import path

* fix stm profiler ext

* polish comments

* remove useless file
2022-04-24 17:03:59 +08:00
Jiarui Fang 62f059251b
[Tensor] init a tp network training unittest (#849) 2022-04-24 16:43:44 +08:00
ver217 0dea140760
[hotfix] add deconstructor for stateful tensor (#848)
* add deconstructor for stateful tensor

* fix colo init context
2022-04-24 15:03:04 +08:00
ver217 0f7ed8c192
fix _post_init_method of zero init ctx (#847) 2022-04-24 14:16:50 +08:00