Jiarui Fang
d5e3e3ec01
[example] update gpt example for larger model scale ( #2211 )
2022-12-28 13:54:08 +08:00
Jiarui Fang
b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 ( #2157 )
2022-12-20 23:03:18 +08:00
Jiarui Fang
ee287620f0
[Gemini] revert ZeROInitCtx related tracer ( #2138 )
2022-12-16 12:37:06 +08:00
Jiarui Fang
c89c66a858
[Gemini] update API of the chunkmemstatscollector. ( #2129 )
2022-12-14 00:47:06 +08:00
Jiarui Fang
2938edf446
[Gemini] update the non model data record method in runtime memory tracer ( #2128 )
2022-12-13 17:11:31 +08:00
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2022-12-12 18:06:16 +08:00
Jiarui Fang
85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) ( #2099 )
2022-12-07 23:04:02 +08:00
Jiarui Fang
33f4412102
[Gemini] use MemStats to store the tracing data. Seperate it from Collector. ( #2084 )
2022-12-06 16:43:06 +08:00
Jiarui Fang
c4739a725a
[Gemini] polish memstats collector ( #1962 )
2022-11-16 15:45:57 +08:00
Zihao
20e255d4e8
MemStatsCollectorStatic ( #1765 )
2022-11-07 16:49:03 +08:00
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2022-10-09 09:18:51 +08:00
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
...
This reverts commit 5be118f405
.
2022-09-26 10:06:03 +08:00
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2022-09-24 19:58:18 +08:00
Jiarui Fang
372f791444
[refactor] move chunk and chunkmgr to directory gemini ( #1182 )
2022-06-29 13:31:02 +08:00
ver217
1f894e033f
[gemini] zero supports gemini ( #1093 )
...
* add placement policy
* add gemini mgr
* update mem stats collector
* update zero
* update zero optim
* fix bugs
* zero optim monitor os
* polish unit test
* polish unit test
* add assert
2022-06-10 14:48:28 +08:00
ver217
be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 ( #1077 )
...
* polish chunk manager
* polish unit test
* impl add_extern_static_tensor for chunk mgr
* add mem stats collector v2
* polish code
* polish unit test
* polish code
* polish get chunks
2022-06-09 20:56:34 +08:00
ver217
d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data ( #850 )
2022-04-24 17:17:22 +08:00
Jiarui Fang
4d9332b4c5
[refactor] moving memtracer to gemini ( #801 )
2022-04-19 10:13:08 +08:00