Frank Lee
8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
...
Feature/npu
2024-01-29 13:49:39 +08:00
Frank Lee
7cfed5f076
[feat] refactored extension module ( #5298 )
...
* [feat] refactored extension module
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
2024-01-25 17:01:48 +08:00
digger yu
bce9499ed3
fix some typo ( #5307 )
2024-01-25 13:56:27 +08:00
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
...
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* udpate
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Wenhao Chen
3c08f17348
[hotfix]: modify create_ep_hierarchical_group and add test ( #5032 )
...
* feat: modify create_ep_hierarchical_group args
* test: add ep tests
* fix: remove get_process_group_ranks
* fix: fix src_rank
2023-11-17 10:53:00 +08:00
Wenhao Chen
724441279b
[moe]: fix ep/tp tests, add hierarchical all2all ( #4982 )
...
* fix: add warning for EP different behavior
* fix: use shard_data in ep & tp model
* to: add used_capacity
* fix: fix router test
* feat: add create_ep_node_group
* feat: add create_ep_hierarchical_group fn
* feat: add HierarchicalAllToAll
* test: add hierarchical all2all test
* fix: fix test errors
* fix: simplify create_ep_hierarchical_group
* fix: add hierarchical_alltoall arg
* fix: fix environ typo
* revert: revert process mesh order
* to: add todo mark
* fix: skip hierarchical_comm if torch < 1.13.1
2023-11-09 06:31:00 +00:00
Xuanlei Zhao
f71e63b0f3
[moe] support optimizer checkpoint ( #5015 )
...
* Refactor MoE Manager setup method
* unshard optim ckpt
* optim io
* update transformer version
* update requirements
* update ckpt
* update ckpt
* update ckpt
* fix engine
* fix engine
2023-11-08 15:07:03 +00:00
Xuanlei Zhao
dc003c304c
[moe] merge moe into main ( #4978 )
...
* update moe module
* support openmoe
2023-11-02 02:21:24 +00:00