Hongxin Liu
|
d202cc28c0
|
[npu] change device to accelerator api (#5239)
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
|
2024-01-09 10:20:05 +08:00 |
Xuanlei Zhao
|
dd2c28a323
|
[npu] use extension for op builder (#5172)
* update extension
* update cpu adam
* update is
* add doc for cpu adam
* update kernel
* update commit
* update flash
* update memory efficient
* update flash attn
* update flash attention loader
* update api
* fix
* update doc
* update example time limit
* reverse change
* fix doc
* remove useless kernel
* fix
* not use warning
* update
* update
|
2024-01-08 11:39:16 +08:00 |
Xuanlei Zhao
|
d6df19bae7
|
[npu] support triangle attention for llama (#5130)
* update fused attn
* update spda
* tri attn
* update triangle
* import
* fix
* fix
|
2023-11-30 14:21:30 +08:00 |
Xuanlei Zhao
|
3acbf6d496
|
[npu] add npu support for hybrid plugin and llama (#5090)
* llama 3d
* update
* fix autocast
|
2023-11-22 19:23:21 +08:00 |
littsk
|
1a3315e336
|
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
* [hotfix] Add layer norm gradients all-reduce for sequence parallel. (#4915)
* Add layer norm gradients all-reduce for sequence parallel.
* skip pipeline inference test
* [hotfix] fixing policies of sequence parallel (#4922)
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
---------
Co-authored-by: littsk <1214689160@qq.com>
* Hotfix/add grad all reduce for sequence parallel (#4927)
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
* fix bug using wrong variables
---------
Co-authored-by: littsk <1214689160@qq.com>
* fix policy initialization
* fix bloom and chatglm policies
* polish code of handling layernorm
* fix moe module
* polish code of class initializing
---------
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
|
2023-11-03 13:32:43 +08:00 |
Hongxin Liu
|
079bf3cb26
|
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
|
2023-09-19 14:20:26 +08:00 |
Hongxin Liu
|
172f7fa3cf
|
[misc] resolve code factor issues (#4433)
|
2023-08-15 23:25:14 +08:00 |
Baizhou Zhang
|
ed4c448488
|
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388)
* fix remaining t5 bugs/rewrite t5 tests
* fix multi-tensor communication in pipeline
* rearrange test_config
* fix keyerror in sync_shared_params
* fix get_held_layers & Randomizer, complete t5 tests
* erase printing
* fix get_held_layers through modifying _release_unheld_layers
* fix _get_recursive_held_layers bug
|
2023-08-15 23:25:14 +08:00 |
Frank Lee
|
b1c2901530
|
[shardformer] supported bloom model (#4098)
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
015af592f8
|
[shardformer] integrated linear 1D with dtensor (#3996)
* [shardformer] integrated linear 1D with dtensor
* polish code
|
2023-07-04 16:05:01 +08:00 |