Frank Lee
7cfed5f076
[feat] refactored extension module ( #5298 )
...
* [feat] refactored extension module
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
* polish
2024-01-25 17:01:48 +08:00
ver217
148469348a
Merge branch 'main' into sync/npu
2024-01-18 12:05:21 +08:00
Wenhao Chen
ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg ( #5268 )
...
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
2024-01-15 15:57:40 +08:00
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
...
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* udpate
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Elsa Granger
d565df3821
[pipeline] A more general _communicate in p2p ( #5062 )
...
* A more general _communicate
* feat: finish tree_flatten version p2p
* fix: update p2p api calls
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-08 15:37:27 +08:00
Wenhao Chen
d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock ( #5214 )
...
* fix: add fallback order option and update 1f1b
* fix: fix deadlock comm in interleaved pp
* test: modify p2p test
2024-01-03 11:34:49 +08:00
Wenhao Chen
3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode ( #5201 )
...
* fix: remove drop last in val & test dataloader
* feat: add run_forward_only, support arbitrary bs
* chore: modify ci script
2024-01-02 23:41:12 +08:00
Wenhao Chen
4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp ( #5134 )
...
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
2023-12-22 10:44:00 +08:00
Hongxin Liu
e5ce4c8ea6
[npu] add npu support for gemini and zero ( #5067 )
...
* [npu] setup device utils (#5047 )
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052 )
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065 )
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
Elsa Granger
b2ad0d9e8f
[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark ( #5017 )
...
* Use p2p
* Cannot bidirectonal send p2p
* Refactor tensor creation and serialization in P2P
communication
* Fix llama forward args in flash attention
* Add flop estimate from megatron
* Support loading weight not in weight_map when strict=False in hybrid_parallel
* Use send_forward_recv_backward, etc in 1f1b
* Use dataclass for metdata
Remove torch.cuda.synchronize() as suggested
* Add comment about the torch.cuda.synchronize for potential error
* Typo
* Update hybrid_parallel_checkpoint_io.py
* Update p2p.py
* Update one_f_one_b.py
* Update p2p.py
---------
Co-authored-by: flybird11111 <1829166702@qq.com>
2023-11-16 20:15:59 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Baizhou Zhang
660eed9124
[pipeline] set optimizer to optional in execute_pipeline ( #4630 )
...
* set optimizer to optional in execute_pipeline
* arrange device and mixed precision in booster init
* fix execute_pipeline in booster.py
2023-09-07 10:42:59 +08:00
flybird11111
0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin ( #4584 )
...
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* [shardformer] fix opt test hanging
* fix
* test
* test
* [shardformer] zero1+pp and the corresponding tests (#4517 )
* pause
* finish pp+zero1
* Update test_shard_vit.py
* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516 )
* fix overlap bug and support bert, add overlap as an option in shardconfig
* support overlap for chatglm and bloom
* [shardformer] fix emerged bugs after updating transformers (#4526 )
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] Add overlap support for gpt2 (#4535 )
* add overlap support for gpt2
* remove unused code
* remove unused code
* [shardformer] support pp+tp+zero1 tests (#4531 )
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] fix submodule replacement bug when enabling pp (#4544 )
* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540 )
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* rebase feature/shardformer
* update pipeline
* [shardformer] fix
* [shardformer] fix
* [shardformer] bert finetune fix
* [shardformer] add all_reduce operation to loss
add all_reduce operation to loss
* [shardformer] make compatible with pytree.
make compatible with pytree.
* [shardformer] disable tp
disable tp
* [shardformer] add 3d plugin to ci test
* [shardformer] update num_microbatches to None
* [shardformer] update microbatchsize
* [shardformer] update assert
* update scheduler
* update scheduler
---------
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
2023-09-04 21:46:29 +08:00
Jianghai
24c0768795
[shardformer] Pytree fix ( #4533 )
...
* pytree test
* test bert
* test bert
* test bert
* revise
* add register
* add register
2023-09-04 17:52:23 +08:00
Hongxin Liu
508ca36fe3
[pipeline] 1f1b schedule receive microbatch size ( #4589 )
2023-09-01 21:45:14 +08:00
Jianghai
376533a564
[shardformer] zero1+pp and the corresponding tests ( #4517 )
...
* pause
* finish pp+zero1
* Update test_shard_vit.py
2023-08-28 10:51:16 +08:00
LuGY
a78daf6180
[shardformer] support interleaved pipeline ( #4448 )
...
* support interleaved pipeline
* fix unit test
* remove virtual stage test in stage mgr
* add droped type hint and updated bwd
2023-08-16 19:29:03 +08:00
Baizhou Zhang
ed4c448488
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline ( #4388 )
...
* fix remaining t5 bugs/rewrite t5 tests
* fix multi-tensor communication in pipeline
* rearrange test_config
* fix keyerror in sync_shared_params
* fix get_held_layers & Randomnizer, complete t5 tests
* erase printing
* fix get_held_layers through modifying _release_unheld_layers
* fix _get_recursive_held_layers bug
2023-08-15 23:25:14 +08:00
Jianghai
f13954cd58
[pipeline] refactor test pipeline and remove useless utils in pipeline ( #4324 )
...
* refactor tests
* refactor bloom model
* finish policy tests
* refactor tests
* fix test pure pipeline
* remove test pipeline and cutdown launch process
* refactor tests
* refactor bloom model
* finish policy tests
* refactor tests
* fix test pure pipeline
* remove test pipeline and cutdown launch process
2023-08-15 23:25:14 +08:00
Jianghai
e7cc62d735
[pipeline] All bert models ( #4233 )
...
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
This reverts commit 8dee68a0a2
.
This policy should be revert and copied to feature/bloom
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
* add pure pipeline test
* finish some bert models
* finish all bert models
* finish bert tests
* fix bugs
* fix bugs
* fix test pipeline
* fix data gen for qa
* update the set pipeline forward
* shared params
* fix bugs
2023-08-15 23:25:14 +08:00
Hongxin Liu
f51ce1bc8e
[pipeline] refactor 1f1b schedule ( #4115 )
...
* [api] update optimizer wrapper to fit pipeline
* [pipeline] add base schedule
* [pipeline] add 1f1b schedule
* [test] add pipeline schedule utils test
* [pipeline] fix import
2023-08-15 23:25:14 +08:00