* [npu] setup device utils (#5047)
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052)
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065)
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* cleaning codes
* update cpu adam fp16 case
* [feature] support no master weights for low level zero plugin
* [feature] support no master weights for low level zero plugin, remove data copy when no master weights
* remove data copy and typecasting when no master weights
* not load weights to cpu when using no master weights
* fix grad: use fp16 grad when no master weights
* only do not update working param when no master weights
* fix: only do not update working param when no master weights
* fix: passing params in dict format in hybrid plugin
* fix: remove extra params (tp_process_group) in hybrid_parallel_plugin
* [bf16] add bf16 support for fused adam (#3844)
* [bf16] fused adam kernel support bf16
* [test] update fused adam kernel test
* [test] update fused adam test
* [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860)
* [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869)
* [bf16] add mixed precision mixin
* [bf16] low level zero optim support bf16
* [text] update low level zero test
* [text] fix low level zero grad acc test
* [bf16] add bf16 support for gemini (#3872)
* [bf16] gemini support bf16
* [test] update gemini bf16 test
* [doc] update gemini docstring
* [bf16] add bf16 support for plugins (#3877)
* [bf16] add bf16 support for legacy zero (#3879)
* [zero] init context support bf16
* [zero] legacy zero support bf16
* [test] add zero bf16 test
* [doc] add bf16 related docstring for legacy zero
* [booster] fix no_sync method
* [booster] add test for ddp no_sync
* [booster] fix merge
* [booster] update unit test
* [booster] update unit test
* [booster] update unit test