* [release] update version
* [devops] update compatibility test
* [test] fix ddp plugin test
* [test] fix gptj and rpc test
* [devops] fix cuda ext compatibility
* [inference] fix flash decoding test
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increase pytest timeout minutes
* revert import path in openmoe
* Adapted to the baichuan2-7B model
* modified according to the review comments.
* Modified the method of obtaining random weights.
* modified according to the review comments.
* change mlp layer 'NOTE'