* support sequence_parallel for no pipeline
* sequence_parallel does not support no-flash-attn
* support sequence parallel for pipeline
* add memory profiler
* Update 13B.py
* add memory profiler
* fix evaluation bug
* remove some unnecessary code
* remove some unnecessary code
* Update parallel_context.py
* modify the config
* remove memory profiler
* modify the config
* support selective dropout
* feat(utils/writer.py): support tensorboard writer
* feat(utils/writer.py): add class comment
* feat(core): support pipeline parallel
* fix(core): fix demo running error
* feat(solver/optimizer): add pp zero optimizer
* fix(solver/optimizer): fix word spelling error
* feat(core/scheduler): add new dir scheduler in core/
* fix(core): fix ci lint error
* feat(solver/optimizer): merge pp and nopp optimizer
* doc(usage.md): update usage doc
* feat(core/scheduler): support post func
* feat(core/scheduler): add dtype para in pp sche and update func get_tensor_shape
* feat(core/scheduler): add _load_micro_batch in base scheduler
* feat(core/scheduler): support optimizer overlap communication in pp scheduler
* feat(core/scheduler): delete data process func code
* feat(core/trainer): schedule pre processing for all schedule
---------
Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>