* t5 token, still pytest fail
* Resolve T5 Pytest Failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typos
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* update to fully overlap, still debugging
* improve interface
* fixed deadlock bug
* debug NaN loss
* (experimental) use one comm group for send_fw_recv_fw to fix NaN
* cleaned up interfaces; use one batch p2p for all
* clean up; removed the double p2p batch case
* p2p test passed
* improve overlap: send fwd before backward
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tentatively use 2 p2p batches
* remove two p2p batches
* fix typos
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove pp.sh
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: root <root@notebook-c55824c0-7742-45e8-9591-c855bb77ad29-0.notebook-c55824c0-7742-45e8-9591-c855bb77ad29.colossal-ai.svc.cluster.local>
* [gemini] async grad chunk reduce (all-reduce&reduce-scatter)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [gemini] add test
* [gemini] rename func
* [gemini] update llama benchmark
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [gemini] use tensor counter
* [gemini] change default config in GeminiPlugin and GeminiDDP
* [chore] typo
* [gemini] fix sync issue & add test cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [bug] fix silly bug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [chore] add test for prefetch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add parallel cross entropy output for falcon model & fix some typos in bloom.py
* fix module name error, self.model -> self.transformers in bloom and falcon models
* Fix the overflow bug in the distributed cross entropy loss function when training with fp16
* add dtype to parallel cross entropy loss function
* fix dtype-related typos and prettify loss.py
* fix grad dtype and update dtype mismatch error
* fix typo bugs