* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add parallel cross entropy output for falcon model & fix some typos in bloom.py
* fix module name error, self.model -> self.transformers in bloom, falcon model
* fix overflow bug in the distributed cross entropy loss function when training with fp16
* add dtype to parallel cross entropy loss function
* fix dtype related typos and prettify loss.py
* fix grad dtype and update dtype mismatch error
* fix typo bugs
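
For context on the fp16 overflow item above: this log does not describe how the fix was implemented, so the following is only a minimal sketch of one common remedy, assuming the loss is computed in a wider dtype after subtracting the row maximum. It uses NumPy rather than the project's own code, and `cross_entropy` with its `dtype` parameter is a hypothetical name chosen for illustration.

```python
import numpy as np

def cross_entropy(logits, target, dtype=np.float32):
    """Per-row cross entropy; hypothetical helper, not the project's API."""
    # Assumption: upcast to a wider dtype before exponentiating.
    x = logits.astype(dtype)
    # Subtract the row maximum so the largest exponent is exp(0) = 1.
    x = x - x.max(axis=-1, keepdims=True)
    log_sum_exp = np.log(np.exp(x).sum(axis=-1))
    # Pick the shifted logit of the target class for each row.
    picked = np.take_along_axis(x, target[:, None], axis=-1)[:, 0]
    return log_sum_exp - picked

logits = np.array([[12.0, 0.0, -3.0]], dtype=np.float16)
target = np.array([1])

# Naive fp16 path: exp(12) exceeds the largest finite fp16 value (65504).
with np.errstate(over="ignore"):
    naive_sum = np.exp(logits).sum(axis=-1)

loss = cross_entropy(logits, target)
```

Here `naive_sum` overflows to infinity in fp16, while the upcast, max-shifted version stays finite.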