Frank Lee
1beb85cc25
[checkpoint] refactored the API and added safetensors support ( #3427 )
* [checkpoint] refactored the API and added safetensors support
* polish code
2 years ago
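For context, a minimal sketch of safetensors-backed checkpointing in plain PyTorch; this illustrates the storage format the commit adopts, not the refactored ColossalAI checkpoint API itself.

```python
# Minimal sketch of safetensors checkpointing in plain PyTorch; the
# API refactored in this commit presumably wraps logic along these lines.
import torch
from safetensors.torch import save_file, load_file

model = torch.nn.Linear(4, 2)

# safetensors stores a flat {name: tensor} mapping and never unpickles
# arbitrary objects, which is the main safety win over torch.save.
save_file(model.state_dict(), "model.safetensors")

# Loading returns a plain dict of tensors to feed back into the model.
model.load_state_dict(load_file("model.safetensors"))
```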
アマデウス
e78a1e949a
fix torch 2.0 compatibility ( #3346 )
2 years ago
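The commit title does not say which API broke; one very common torch-2.0 breakage is the removal of torch._six, which is typically shimmed as in this hedged sketch.

```python
# Hedged sketch of a typical torch-2.0 compatibility shim; the actual
# fix in #3346 may target a different API.
from packaging import version
import torch

if version.parse(torch.__version__) >= version.parse("2.0.0"):
    # torch 2.0 removed torch._six; fall back to stdlib equivalents.
    string_classes = (str, bytes)
else:
    from torch._six import string_classes  # pre-2.0 location
```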
CsRic
052b03e83f
limit torch version ( #3213 )
Co-authored-by: csric <richcsr256@gmail.com>
2 years ago
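A version ceiling like this is usually a one-line constraint in the packaging metadata; a sketch with illustrative bounds (the exact range pinned in #3213 is not quoted here).

```python
# Illustrative setup.py fragment; the bounds shown are assumptions,
# not copied from the commit.
from setuptools import setup

setup(
    name="example",
    install_requires=[
        # Cap torch below the next major release until it is verified.
        "torch>=1.12,<2.0.0",
    ],
)
```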
Frank Lee
93fdd35b5e
[build] fixed the doc build process ( #2618 )
2 years ago
Jiarui Fang
bc0e271e71
[builder] use builder() for cpu adam and fused optim in setup.py ( #2187 )
2 years ago
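The builder() pattern the title refers to typically has each kernel expose a function returning a setuptools extension, so setup.py only collects them; a hedged sketch with illustrative names and source paths.

```python
# Hedged sketch of the builder() pattern; names and source paths are
# illustrative, not ColossalAI's actual layout.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension, CUDAExtension

def cpu_adam_builder():
    # CPU kernel compiled as a plain C++ extension
    return CppExtension(name="cpu_adam", sources=["csrc/cpu_adam.cpp"])

def fused_optim_builder():
    # GPU kernel compiled against the local CUDA toolkit
    return CUDAExtension(name="fused_optim", sources=["csrc/fused_optim.cu"])

setup(
    ext_modules=[cpu_adam_builder(), fused_optim_builder()],
    cmdclass={"build_ext": BuildExtension},
)
```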
Frank Lee
81e0da7fa8
[setup] supported conda-installed torch ( #2048 )
* [setup] supported conda-installed torch
* polish code
2 years ago
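Conda-installed torch ships its CUDA toolkit under the environment prefix rather than /usr/local/cuda, so setup scripts have to look there too; a hedged sketch of that detection (paths and logic are assumptions).

```python
# Hedged sketch of locating CUDA for a conda-installed torch; the
# actual detection in #2048 may differ.
import os

def find_cuda_home():
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home is None and "CONDA_PREFIX" in os.environ:
        # conda's cudatoolkit places headers under the env prefix
        candidate = os.environ["CONDA_PREFIX"]
        if os.path.exists(os.path.join(candidate, "include", "cuda.h")):
            cuda_home = candidate
    return cuda_home
```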
Jiarui Fang
504419d261
[FAW] add cache manager for the cached embedding ( #1419 )
2 years ago
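A cache manager for a cached embedding keeps hot rows on device while the full table stays on host; a hedged LRU sketch in that spirit (class and field names are illustrative, and a CUDA device is assumed).

```python
# Hedged LRU sketch in the spirit of a cached-embedding manager; all
# names are illustrative and a CUDA device is assumed.
from collections import OrderedDict
import torch

class EmbeddingRowCache:
    def __init__(self, cpu_weight: torch.Tensor, capacity: int):
        self.cpu_weight = cpu_weight   # full embedding table on host
        self.capacity = capacity       # max rows resident on device
        self.cache: "OrderedDict[int, torch.Tensor]" = OrderedDict()

    def lookup(self, idx: int) -> torch.Tensor:
        if idx in self.cache:
            self.cache.move_to_end(idx)          # hit: refresh LRU order
            return self.cache[idx]
        row = self.cpu_weight[idx].to("cuda")    # miss: fetch from host
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)       # evict least-recently used
        self.cache[idx] = row
        return row
```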
Frank Lee
cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching ( #844 )
* [cli] fixed multi-node job launching
* [cli] fixed a bug in version comparison
* [cli] support launching with env var
* added docstring
* [cli] added extra launch arguments
* [cli] added default launch rdzv args
* [cli] fixed version comparison
* [cli] added docstring examples and requirement
* polish docstring
* polish code
* polish code
3 years ago
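The "default launch rdzv args" bullet refers to rendezvous settings for elastic multi-node launching; a hedged sketch of assembling a torchrun-style command with such defaults (flag values are illustrative, not the CLI's actual defaults).

```python
# Hedged sketch of a launcher that fills in default rendezvous args;
# values are illustrative.
import subprocess

def launch(nnodes: int, nproc_per_node: int, script: str) -> None:
    cmd = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        "--rdzv_backend=c10d",              # default rendezvous backend
        "--rdzv_endpoint=localhost:29500",  # default rendezvous endpoint
        script,
    ]
    subprocess.run(cmd, check=True)
```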
Frank Lee
01e9f834f5
[dependency] removed torchvision ( #833 )
* [dependency] removed torchvision
* fixed transforms
3 years ago
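Dropping the torchvision dependency means re-expressing any transforms it supplied with plain tensor ops; a hedged sketch of a Normalize equivalent (the transforms actually replaced in #833 are not named in the commit message).

```python
# Hedged sketch of a torchvision-free Normalize; only an example, since
# the commit does not list which transforms it fixed.
import torch

def normalize(img: torch.Tensor, mean, std) -> torch.Tensor:
    mean = torch.as_tensor(mean, dtype=img.dtype).view(-1, 1, 1)
    std = torch.as_tensor(std, dtype=img.dtype).view(-1, 1, 1)
    return (img - mean) / std
```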
Frank Lee
05d9ae5999
[cli] add missing requirement ( #805 )
3 years ago
Jiarui Fang
54229cd33e
[log] better logging display with rich ( #426 )
* better logger using rich
* remove deepspeed in zero requirements
3 years ago
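Rich-based display usually means routing the stdlib logger through rich's handler; a minimal sketch of the core wiring (the project's own logger wrapper adds more on top).

```python
# Minimal sketch of rich-backed logging; logger name is illustrative.
import logging
from rich.logging import RichHandler

logging.basicConfig(
    level=logging.INFO,
    format="%(message)s",
    handlers=[RichHandler()],  # colored, nicely formatted records
)
logging.getLogger("example").info("hello from rich")
```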
BoxiangW
a2f1565672
Update GitHub action and pre-commit settings ( #196 )
* Update GitHub action and pre-commit settings
* Update GitHub action and pre-commit settings (#198)
3 years ago
Frank Lee
3defa32aee
Support TP-compatible Torch AMP and Update trainer API ( #27 )
* Add gradient accumulation, fix lr scheduler
* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)
* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes
* fixed trainer
* Revert "fixed trainer"
This reverts commit 2e0b0b7699.
* improved consistency between trainer, engine and schedule (#23)
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
3 years ago
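Plain torch AMP, which this commit makes tensor-parallel compatible, follows the autocast-plus-GradScaler pattern; a single-GPU sketch (the TP integration itself is ColossalAI-specific and not shown).

```python
# Single-GPU sketch of torch-native AMP; the tensor-parallel plumbing
# added in #27 is not reproduced here.
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward in reduced precision
        loss = model(torch.randn(4, 8, device="cuda")).sum()
    scaler.scale(loss).backward()     # scale loss to avoid fp16 underflow
    scaler.step(optimizer)            # unscales grads, then steps
    scaler.update()                   # adapt the scale factor
```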
zbian
404ecbdcc6
Migrated project
3 years ago