dataset | [Chat] fix readme (#5989) | 2024-08-12 14:55:17 +08:00 |
experience_buffer | [ColossalChat] Update RLHF V2 (#5286) | 2024-03-29 14:12:29 +08:00 |
experience_maker | [ColossalChat] Update RLHF V2 (#5286) | 2024-03-29 14:12:29 +08:00 |
models | [Coati] Train DPO using PP (#6054) | 2024-10-11 19:32:00 +08:00 |
quant | [ColossalChat] Update RLHF V2 (#5286) | 2024-03-29 14:12:29 +08:00 |
ray | [ColossalChat] Update RLHF V2 (#5286) | 2024-03-29 14:12:29 +08:00 |
reasoner/guided_search | update best answer function | 2024-11-08 03:30:21 +00:00 |
trainer | [Coati] Train DPO using PP (#6054) | 2024-10-11 19:32:00 +08:00 |
utils | [ColossalChat] Update RLHF V2 (#5286) | 2024-03-29 14:12:29 +08:00 |
__init__.py | [ColossalChat] Update RLHF V2 (#5286) | 2024-03-29 14:12:29 +08:00 |