> ⚠️ This content may be outdated since the major update of ColossalChat. We will update it soon.
# Distributed PPO Training on Stage 3
## Detach Experience Makers and Trainers
We can completely separate the trainers and makers:

- The experience maker performs inference, produces experience, and delivers it to the trainer remotely (1).
- The trainer consumes experience to train models, and periodically transmits new model parameters to the maker (2.1, 2.2).
- An experience buffer overlaps transmission with computation.

In this manner, each node works continuously with no model idle time, and different optimization strategies can be applied to inference and training to trade off speed against memory. This design also helps with scalability.
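To make the producer/consumer decoupling concrete, here is a toy sketch (plain Python threads and a bounded queue, not ColossalAI or Ray code; all names are hypothetical) of how an experience buffer lets the maker and trainer overlap their work:

```python
import queue
import threading

# Bounded buffer: the maker blocks only when it gets too far ahead,
# so inference and training overlap instead of alternating.
experience_buffer = queue.Queue(maxsize=8)
NUM_EXPERIENCES = 20

def experience_maker():
    """Performs (mock) inference and delivers experience to the trainer (arrow 1)."""
    for step in range(NUM_EXPERIENCES):
        experience = {"step": step, "rollout": f"tokens_{step}"}  # stands in for a PPO rollout
        experience_buffer.put(experience)
    experience_buffer.put(None)  # sentinel: no more experience

trained = []

def trainer():
    """Consumes experience to train; in the real system it would also
    periodically push updated weights back to the maker (2.1, 2.2)."""
    while True:
        experience = experience_buffer.get()
        if experience is None:
            break
        trained.append(experience["step"])  # stands in for a PPO update step

maker_thread = threading.Thread(target=experience_maker)
trainer_thread = threading.Thread(target=trainer)
maker_thread.start(); trainer_thread.start()
maker_thread.join(); trainer_thread.join()
print(len(trained))  # 20: every experience was consumed
```

The same shape holds in the distributed setting, except the buffer lives inside the trainer actor and experience arrives over the network.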
`DetachedPPOTrainer` and `ExperienceMakerHolder` are Ray actors (not to be confused with the Actor model in PPO), representing the Trainer and the Experience Maker in the diagram above, respectively.
## Usage
See the examples at `ColossalAI/application/Chat/examples/ray`.
### Setup Makers
- Define the makers' environment variables:

  ```python
  env_info_makers = [{
      'local_rank': '0',
      'rank': str(rank),
      'world_size': str(num_makers),
      'master_port': maker_port,
      'master_addr': master_addr
  } for rank in range(num_makers)]
  ```
- Define the maker models:

  ```python
  def model_fn():
      actor = get_actor_from_args(...)
      critic = get_critic_from_args(...)
      reward_model = get_reward_model_from_args(...)
      initial_model = get_actor_from_args(...)
      return actor, critic, reward_model, initial_model
  ```
- Set `experience_holder_refs`:

  ```python
  experience_holder_refs = [
      ExperienceMakerHolder.options(
          name=f"maker_{i}",
          num_gpus=1,
          max_concurrency=2
      ).remote(
          detached_trainer_name_list=[f"trainer_{x}" for x in target_trainers(...)],
          model_fn=model_fn,
          ...)
      for i, env_info_maker in enumerate(env_info_makers)
  ]
  ```
The names in `detached_trainer_name_list` refer to the target trainers that the maker should send experience to. A trainer's name is assigned the same way as a maker's, via `.options(name="str")`; see below.
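For concreteness, the environment-variable comprehension above produces one dict per maker rank. A minimal runnable sketch with hypothetical values (the real `maker_port` and `master_addr` come from your cluster setup):

```python
# Hypothetical cluster values, for illustration only.
num_makers = 2
maker_port = '29500'
master_addr = '127.0.0.1'

# Same comprehension as in the setup step above: one env dict per maker rank.
env_info_makers = [{
    'local_rank': '0',
    'rank': str(rank),
    'world_size': str(num_makers),
    'master_port': maker_port,
    'master_addr': master_addr
} for rank in range(num_makers)]

print(env_info_makers[1]['rank'])        # '1'
print(env_info_makers[1]['world_size'])  # '2'
```

Note that all values are strings, since they are destined for process environment variables used by distributed initialization.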
### Setup Trainers
- Define the trainers' environment variables:

  ```python
  env_info_trainers = [{
      'local_rank': '0',
      'rank': str(rank),
      'world_size': str(num_trainers),
      'master_port': trainer_port,
      'master_addr': master_addr
  } for rank in range(num_trainers)]
  ```
- Define the trainer models:

  ```python
  def trainer_model_fn():
      actor = get_actor_from_args(...)
      critic = get_critic_from_args(...)
      return actor, critic
  ```
- Set `trainer_refs` (note that `model_fn` receives the function itself, not the result of calling it, and the names use the same `trainer_{i}` / `maker_{x}` pattern as above):

  ```python
  trainer_refs = [
      DetachedPPOTrainer.options(
          name=f"trainer_{i}",
          num_gpus=1,
          max_concurrency=2
      ).remote(
          experience_maker_holder_name_list=[f"maker_{x}" for x in target_makers(...)],
          model_fn=trainer_model_fn,
          ...)
      for i, env_info_trainer in enumerate(env_info_trainers)
  ]
  ```
The names in `experience_maker_holder_name_list` refer to the target makers that the trainer should send updated models to. By setting `detached_trainer_name_list` and `experience_maker_holder_name_list`, we can customize the transmission graph.
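As a toy illustration of such a transmission graph (plain Python, no Ray; the all-to-all wiring here is one possible choice, not the only one), the two name lists define a bipartite routing between makers and trainers:

```python
num_makers, num_trainers = 2, 2

# Hypothetical all-to-all routing: each maker sends experience to every
# trainer, and each trainer broadcasts updated weights back to every maker.
detached_trainer_name_lists = {
    f"maker_{i}": [f"trainer_{j}" for j in range(num_trainers)]
    for i in range(num_makers)
}
experience_maker_holder_name_lists = {
    f"trainer_{j}": [f"maker_{i}" for i in range(num_makers)]
    for j in range(num_trainers)
}

print(detached_trainer_name_lists["maker_0"])           # ['trainer_0', 'trainer_1']
print(experience_maker_holder_name_lists["trainer_1"])  # ['maker_0', 'maker_1']
```

Sparser graphs (e.g. each maker feeding a single dedicated trainer) are expressed the same way, simply by shrinking the name lists.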
### Launch Jobs
- Define the data loader:

  ```python
  def data_loader_fn():
      return torch.utils.data.DataLoader(dataset=dataset)
  ```
- Launch the makers:

  ```python
  wait_tasks = []
  for experience_holder_ref in experience_holder_refs:
      wait_tasks.append(
          experience_holder_ref.workingloop.remote(data_loader_fn(), num_steps=experience_steps))
  ```
- Launch the trainers:

  ```python
  for trainer_ref in trainer_refs:
      wait_tasks.append(
          trainer_ref.fit.remote(total_steps, update_steps, train_epochs))
  ```
- Wait until done:

  ```python
  ray.get(wait_tasks)
  ```
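The launch steps above follow the standard fan-out/fan-in idiom: submit every remote task, collect the futures, then block on the whole set at once. The same shape can be sketched with Python's stdlib `concurrent.futures` as a stand-in for Ray (all function names here are mock placeholders, not the real `ExperienceMakerHolder` / `DetachedPPOTrainer` API):

```python
from concurrent.futures import ThreadPoolExecutor

def workingloop(maker_id, num_steps):
    # stands in for experience_holder_ref.workingloop.remote(...)
    return f"maker_{maker_id} made {num_steps} steps"

def fit(trainer_id, total_steps):
    # stands in for trainer_ref.fit.remote(...)
    return f"trainer_{trainer_id} trained {total_steps} steps"

with ThreadPoolExecutor() as pool:
    wait_tasks = []
    for i in range(2):                                  # launch makers
        wait_tasks.append(pool.submit(workingloop, i, num_steps=4))
    for i in range(2):                                  # launch trainers
        wait_tasks.append(pool.submit(fit, i, total_steps=8))
    # Block on all tasks, like ray.get(wait_tasks); results keep submission order.
    results = [task.result() for task in wait_tasks]

print(results[0])  # maker_0 made 4 steps
```

Collecting every future into one list before blocking matters: it lets makers and trainers run concurrently rather than being awaited one at a time.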
## Flexible Structure
We can deploy different strategies to makers and trainers. Here are some example configurations:
- 2 Makers, 1 Trainer
- 2 Makers, 2 Trainers
- Maker Inference Quantization
- Tensor Parallel
## TODO
- Support LoRA
- Support TP & PP