ColossalAI/tests/kit/model_zoo/transformers/llama.py

import torch
import transformers

from ..registry import ModelAttribute, model_zoo

try:
    from transformers import LlamaConfig

    HAS_LLAMA = True
except ImportError:
    HAS_LLAMA = False

if HAS_LLAMA:
    # ===============================
    # Register LLaMA
    # ===============================

    def data_gen():
        # The input ids correspond to the sentence
        # 'Hello, my dog is cute'
        #
        # They can be reproduced with the code below:
        # -----------------------------------
        # from transformers import LlamaTokenizerFast
        # tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")
        # input = 'Hello, my dog is cute'
        # tokenized_input = tokenizer(input, return_tensors='pt').to('cuda')
        # -----------------------------------

        # batch of two identical sequences, built directly as int64 tensors
        input_ids = torch.tensor(
            [[1, 15043, 29892, 590, 11203, 338, 274, 1082], [1, 15043, 29892, 590, 11203, 338, 274, 1082]],
            dtype=torch.long,
        )
        attention_mask = torch.ones_like(input_ids)
        return dict(input_ids=input_ids, attention_mask=attention_mask)

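    # Labels are required for causal LM. The `casual` spelling in the identifiers
    # and registry names below is kept as-is in case other code looks them up by name.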
    def data_gen_for_casual_lm():
        data = data_gen()
        labels = data["input_ids"].clone()
        data["labels"] = labels
        return data

    # identity transform: the Hugging Face model output already supports dict-style access
    output_transform_fn = lambda x: x

    # functions that reduce a model output to a scalar loss
    loss_fn = lambda output: output["last_hidden_state"].mean()
    loss_fn_for_casual_lm = lambda output: output["loss"]
    loss_fn_for_seq_classification = lambda output: output["logits"].mean()

    config = LlamaConfig(
        num_hidden_layers=4,
        hidden_size=128,
        intermediate_size=256,
        num_attention_heads=4,
        max_position_embeddings=128,
        num_labels=16,
    )

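    # LlamaForSequenceClassification needs a pad token to handle batch sizes > 1,
    # so reuse the EOS token as the pad token.
    if hasattr(config, "pad_token_id"):
        config.pad_token_id = config.eos_token_id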

    # register the following models
    # transformers.LlamaModel,
    # transformers.LlamaForCausalLM,
    # transformers.LlamaForSequenceClassification,
    model_zoo.register(
        name="transformers_llama",
        model_fn=lambda: transformers.LlamaModel(config),
        data_gen_fn=data_gen,
        output_transform_fn=output_transform_fn,
        loss_fn=loss_fn,
        model_attribute=ModelAttribute(has_control_flow=True),
    )
    model_zoo.register(
        name="transformers_llama_for_casual_lm",
        model_fn=lambda: transformers.LlamaForCausalLM(config),
        data_gen_fn=data_gen_for_casual_lm,
        output_transform_fn=output_transform_fn,
        loss_fn=loss_fn_for_casual_lm,
        model_attribute=ModelAttribute(has_control_flow=True),
    )
    model_zoo.register(
        name="transformers_llama_for_sequence_classification",
        model_fn=lambda: transformers.LlamaForSequenceClassification(config),
        data_gen_fn=data_gen,
        output_transform_fn=output_transform_fn,
        loss_fn=loss_fn_for_seq_classification,
        model_attribute=ModelAttribute(has_control_flow=True),
    )
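
The registrations above pair each model constructor with a data generator, an output transform, and a loss function. A minimal sketch of how a test might consume such entries follows. It assumes a registry that simply stores the four callables under a name. `ToyZoo`, `Entry`, and `run_entry` are hypothetical stand-ins written for this illustration, not the actual `model_zoo` API from `..registry`, and a plain Python function takes the place of the LLaMA model so the sketch runs without `torch` or `transformers`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    # Hypothetical record; field names mirror the keyword arguments used above.
    model_fn: Callable
    data_gen_fn: Callable
    output_transform_fn: Callable
    loss_fn: Callable


class ToyZoo:
    """Hypothetical stand-in for model_zoo: maps a name to its callables."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def register(self, name: str, **fns: Callable) -> None:
        self._entries[name] = Entry(**fns)

    def items(self):
        return self._entries.items()


zoo = ToyZoo()
zoo.register(
    name="toy_sum",
    # A plain function plays the role of the model: it sums each row of input_ids.
    model_fn=lambda: (lambda input_ids: {"logits": [sum(row) for row in input_ids]}),
    data_gen_fn=lambda: {"input_ids": [[1, 2, 3], [4, 5, 6]]},
    output_transform_fn=lambda x: x,
    loss_fn=lambda output: sum(output["logits"]) / len(output["logits"]),
)


def run_entry(entry: Entry) -> float:
    # Build the model, generate inputs, run forward, then reduce to a scalar loss.
    model = entry.model_fn()
    data = entry.data_gen_fn()
    output = entry.output_transform_fn(model(**data))
    return entry.loss_fn(output)


losses = {name: run_entry(entry) for name, entry in zoo.items()}
```

Keeping the model constructor lazy (`model_fn` is called only inside `run_entry`) means that registering an entry stays cheap, and each test can build a fresh model instance.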