mirror of https://github.com/InternLM/InternLM
add README_npu
parent 1759c4b9b4
commit ad035eb8bd

@@ -0,0 +1,298 @@
# InternLM-NPU

<div align="center">

<img src="./assets/logo.svg" width="200"/>
<div> </div>
<div align="center">
<b><font size="5">InternLM</font></b>
<sup>
<a href="https://internlm.intern-ai.org.cn/">
<i><font size="4">HOT</font></i>
</a>
</sup>
<div> </div>
</div>

[](./LICENSE)
[](https://github.com/internLM/OpenCompass/)

<!-- [](https://internlm.readthedocs.io/zh_CN/latest/?badge=latest) -->

[📘Commercial Application](#license) |
[🤗HuggingFace](https://huggingface.co/internlm) |
[🆕Update News](#news) |
[🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) |
[📜Technical Report](https://arxiv.org/abs/2403.17297)<br>
[💬Chat Web](https://internlm-chat.intern-ai.org.cn/) |
[🔗API](https://internlm.intern-ai.org.cn/api/document) |
[🧩Modelers](https://modelers.cn/spaces/MindSpore-Lab/INTERNLM2-20B-PLAN)

[English](./README_npu.md) |
[简体中文](./README_npu_zh-CN.md)

</div>

## Introduction

This is a guide to training and running inference with the InternLM series models on Ascend NPUs.

## News

\[2025.01.15\] InternLM3-8B-Instruct can be used with Xtuner, LLaMA-Factory, and transformers.

## Model Zoo

### InternLM3

| Model | Transformers (HF) | ModelScope | Release Date |
|---------------------------|-------------------|------------|--------------|
| **InternLM3-8B-Instruct** | [🤗internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [<img src="./assets/modelscope_logo.png" width="20px" /> internlm3-8b-instruct](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct) | 2025-01-15 |

## Environment Setup

### Installing Ascend CANN Toolkit and Kernels

For installation details, see the [installation guide](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2FCANNCommunityEdition%2F80RC2alpha002%2Fquickstart%2Fquickstart%2Fquickstart_18_0004.html) or run the following commands:

```shell
# Replace the URL with the one matching your CANN version and device model.
# Install the CANN Toolkit.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install

# Install the CANN Kernels.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install

# Set environment variables.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
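
As a quick sanity check (this assumes `torch` and `torch_npu` have already been installed, as described in the Xtuner section below), you can verify that the CANN environment is loaded and the NPU devices are visible:

```shell
# Sanity check: requires torch and torch_npu to be installed already.
python -c "import torch, torch_npu; print(torch_npu.npu.is_available(), torch_npu.npu.device_count())"
```

If this prints `True` followed by a non-zero device count, the toolkit and driver are visible to PyTorch.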

## Xtuner

### Installing Xtuner

```shell
git clone https://github.com/InternLM/xtuner.git
cd xtuner
```

Modify `requirements/runtime.txt` with the following changes:

```text
bitsandbytes==0.42.0
mmengine==0.10.5
torchvision==0.19.0
numpy==1.26.4
```

Use the following command for installation:

```shell
pip install -e '.[all]'
```

**Note**:

- By default the latest version of `torch` is installed. Make sure it matches the version of `torch_npu` you install (see the sketch below).
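
As an illustration only: the `torchvision==0.19.0` pin above corresponds to `torch` 2.4.x, so a matching install might look like the following. The exact `torch_npu` build depends on your CANN release; check the torch_npu release notes for the supported pairing.

```shell
# Illustrative version pair only; use the torch / torch_npu combination
# documented for your CANN release.
pip install torch==2.4.0 torch-npu==2.4.0
```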

### LoRA Fine-tuning

Use the following commands to copy the base config and rename it to `internlm3_8b_instruct_lora_oasst1_e10.py`:

```shell
xtuner copy-cfg internlm2_5_chat_7b_qlora_oasst1_e3 .
mv internlm2_5_chat_7b_qlora_oasst1_e3_copy.py internlm3_8b_instruct_lora_oasst1_e10.py
```

Modify the configuration file `internlm3_8b_instruct_lora_oasst1_e10.py` as follows:

```python
pretrained_model_name_or_path = 'internlm/internlm3-8b-instruct'

max_epochs = 10

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16),
        # quantization_config=dict(
        #     type=BitsAndBytesConfig,
        #     load_in_4bit=True,
        #     load_in_8bit=False,
        #     llm_int8_threshold=6.0,
        #     llm_int8_has_fp16_weight=False,
        #     bnb_4bit_compute_dtype=torch.float16,
        #     bnb_4bit_use_double_quant=True,
        #     bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    # dict(
    #     type=EvaluateChatHook,
    #     tokenizer=tokenizer,
    #     every_n_iters=evaluation_freq,
    #     evaluation_inputs=evaluation_inputs,
    #     system=SYSTEM,
    #     prompt_template=prompt_template)
]

randomness = dict(seed=123, deterministic=True)
```

Run the following command to start single-node, eight-card fine-tuning:

```shell
NPROC_PER_NODE=8 xtuner train internlm3_8b_instruct_lora_oasst1_e10.py --deepspeed deepspeed_zero2
```

The fine-tuning results are saved as checkpoint files such as `./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth`.
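
To locate the checkpoint to pass to the conversion step below, one option is to list the newest checkpoint in the work directory (sketch only; the exact name depends on how many iterations were run):

```shell
# Print the most recently written checkpoint from the training run above.
ls -dt ./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/*.pth | head -n 1
```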

### Model Conversion

Convert the model weights obtained from fine-tuning into the Hugging Face format, which facilitates subsequent deployment and usage. Use the following command for the conversion:

```shell
xtuner convert pth_to_hf internlm3_8b_instruct_lora_oasst1_e10.py ./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth ./work_dirs/convert_output
```

### Model Merge

LoRA or QLoRA fine-tuning produces an additional `Adapter` layer, which needs to be merged with the original model to obtain a complete model. Use the following command for model merging, where `$model_path` is the local path of the original model and `--max-shard-size 2GB` limits each weight file to at most 2GB:

```shell
xtuner convert merge $model_path ./work_dirs/convert_output ./work_dirs/merge_output --max-shard-size 2GB
```

### Chat

Chat with the merged model weights:

```shell
cp path_to_your_model/modeling_internlm3.py ./work_dirs/merge_output
xtuner chat ./work_dirs/merge_output --prompt-template internlm2_chat
```

## LLaMA-Factory

### Installing LLaMA-Factory

```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]"
```
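
To confirm that the CLI entry point was installed correctly, you can print its version (assuming the editable install above succeeded):

```shell
# Quick check that llamafactory-cli is on PATH after installation.
llamafactory-cli version
```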

### Inference

Create the inference configuration file `examples/inference/internlm2_5_7b_chat.yaml` in the LLaMA-Factory directory:

```yaml
model_name_or_path: xxx  # Only local loading is supported. Set this to the local weight path of InternLM2.5-7B-Chat.
template: intern2
```

Run the following command to interact with the model:

```shell
llamafactory-cli chat examples/inference/internlm2_5_7b_chat.yaml
```

### Fine-tuning

Create the fine-tuning configuration file `examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml` in the LLaMA-Factory directory:

```yaml
### model
model_name_or_path: xxx  # Only local loading is supported. Set this to the local weight path of InternLM2.5-7B-Chat.

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity
template: intern2
cutoff_len: 128
preprocessing_num_workers: 16

### output
output_dir: saves/internlm2_5_7b_chat/lora/sft
logging_steps: 5
save_steps: 20
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```

Run the following commands to start single-card fine-tuning:

```shell
export ASCEND_RT_VISIBLE_DEVICES=0
llamafactory-cli train examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml
```
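
The command above restricts training to a single NPU. If more NPUs are available, exposing additional device IDs through `ASCEND_RT_VISIBLE_DEVICES` lets LLaMA-Factory see them for a multi-card run; a sketch for four cards (device IDs are machine-specific):

```shell
# Sketch: expose four NPUs instead of one; adjust the IDs to your machine.
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
llamafactory-cli train examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml
```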

### Accuracy

The loss curve obtained after fine-tuning is shown below:

*(figure: fine-tuning loss curve)*

### Performance

| Chip Type | train_samples_per_second |
|-------------------|--------------------------|
| Atlas 900 A2 PODc | 49.662 |

## Transformers

### Inference

Create the inference script `inference_internlm2_5_7b_chat.py`:

```python
import torch
import torch_npu  # torch_npu must be imported to enable the NPU backend used by .npu()
from transformers import AutoTokenizer, AutoModelForCausalLM

# If the model has already been downloaded, this can be replaced with the local model path.
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# `torch_dtype=torch.float16` loads the model in float16; otherwise transformers loads it
# in float32, which may not fit in device memory.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", torch_dtype=torch.float16, trust_remote_code=True).npu()
model = model.eval()
response, history = model.chat(tokenizer, "你好,请提供三个管理时间的建议。", history=[])
print(response)
```

Execute the inference script:

```shell
python inference_internlm2_5_7b_chat.py
```

## License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/) / [申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.

@@ -0,0 +1,295 @@
# InternLM-NPU

<div align="center">

<img src="./assets/logo.svg" width="200"/>
<div> </div>
<div align="center">
<b><font size="5">InternLM official website</font></b>
<sup>
<a href="https://internlm.intern-ai.org.cn/">
<i><font size="4">HOT</font></i>
</a>
</sup>
<div> </div>
</div>

[](https://github.com/open-mmlab/mmdetection/blob/main/LICENSE)
[](https://github.com/internLM/OpenCompass/)

<!-- [](https://internlm.readthedocs.io/zh_CN/latest/?badge=latest) -->

[📘Commercial Application](#license) |
[🤗HuggingFace](https://huggingface.co/internlm) |
[🆕Update News](#news) |
[🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) |
[📜Technical Report](https://arxiv.org/abs/2403.17297)<br>
[💬Chat Web](https://internlm-chat.intern-ai.org.cn/) |
[🔗API](https://internlm.intern-ai.org.cn/api/document) |
[🧩Modelers](https://modelers.cn/spaces/MindSpore-Lab/INTERNLM2-20B-PLAN)

[English](./README_npu.md) |
[简体中文](./README_npu_zh-CN.md)

</div>

## Introduction

This is a guide to training and running inference with the InternLM series models on Ascend NPUs.

## News

\[2025.01.15\] InternLM3-8B-Instruct can be used with Xtuner, LLaMA-Factory, and transformers.

## Model Zoo

### InternLM3

| Model | Transformers (HF) | ModelScope | Release Date |
|---------------------------|-------------------|------------|--------------|
| **InternLM3-8B-Instruct** | [🤗internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [<img src="./assets/modelscope_logo.png" width="20px" /> internlm3-8b-instruct](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct) | 2025-01-15 |

## Environment Setup

### Installing Ascend CANN Toolkit and Kernels

For installation details, see the [installation guide](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2FCANNCommunityEdition%2F80RC2alpha002%2Fquickstart%2Fquickstart%2Fquickstart_18_0004.html) or run the following commands:

```shell
# Replace the URL with the one matching your CANN version and device model.
# Install the CANN Toolkit.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install

# Install the CANN Kernels.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install

# Set environment variables.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

## Xtuner

### Installing Xtuner

```shell
git clone https://github.com/InternLM/xtuner.git
cd xtuner
```

Modify `requirements/runtime.txt` with the following changes:

```text
bitsandbytes==0.42.0
mmengine==0.10.5
torchvision==0.19.0
numpy==1.26.4
```

Use the following command for installation:

```shell
pip install -e '.[all]'
```

**Note**:

- By default the latest version of `torch` is installed. Make sure it matches the version of `torch_npu` you install.

### LoRA Fine-tuning

Use the following commands to copy the base config and rename it to `internlm3_8b_instruct_lora_oasst1_e10.py`:

```shell
xtuner copy-cfg internlm2_5_chat_7b_qlora_oasst1_e3 .
mv internlm2_5_chat_7b_qlora_oasst1_e3_copy.py internlm3_8b_instruct_lora_oasst1_e10.py
```

Modify the configuration file `internlm3_8b_instruct_lora_oasst1_e10.py` as follows:

```python
pretrained_model_name_or_path = 'internlm/internlm3-8b-instruct'

max_epochs = 10

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16),
        # quantization_config=dict(
        #     type=BitsAndBytesConfig,
        #     load_in_4bit=True,
        #     load_in_8bit=False,
        #     llm_int8_threshold=6.0,
        #     llm_int8_has_fp16_weight=False,
        #     bnb_4bit_compute_dtype=torch.float16,
        #     bnb_4bit_use_double_quant=True,
        #     bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    # dict(
    #     type=EvaluateChatHook,
    #     tokenizer=tokenizer,
    #     every_n_iters=evaluation_freq,
    #     evaluation_inputs=evaluation_inputs,
    #     system=SYSTEM,
    #     prompt_template=prompt_template)
]

randomness = dict(seed=123, deterministic=True)
```

Run the following command to start single-node, eight-card fine-tuning:

```shell
NPROC_PER_NODE=8 xtuner train internlm3_8b_instruct_lora_oasst1_e10.py --deepspeed deepspeed_zero2
```

The fine-tuning results are saved as checkpoint files such as `./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth`.

### Model Conversion

Convert the model weights obtained from fine-tuning into the Hugging Face format, which facilitates subsequent deployment and usage. Use the following command for the conversion:

```shell
xtuner convert pth_to_hf internlm3_8b_instruct_lora_oasst1_e10.py ./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth ./work_dirs/convert_output
```

### Model Merge

LoRA or QLoRA fine-tuning produces an additional `Adapter` layer, which needs to be merged with the original model to obtain a complete model. Use the following command for model merging, where `$model_path` is the local path of the original model and `--max-shard-size 2GB` limits each weight file to at most 2GB:

```shell
xtuner convert merge $model_path ./work_dirs/convert_output ./work_dirs/merge_output --max-shard-size 2GB
```

### Chat

Chat with the merged model weights:

```shell
cp path_to_your_model/modeling_internlm3.py ./work_dirs/merge_output
xtuner chat ./work_dirs/merge_output --prompt-template internlm2_chat
```

## LLaMA-Factory

### Installing LLaMA-Factory

```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]"
```

### Inference

Create the inference configuration file `examples/inference/internlm2_5_7b_chat.yaml` in the LLaMA-Factory directory:

```yaml
model_name_or_path: xxx  # Only local loading is supported. Set this to the local weight path of InternLM2.5-7B-Chat.
template: intern2
```

Run the following command to interact with the model:

```shell
llamafactory-cli chat examples/inference/internlm2_5_7b_chat.yaml
```

### Fine-tuning

Create the fine-tuning configuration file `examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml` in the LLaMA-Factory directory:

```yaml
### model
model_name_or_path: xxx  # Only local loading is supported. Set this to the local weight path of InternLM2.5-7B-Chat.

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity
template: intern2
cutoff_len: 128
preprocessing_num_workers: 16

### output
output_dir: saves/internlm2_5_7b_chat/lora/sft
logging_steps: 5
save_steps: 20
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```

Run the following commands to start single-card fine-tuning:

```shell
export ASCEND_RT_VISIBLE_DEVICES=0
llamafactory-cli train examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml
```

### Accuracy

The loss curve obtained after fine-tuning is shown below:

*(figure: fine-tuning loss curve)*

### Performance

| Chip Type | train_samples_per_second |
|-------------------|--------------------------|
| Atlas 900 A2 PODc | 49.662 |

## Transformers

### Inference

Create the inference script `inference_internlm2_5_7b_chat.py`:

```python
import torch
import torch_npu  # torch_npu must be imported to enable the NPU backend used by .npu()
from transformers import AutoTokenizer, AutoModelForCausalLM

# If the model has already been downloaded, this can be replaced with the local model path.
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# `torch_dtype=torch.float16` loads the model in float16; otherwise transformers loads it
# in float32, which may not fit in device memory.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", torch_dtype=torch.float16, trust_remote_code=True).npu()
model = model.eval()
response, history = model.chat(tokenizer, "你好,请提供三个管理时间的建议。", history=[])
print(response)
```

Execute the inference script:

```shell
python inference_internlm2_5_7b_chat.py
```

## License

The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and a free commercial-use license can be requested via the [application form](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.