add README_npu

pull/816/head
liutongtong27 2025-01-11 19:42:01 +08:00
parent 1759c4b9b4
commit ad035eb8bd
3 changed files with 593 additions and 0 deletions

README_npu.md (new file, 298 lines)

@@ -0,0 +1,298 @@
# InternLM-NPU
<div align="center">
<img src="./assets/logo.svg" width="200"/>
<div> </div>
<div align="center">
<b><font size="5">InternLM</font></b>
<sup>
<a href="https://internlm.intern-ai.org.cn/">
<i><font size="4">HOT</font></i>
</a>
</sup>
<div> </div>
</div>
[![license](./assets/license.svg)](./LICENSE)
[![evaluation](./assets/compass_support.svg)](https://github.com/internLM/OpenCompass/)
<!-- [![Documentation Status](https://readthedocs.org/projects/internlm/badge/?version=latest)](https://internlm.readthedocs.io/zh_CN/latest/?badge=latest) -->
[📘Commercial Application](#license) |
[🤗HuggingFace](https://huggingface.co/internlm) |
[🆕Update News](#news) |
[🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) |
[📜Technical Report](https://arxiv.org/abs/2403.17297)<br>
[💬Chat Web](https://internlm-chat.intern-ai.org.cn/) |
[🔗API](https://internlm.intern-ai.org.cn/api/document) |
[🧩Modelers](https://modelers.cn/spaces/MindSpore-Lab/INTERNLM2-20B-PLAN)
[English](./README_npu.md) |
[简体中文](./README_npu_zh-CN.md)
</div>
## Introduction
This guide describes how to train and run inference with the InternLM series models on Ascend NPUs.
## News
\[2025.01.15\] InternLM3-8B-Instruct is now supported in XTuner, LLaMA-Factory, and Transformers.
## Model Zoo
### InternLM3
| Model | Transformers (HF) | ModelScope | Release Date |
|---------------------------| ------------------------------------------ | ---------------------------------------- |--------------|
| **InternLM3-8B-Instruct** | [🤗internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [<img src="./assets/modelscope_logo.png" width="20px" /> internlm3-8b-instruct](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct) | 2025-01-15 |
## Environment Setup
### Installing Ascend CANN Toolkit and Kernels
For installation details, see the [installation guide](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2FCANNCommunityEdition%2F80RC2alpha002%2Fquickstart%2Fquickstart%2Fquickstart_18_0004.html) or run the following commands:
```shell
# Replace the URLs below with those matching your CANN version and device model.
# Install CANN Toolkit.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install
# Install CANN Kernels.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install
# Set environment variables.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
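To verify that the NPU stack is visible to PyTorch, you can run a quick sanity check (this assumes `torch` and `torch_npu` are already installed in the current environment):

```shell
# Should print "True" when CANN, the driver, and torch_npu are set up correctly.
python -c "import torch, torch_npu; print(torch.npu.is_available())"
```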
## XTuner
### Installing XTuner
```shell
git clone https://github.com/InternLM/xtuner.git
cd xtuner
```
Modify `requirements/runtime.txt` with the following changes:
```text
bitsandbytes==0.42.0
mmengine==0.10.5
torchvision==0.19.0
numpy==1.26.4
```
Use the following command for installation:
```shell
pip install -e '.[all]'
```
**Note**:
- By default the latest `torch` is installed; make sure its version matches your `torch_npu` version (see the example below).
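As an illustration (the version numbers below are placeholders, not a recommendation; consult the `torch_npu` release notes for the pairing that matches your CANN version):

```shell
# Illustrative only -- the torch and torch-npu releases must match each other.
pip install torch==2.1.0
pip install torch-npu==2.1.0
```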
### LoRA Fine-tuning
Copy the `internlm2_5_chat_7b_qlora_oasst1_e3` configuration and rename the copy to `internlm3_8b_instruct_lora_oasst1_e10.py`:
```shell
xtuner copy-cfg internlm2_5_chat_7b_qlora_oasst1_e3 .
mv internlm2_5_chat_7b_qlora_oasst1_e3_copy.py internlm3_8b_instruct_lora_oasst1_e10.py
```
The modifications to the configuration file `internlm3_8b_instruct_lora_oasst1_e10.py` are as follows:
```python
pretrained_model_name_or_path = 'internlm/internlm3-8b-instruct'
max_epochs = 10
model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16),
        # quantization_config=dict(
        #     type=BitsAndBytesConfig,
        #     load_in_4bit=True,
        #     load_in_8bit=False,
        #     llm_int8_threshold=6.0,
        #     llm_int8_has_fp16_weight=False,
        #     bnb_4bit_compute_dtype=torch.float16,
        #     bnb_4bit_use_double_quant=True,
        #     bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    # dict(
    #     type=EvaluateChatHook,
    #     tokenizer=tokenizer,
    #     every_n_iters=evaluation_freq,
    #     evaluation_inputs=evaluation_inputs,
    #     system=SYSTEM,
    #     prompt_template=prompt_template)
]
randomness = dict(seed=123, deterministic=True)
```
Run the following command to start fine-tuning on a single machine with eight NPUs:
```shell
NPROC_PER_NODE=8 xtuner train internlm3_8b_instruct_lora_oasst1_e10.py --deepspeed deepspeed_zero2
```
The fine-tuned weights are saved to `./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth`.
### Model Conversion
Convert the fine-tuned weights into the Hugging Face format to facilitate subsequent deployment and use:
```shell
xtuner convert pth_to_hf internlm3_8b_instruct_lora_oasst1_e10.py ./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth ./work_dirs/convert_output
```
### Model Merging
LoRA and QLoRA fine-tuning produce an additional adapter, which must be merged with the original model to obtain a complete model. Use the following command for model merging, where `$model_path` is the local path of the original model and `--max-shard-size 2GB` caps each weight file at 2 GB:
```shell
xtuner convert merge $model_path ./work_dirs/convert_output ./work_dirs/merge_output --max-shard-size 2GB
```
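Before chatting, you can optionally verify that the merge output is a loadable Hugging Face model directory; a minimal check, assuming the paths used above:

```shell
# Loads only the model config as a lightweight validity check.
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('./work_dirs/merge_output', trust_remote_code=True))"
```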
### Chat
Chat with the merged model weights (copy `modeling_internlm3.py` from the original model directory first so that the custom model code can be loaded):
```shell
cp path_to_your_model/modeling_internlm3.py ./work_dirs/merge_output
xtuner chat ./work_dirs/merge_output --prompt-template internlm2_chat
```
## LLaMA-Factory
### Installing LLaMA-Factory
```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]"
```
### Inference
Create the inference configuration file `examples/inference/internlm2_5_7b_chat.yaml` in the LLaMA-Factory directory:
```yaml
model_name_or_path: xxx  # Only local loading is supported; set this to the local weight path of InternLM2.5-7B-Chat.
template: intern2
```
Run the following command to interact with the model:
```shell
llamafactory-cli chat examples/inference/internlm2_5_7b_chat.yaml
```
### Fine-tuning
Create the fine-tuning configuration file `examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml` in the LLaMA-Factory directory:
```yaml
### model
model_name_or_path: xxx  # Only local loading is supported; set this to the local weight path of InternLM2.5-7B-Chat.
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: identity
template: intern2
cutoff_len: 128
preprocessing_num_workers: 16
### output
output_dir: saves/internlm2_5_7b_chat/lora/sft
logging_steps: 5
save_steps: 20
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```
Run the following commands to start fine-tuning on a single NPU:
```shell
export ASCEND_RT_VISIBLE_DEVICES=0
llamafactory-cli train examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml
```
### Accuracy
The loss curve obtained after fine-tuning is as follows:
![training_loss](assets/training_loss.png)
### Performance
| Chip Type | train_samples_per_second |
|-------------------|--------------------------|
| Atlas 900 A2 PODc | 49.662 |
## Transformers
### Inference
Create the inference script `inference_internlm2_5_7b_chat.py`:
```python
import torch
import torch_npu  # registers the Ascend NPU backend with PyTorch
from transformers import AutoTokenizer, AutoModelForCausalLM
# If the model has already been downloaded, this can be replaced with the local model path.
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# `torch_dtype=torch.float16` loads the model in float16 precision; otherwise transformers loads it in float32, which can exhaust device memory.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", torch_dtype=torch.float16, trust_remote_code=True).npu()
model = model.eval()
response, history = model.chat(tokenizer, "Hello, please give me three suggestions for managing my time.", history=[])
print(response)
```
Execute the inference script:
```shell
python inference_internlm2_5_7b_chat.py
```
## License
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.

README_npu_zh-CN.md (new file, 295 lines)

@@ -0,0 +1,295 @@
# InternLM-NPU
<div align="center">
<img src="./assets//logo.svg" width="200"/>
<div>&nbsp;</div>
<div align="center">
<b><font size="5">书生·浦语 官网</font></b>
<sup>
<a href="https://internlm.intern-ai.org.cn/">
<i><font size="4">HOT</font></i>
</a>
</sup>
<div>&nbsp;</div>
</div>
[![license](./assets/license.svg)](./LICENSE)
[![evaluation](./assets/compass_support.svg)](https://github.com/internLM/OpenCompass/)
<!-- [![Documentation Status](https://readthedocs.org/projects/internlm/badge/?version=latest)](https://internlm.readthedocs.io/zh_CN/latest/?badge=latest) -->
[📘Commercial Application](#license) |
[🤗HuggingFace](https://huggingface.co/internlm) |
[🆕Update News](#news) |
[🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) |
[📜Technical Report](https://arxiv.org/abs/2403.17297)<br>
[💬Chat Web](https://internlm-chat.intern-ai.org.cn/) |
[🔗API](https://internlm.intern-ai.org.cn/api/document) |
[🧩Modelers](https://modelers.cn/spaces/MindSpore-Lab/INTERNLM2-20B-PLAN)
[English](./README_npu.md) |
[简体中文](./README_npu_zh-CN.md)
</div>
## Introduction
This guide describes how to train and run inference with the InternLM series models on Ascend NPUs.
## News
\[2025.01.15\] InternLM3-8B-Instruct is now supported in XTuner, LLaMA-Factory, and Transformers.
## Model Zoo
### InternLM3
| Model | Transformers (HF) | ModelScope | Release Date |
|---------------------------| ------------------------------------------ | ---------------------------------------- |--------------|
| **InternLM3-8B-Instruct** | [🤗internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [<img src="./assets/modelscope_logo.png" width="20px" /> internlm3-8b-instruct](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct) | 2025-01-15 |
## Environment Setup
### Installing Ascend CANN Toolkit and Kernels
For installation details, see the [installation guide](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2FCANNCommunityEdition%2F80RC2alpha002%2Fquickstart%2Fquickstart%2Fquickstart_18_0004.html) or run the following commands:
```shell
# Replace the URLs below with those matching your CANN version and device model.
# Install CANN Toolkit.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install
# Install CANN Kernels.
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install
# Set environment variables.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
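To verify that the NPU stack is visible to PyTorch, you can run a quick sanity check (this assumes `torch` and `torch_npu` are already installed in the current environment):

```shell
# Should print "True" when CANN, the driver, and torch_npu are set up correctly.
python -c "import torch, torch_npu; print(torch.npu.is_available())"
```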
## XTuner
### Installing XTuner
```shell
git clone https://github.com/InternLM/xtuner.git
cd xtuner
```
Modify `requirements/runtime.txt` with the following changes:
```text
bitsandbytes==0.42.0
mmengine==0.10.5
torchvision==0.19.0
numpy==1.26.4
```
Use the following command for installation:
```shell
pip install -e '.[all]'
```
**Note**:
- By default the latest `torch` is installed; make sure its version matches your `torch_npu` version (see the example below).
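As an illustration (the version numbers below are placeholders, not a recommendation; consult the `torch_npu` release notes for the pairing that matches your CANN version):

```shell
# Illustrative only -- the torch and torch-npu releases must match each other.
pip install torch==2.1.0
pip install torch-npu==2.1.0
```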
### LoRA Fine-tuning
Copy the `internlm2_5_chat_7b_qlora_oasst1_e3` configuration and rename the copy to `internlm3_8b_instruct_lora_oasst1_e10.py`:
```shell
xtuner copy-cfg internlm2_5_chat_7b_qlora_oasst1_e3 .
mv internlm2_5_chat_7b_qlora_oasst1_e3_copy.py internlm3_8b_instruct_lora_oasst1_e10.py
```
The modifications to the configuration file `internlm3_8b_instruct_lora_oasst1_e10.py` are as follows:
```python
pretrained_model_name_or_path = 'internlm/internlm3-8b-instruct'
max_epochs = 10
model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16),
        # quantization_config=dict(
        #     type=BitsAndBytesConfig,
        #     load_in_4bit=True,
        #     load_in_8bit=False,
        #     llm_int8_threshold=6.0,
        #     llm_int8_has_fp16_weight=False,
        #     bnb_4bit_compute_dtype=torch.float16,
        #     bnb_4bit_use_double_quant=True,
        #     bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    # dict(
    #     type=EvaluateChatHook,
    #     tokenizer=tokenizer,
    #     every_n_iters=evaluation_freq,
    #     evaluation_inputs=evaluation_inputs,
    #     system=SYSTEM,
    #     prompt_template=prompt_template)
]
randomness = dict(seed=123, deterministic=True)
```
Run the following command to start fine-tuning on a single machine with eight NPUs:
```shell
NPROC_PER_NODE=8 xtuner train internlm3_8b_instruct_lora_oasst1_e10.py --deepspeed deepspeed_zero2
```
The fine-tuned weights are saved to `./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth`.
### Model Conversion
Convert the fine-tuned weights into the Hugging Face format to facilitate subsequent deployment and use:
```shell
xtuner convert pth_to_hf internlm3_8b_instruct_lora_oasst1_e10.py ./work_dirs/internlm3_8b_instruct_lora_oasst1_e10/iter_xxx.pth ./work_dirs/convert_output
```
### Model Merging
LoRA and QLoRA fine-tuning produce an additional adapter, which must be merged with the original model to obtain a complete model. Use the following command for model merging, where `$model_path` is the local path of the original model and `--max-shard-size 2GB` caps each weight file at 2 GB:
```shell
xtuner convert merge $model_path ./work_dirs/convert_output ./work_dirs/merge_output --max-shard-size 2GB
```
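Before chatting, you can optionally verify that the merge output is a loadable Hugging Face model directory; a minimal check, assuming the paths used above:

```shell
# Loads only the model config as a lightweight validity check.
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('./work_dirs/merge_output', trust_remote_code=True))"
```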
### Chat
Chat with the merged model weights (copy `modeling_internlm3.py` from the original model directory first so that the custom model code can be loaded):
```shell
cp path_to_your_model/modeling_internlm3.py ./work_dirs/merge_output
xtuner chat ./work_dirs/merge_output --prompt-template internlm2_chat
```
## LLaMA-Factory
### Installing LLaMA-Factory
```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]"
```
### Inference
Create the inference configuration file `examples/inference/internlm2_5_7b_chat.yaml` in the LLaMA-Factory directory:
```yaml
model_name_or_path: xxx  # Only local loading is supported; set this to the local weight path of InternLM2.5-7B-Chat.
template: intern2
```
Run the following command to interact with the model:
```shell
llamafactory-cli chat examples/inference/internlm2_5_7b_chat.yaml
```
### Fine-tuning
Create the fine-tuning configuration file `examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml` in the LLaMA-Factory directory:
```yaml
### model
model_name_or_path: xxx  # Only local loading is supported; set this to the local weight path of InternLM2.5-7B-Chat.
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: identity
template: intern2
cutoff_len: 128
preprocessing_num_workers: 16
### output
output_dir: saves/internlm2_5_7b_chat/lora/sft
logging_steps: 5
save_steps: 20
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```
Run the following commands to start fine-tuning on a single NPU:
```shell
export ASCEND_RT_VISIBLE_DEVICES=0
llamafactory-cli train examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml
```
### Accuracy
The loss curve obtained after fine-tuning is as follows:
![training_loss](assets/training_loss.png)
### Performance
| Chip Type | train_samples_per_second |
|-------------------|--------------------------|
| Atlas 900 A2 PODc | 49.662 |
## Transformers
### Inference
Create the inference script `inference_internlm2_5_7b_chat.py`:
```python
import torch
import torch_npu  # registers the Ascend NPU backend with PyTorch
from transformers import AutoTokenizer, AutoModelForCausalLM
# If the model has already been downloaded, this can be replaced with the local model path.
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# `torch_dtype=torch.float16` loads the model in float16 precision; otherwise transformers loads it in float32, which can exhaust device memory.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", torch_dtype=torch.float16, trust_remote_code=True).npu()
model = model.eval()
response, history = model.chat(tokenizer, "Hello, please give me three suggestions for managing my time.", history=[])
print(response)
```
Execute the inference script:
```shell
python inference_internlm2_5_7b_chat.py
```
## License
The code in this repository is licensed under Apache-2.0. Model weights are fully open for academic research and also allow **free** commercial usage; to apply for a commercial license, please fill in the [application form](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact <internlm@pjlab.org.cn>.

assets/training_loss.png (new binary file, 33 KiB; binary not shown)