diff --git a/ecosystem/README_npu.md b/ecosystem/README_npu.md
index 5b75fd7..7b0e03a 100644
--- a/ecosystem/README_npu.md
+++ b/ecosystem/README_npu.md
@@ -2,7 +2,7 @@
-
+
 
 InternLM
@@ -43,9 +43,9 @@ This is a guide to using Ascend NPU to train and infer the InternLM series model
 ### InternLM3
 
-| Model | Transformers | ModelScope | Modelers | Release Date |
-| ------------------------- | ---------------------------------------------------- | -------------------------------------------------- | ------------------------------------------------- | ------------ |
-| **InternLM3-8B-Instruct** | [🤗internlm3_8B_instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [ internlm3_8b_instruct](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct/summary) | [![Open in Modelers](https://modelers.cn/assets/logo1-1bf58310.svg)](https://modelers.cn/models/Intern/internlm3-8b-instruct) | 2025-01-15 |
+| Model | Transformers | ModelScope | Modelers | Release Date |
+| ------------------------- | ---------------------------------------------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------- | ------------ |
+| **InternLM3-8B-Instruct** | [🤗internlm3_8B_instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [ internlm3_8b_instruct](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct/summary) | [![Open in Modelers](https://modelers.cn/assets/logo1-1bf58310.svg)](https://modelers.cn/models/Intern/internlm3-8b-instruct) | 2025-01-15 |
 
 ## Environment Setup
 
@@ -334,7 +334,7 @@ openmind-cli train examples/internlm3/train_sft_full_internlm3.yaml
 As illustrated in the figure below, the training loss of the openMind Library normally converges, and compared with the GPU, the average relative error is within 2%.
-
+

Accuracy Comparison (npu=8, per_device_train_batch_size=6, max_length=1024)

@@ -342,7 +342,7 @@ As illustrated in the figure below, the training loss of the openMind Library no
 The openMind Library supports the enabling of fine-tuning methods such as LoRA and QLoRA on Ascend NPUs, significantly reducing device memory usage. As illustrated in the figure below, employing the QLoRA fine-tuning method can lead to approximately a 40% reduction in device memory consumption.
-
+

Memory Consumption (npu=8, per_device_train_batch_size=6, max_length=1024)

@@ -350,7 +350,7 @@ The openMind Library supports the enabling of fine-tuning methods such as LoRA a
 The openMind Library facilitates the automatic loading of Ascend NPU fused operators during training, eliminating the need for developers to manually modify code or configurations. This enhances model training performance while maintaining ease of use. The figure below demonstrates the performance benefits achieved by default when the openMind Library enables Ascend NPU fused operators.
-
+

Training Samples per Second

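The LoRA/QLoRA memory savings claimed in the hunks above come from training only small low-rank adapter matrices instead of all model weights. For readers who want to see the mechanism outside the openMind Library, here is a minimal sketch using Hugging Face PEFT; the `target_modules` names are illustrative assumptions, not taken from the openMind examples, so check the actual module names of the model before use.

```python
# Minimal LoRA setup sketch using Hugging Face PEFT (not the openMind Library API).
# Assumptions: `transformers` and `peft` are installed; the target_modules names
# below are hypothetical -- inspect internlm3-8b-instruct's modules before use.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm3-8b-instruct", trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension of the adapters
    lora_alpha=16,                        # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # hypothetical attention projection names
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
# Prints trainable vs. total parameters; with LoRA the trainable share is
# typically well under 1%, which is where the optimizer-state and gradient
# memory savings shown in the figure come from.
model.print_trainable_parameters()
```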
diff --git a/ecosystem/README_npu_zh-CN.md b/ecosystem/README_npu_zh-CN.md
index 5a1abc7..0e34679 100644
--- a/ecosystem/README_npu_zh-CN.md
+++ b/ecosystem/README_npu_zh-CN.md
@@ -2,7 +2,7 @@
-
+
 
 书生·浦语 官网
@@ -14,8 +14,8 @@
 
-[![license](./assets//license.svg)](https://github.com/open-mmlab/mmdetection/blob/main/LICENSE)
-[![evaluation](./assets//compass_support.svg)](https://github.com/internLM/OpenCompass/)
+[![license](../assets/license.svg)](https://github.com/open-mmlab/mmdetection/blob/main/LICENSE)
+[![evaluation](../assets/compass_support.svg)](https://github.com/internLM/OpenCompass/)
@@ -43,9 +43,9 @@
 ### InternLM3
 
-| Model | Transformers | ModelScope | Modelers | Release Date |
-| ------------------------- | ---------------------------------------------------- | -------------------------------------------------- | ------------------------------------------------- | ------------ |
-| **InternLM3-8B-Instruct** | [🤗internlm3_8B_instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [ internlm3_8b_instruct](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct/summary) | [![Open in Modelers](https://modelers.cn/assets/logo1-1bf58310.svg)](https://modelers.cn/models/Intern/internlm3-8b-instruct) | 2025-01-15 |
+| Model | Transformers | ModelScope | Modelers | Release Date |
+| ------------------------- | ---------------------------------------------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------- | ------------ |
+| **InternLM3-8B-Instruct** | [🤗internlm3_8B_instruct](https://huggingface.co/internlm/internlm3-8b-instruct) | [ internlm3_8b_instruct](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct/summary) | [![Open in Modelers](https://modelers.cn/assets/logo1-1bf58310.svg)](https://modelers.cn/models/Intern/internlm3-8b-instruct) | 2025-01-15 |
 
 ## 环境准备
 
@@ -333,7 +333,7 @@ openmind-cli train examples/internlm3/train_sft_full_internlm3.yaml
 如下图所示,openMind Library 的训练 loss 正常收敛,同时和 GPU 对比,平均相对误差在 2% 以内。
-
+

精度对比 (npu=8, per_device_train_batch_size=6, max_length=1024)

@@ -342,7 +342,7 @@ openMind Library 支持在昇腾 NPU 上使能 LoRA、QLoRA 等微调方法,
 如下图所示,通过使能 QloRA 微调方式可减少 device 内存约 40%。
-
+

Full/LoRA/QLoRA 显存开销 (npu=8, per_device_train_batch_size=6, max_length=1024)

@@ -351,7 +351,7 @@ openMind Library 支持训练时自动加载昇腾 NPU 融合算子,无需开
 的同时兼顾易用性。下图展示了 openMind 默认使能昇腾 NPU 融合算子之后的性能收益。
-
+

每秒训练样本数
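Both READMEs assume a working Ascend software stack (CANN plus the torch_npu plugin) before `openmind-cli train` is run. A quick way to sanity-check that stack is a minimal `torch_npu` probe; this is a sketch under the assumption that `torch` and `torch_npu` are installed, with an arbitrary device index.

```python
# Quick sanity check of the Ascend NPU environment before launching training.
# Assumptions: `torch` and `torch_npu` (the Ascend PyTorch plugin) are installed.
import torch
import torch_npu  # noqa: F401 -- importing registers the "npu" device with PyTorch

if torch.npu.is_available():
    device = torch.device("npu:0")  # arbitrary index for illustration
    x = torch.randn(2, 3, device=device)  # allocate a small tensor on the NPU
    print(f"NPUs visible: {torch.npu.device_count()}, sample tensor on {x.device}")
else:
    print("No Ascend NPU visible; check the CANN toolkit and torch_npu installation.")
```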