From 2ac5b2223d17a1979ed6fc65774e8dbcf79ea690 Mon Sep 17 00:00:00 2001
From: Yifan <yfyang.86@gmail.com>
Date: Wed, 3 May 2023 15:01:26 +0800
Subject: [PATCH] =?UTF-8?q?=20[Document]=20=E6=9B=B4=E6=96=B0Mac=E9=83=A8?=
 =?UTF-8?q?=E7=BD=B2?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

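 [Document] Update Mac deployment
- FILE: README.md/README_en.md
- ADD: OPENMP; MPS

# Details

Taking the quantized model [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4) as an example, this patch documents the following setup:

- steps for installing libomp;
- configuring the gcc compile flags for the quantized model;
- an explanation of enabling MPS for the quantized model;
- shorter text overall.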
---
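Note for reviewers (not part of the commit message): this patch replaces the full `platform` example in both READMEs with a one-line hint. For reference, here is a minimal sketch of what that check could look like. The helper name `build_compile_command` and the `system` parameter are hypothetical and do not exist in quantization.py, which builds the string inline; the flags are the ones quoted in the README text. The removed example appears to have dropped `{} -shared` from its Darwin branch, so the sketch keeps both placeholders in every branch.

```python
import platform

# Flags as quoted in the README; everything else here is illustrative.
DARWIN_FLAGS = "-O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99"
DEFAULT_FLAGS = "-O3 -fPIC -pthread -fopenmp -std=c99"


def build_compile_command(source_code, kernel_file, system=None):
    """Return the gcc command line for the current (or given) OS.

    Hypothetical helper: `system` exists only so the choice can be
    exercised without running on a Mac.
    """
    if system is None:
        system = platform.uname()[0]  # "Darwin" on macOS
    flags = DARWIN_FLAGS if system == "Darwin" else DEFAULT_FLAGS
    # Both branches pass the source file and request a shared library.
    return "gcc {} {} -shared -o {}".format(flags, source_code, kernel_file)
```

Passing `system="Darwin"` explicitly makes it easy to check the macOS command line from any machine before editing the cached quantization.py.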
 README.md    | 31 +++++++------------------------
 README_en.md | 34 ++++++++--------------------------
 2 files changed, 15 insertions(+), 50 deletions(-)

diff --git a/README.md b/README.md
index 2fd5ef6..d5a2d71 100644
--- a/README.md
+++ b/README.md
@@ -193,17 +193,11 @@ model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4",trust_remote_code=True
 
 ### Mac 上的 CPU 部署和加速
 
-Mac直接加载量化后的模型会出现问题（可运行但是单核），这是由于Mac由于本身缺乏omp导致的。
+Mac直接加载量化后的模型会出现问题，例如`clang: error: unsupported option '-fopenmp'`，这是由于Mac本身缺乏omp导致的，此时可运行但是单核。
 
-```sh
-clang: error: unsupported option '-fopenmp'
-clang: error: unsupported option '-fopenmp'
-```
+以[chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4)量化模型为例，需要做如下配置，即可在Mac下使用OMP：
 
-以[chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4)量化模型为例，需要做如下配置：
-
-1. 安装`libomp`;
-2. 配置`gcc`编译项。
+#### 第一步：安装`libomp`
 
 ```bash
 # 第一步: 参考`https://mac.r-project.org/openmp/`
@@ -211,9 +205,10 @@ clang: error: unsupported option '-fopenmp'
 curl -O https://mac.r-project.org/openmp/openmp-14.0.6-darwin20-Release.tar.gz
 sudo tar fvxz openmp-14.0.6-darwin20-Release.tar.gz -C /
 ```
-
 此时会安装下面几个文件：`/usr/local/lib/libomp.dylib`, `/usr/local/include/ompt.h`, `/usr/local/include/omp.h`, `/usr/local/include/omp-tools.h`。
 
+#### 第二步：配置`gcc`编译项
+
 然后针对`chatglm-6b-int4`, 修改[quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py)，主要是把硬编码的`gcc -O3 -fPIC -pthread -fopenmp -std=c99`命令修改成`gcc -O3 -fPIC -Xclang -fopenmp -pthread  -lomp -std=c99`，[对应代码](https://huggingface.co/THUDM/chatglm-6b-int4/blob/63d66b0572d11cedd5574b38da720299599539b3/quantization.py#L168)见下:
 
 ```python
@@ -221,21 +216,9 @@ sudo tar fvxz openmp-14.0.6-darwin20-Release.tar.gz -C /
 compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread  -lomp -std=c99 {} -shared -o {}".format(source_code, kernel_file)
 ```
 
-为了兼容性，也能写成
-```python
-## 在最开始增加一个包
-import platform
-## ...
-## 上述相应部分修改为（请自行改一下缩进）：
-if platform.uname()[0] == 'Darwin':
-    compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread  -lomp -std=c99 -o {}".format(
-    source_code, kernel_file)
-else:
-    compile_command = "gcc -O3 -fPIC -pthread -fopenmp -std=c99 {} -shared -o {}".format(
-    source_code, kernel_file)
-```
+> 补充说明：可以用`platform.uname()[0] == 'Darwin'`做OS的判断，从而使得[quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py)有兼容性。
 
-> 注意：如果你之前运行过失败过，最好清一下Huggingface的缓存，i.e. `rm -rf ${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4`。由于使用了`rm`命令，请明确知道自己在删除什么。
+> 注意：如果你之前运行`ChatGLM`项目失败过，最好清一下Huggingface的缓存，i.e. 默认下是 `rm -rf ${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4`。由于使用了`rm`命令，请明确知道自己在删除什么。
 
 ### Mac 上的 GPU 加速
 对于搭载了Apple Silicon的Mac（以及MacBook），可以使用 MPS 后端来在 GPU 上运行 ChatGLM-6B。需要参考 Apple 的 [官方说明](https://developer.apple.com/metal/pytorch) 安装 PyTorch-Nightly。
diff --git a/README_en.md b/README_en.md
index f63bc01..d2b6d68 100644
--- a/README_en.md
+++ b/README_en.md
@@ -191,26 +191,21 @@ If your encounter the error `Could not find module 'nvcuda.dll'` or `RuntimeErro
 
 ### CPU Deployment on Mac
 
-The default Mac enviroment does not support Openmp. One may encounter such warning/errors when execute the `AutoModel.from_pretrained(...)` command:
+The default Mac environment does not support OpenMP. When executing `AutoModel.from_pretrained(...)`, one may encounter warnings or errors such as `clang: error: unsupported option '-fopenmp'`.
 
-```sh
-clang: error: unsupported option '-fopenmp'
-clang: error: unsupported option '-fopenmp'
-```
+Taking the quantized int4 version [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4) as an example, two extra steps are needed.
 
-Take the quantified int4 version [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4) for example, the following extra steps are needed:
-
-#### Install `libomp`
+#### STEP 1: Install `libomp`
 
 ```bash
-# STEP 1: install libopenmp, check `https://mac.r-project.org/openmp/` for details
-## Assumption: `gcc(clang) >= 14.x`, read the R-Poject before run the code:
+# STEP 1: install libomp, check `https://mac.r-project.org/openmp/` for details.
+# Assumption: `gcc(clang) >= 14.x`; read the R-Project page before running the code:
 curl -O https://mac.r-project.org/openmp/openmp-14.0.6-darwin20-Release.tar.gz
 sudo tar fvxz openmp-14.0.6-darwin20-Release.tar.gz -C /
 ```
 Four files (`/usr/local/lib/libomp.dylib`, `/usr/local/include/ompt.h`, `/usr/local/include/omp.h`, `/usr/local/include/omp-tools.h`) are installed accordingly.
 
-#### Configure `gcc` with `-fopenmp`
+#### STEP 2: Configure `gcc` with `-fopenmp`
 
 Next, modify the [quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py) file of the `chatglm-6b-int4` project. In the file, change the `gcc -O3 -fPIC -pthread -fopenmp -std=c99` configuration to `gcc -O3 -fPIC -Xclang -fopenmp -pthread  -lomp -std=c99` (check the corresponding python code [HERE](https://huggingface.co/THUDM/chatglm-6b-int4/blob/63d66b0572d11cedd5574b38da720299599539b3/quantization.py#L168)), i.e.:
 
@@ -219,22 +214,9 @@ Next, modify the [quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/
 compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread  -lomp -std=c99 {} -shared -o {}".format(source_code, kernel_file)
 ```
 
-For production code, one could use `platform` library to make it neater:
+> Notice: `platform.uname()[0] == 'Darwin'` can be used to detect the OS, so that the Python script stays compatible with both macOS and other platforms.
 
-```python
-## import platform just after `import os`
-import platform
-## ...
-## change the corresponding lines to:
-if platform.uname()[0] == 'Darwin':
-    compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread  -lomp -std=c99 -o {}".format(
-    source_code, kernel_file)
-else:
-    compile_command = "gcc -O3 -fPIC -pthread -fopenmp -std=c99 {} -shared -o {}".format(
-    source_code, kernel_file)
-```
-
-> Notice: If you have run the `ChatGLM` project and failed, you may want to clean the cache of Huggingface before your next try, i.e. `rm -rf ${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4`. Since `rm` is used, please MAKE SURE that the command deletes the right files.
+> Notice: If you have run the `ChatGLM` project before and it failed, you may want to clear the Hugging Face cache before your next try, i.e. by default `rm -rf ${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4`. Since `rm` is used, please MAKE SURE that the command deletes the right files.
 
 ### GPU Inference on Mac
 For Macs (and MacBooks) with Apple Silicon, it is possible to use the MPS backend to run ChatGLM-6B on the GPU. First, you need to refer to Apple's [official instructions](https://developer.apple.com/metal/pytorch) to install PyTorch-Nightly.