mirror of https://github.com/THUDM/ChatGLM2-6B

commit 738e015cac
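|  | @ -33,6 +33,7 @@ The ChatGLM2-6B open-source model aims to advance large-model technology together with the open-source community | |||
| Open-source projects that accelerate ChatGLM2: | ||||
| * [fastllm](https://github.com/ztxz16/fastllm/): An all-platform inference acceleration solution; single-GPU batch inference can reach 10,000+ tokens per second, and it runs in real time on mobile devices with as little as 3 GB of memory (about 4~5 tokens/s on Snapdragon 865) | ||||
| * [chatglm.cpp](https://github.com/li-plus/chatglm.cpp): A quantized CPU inference acceleration solution similar to llama.cpp, enabling real-time chat on Mac laptops | ||||
| * [ChatGLM2-TPU](https://github.com/sophgo/ChatGLM2-TPU): A TPU-accelerated inference solution that runs in real time at about 3 tokens/s on Sophgo's BM1684X edge chip (16T@FP16, 16 GB memory) | ||||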
| 
 | ||||
| Example projects supporting online training of ChatGLM-6B and related applications: | ||||
| * [ChatGLM2-6B deployment and fine-tuning tutorial](https://www.heywhale.com/mw/project/64984a7b72ebe240516ae79c) | ||||
|  |  | |||
|  | @ -24,7 +24,10 @@ Although the model strives to ensure the compliance and accuracy of data at each | |||
| 
 | ||||
| ## Projects | ||||
| Open source projects that accelerate ChatGLM2: | ||||
| 
 | ||||
| * [fastllm](https://github.com/ztxz16/fastllm/): An all-platform inference acceleration solution; single-GPU batch inference can reach 10,000+ tokens per second, and it runs in real time on mobile devices with as little as 3 GB of memory (about 4~5 tokens/s on Snapdragon 865). | ||||
| * [chatglm.cpp](https://github.com/li-plus/chatglm.cpp): Real-time CPU inference on a MacBook accelerated by quantization, similar to llama.cpp. | ||||
| * [ChatGLM2-TPU](https://github.com/sophgo/ChatGLM2-TPU): A TPU-accelerated inference solution that runs in real time at about 3 tokens/s on the BM1684X edge chip (16T@FP16, 16G DDR). | ||||
| 
 | ||||
| Example projects supporting online training of ChatGLM-6B and related applications: | ||||
| * [ChatGLM-6B deployment and fine-tuning tutorial](https://www.heywhale.com/mw/project/64984a7b72ebe240516ae79c) | ||||
|  |  | |||
	 Zhengxiao Du