mirror of https://github.com/hpcaitech/ColossalAI
1 Commits (e57812c6727e325971cb0d8769c0789c088f62ae)
| Author | SHA1 | Message | Date |
|---|---|---|---|
| Xu Kai | 611a5a80ca | [inference] Add smmoothquant for llama (#4904) | 1 year ago |

Full commit message for 611a5a80ca:

* [inference] add int8 rotary embedding kernel for smoothquant (#4843)
* [inference] add smoothquant llama attention (#4850)
  * add smoothquant llama attention
  * remove uselss code
  * remove useless code
  * fix import error
  * rename file name
* [inference] add silu linear fusion for smoothquant llama mlp (#4853)
  * add silu linear
  * update skip condition
  * catch smoothquant cuda lib exception
  * prcocess exception for tests
* [inference] add llama mlp for smoothquant (#4854)
  * add llama mlp for smoothquant
  * fix down out scale
  * remove duplicate lines
  * add llama mlp check
  * delete useless code
* [inference] add smoothquant llama (#4861)
  * add smoothquant llama
  * fix attention accuracy
  * fix accuracy
  * add kv cache and save pretrained
  * refactor example
  * delete smooth
  * refactor code
* [inference] add smooth function and delete useless code for smoothquant (#4895)
  * add smooth function and delete useless code
  * update datasets
  * remove duplicate import
  * delete useless file
* refactor codes (#4902)
  * rafactor code
  * add license
  * add torch-int and smoothquant license