mirror of https://github.com/hpcaitech/ColossalAI
1 commit (9110406a47e6d07573706a56b9a0bc466c6fb328)
Author | SHA1 | Message | Date |
---|---|---|---|
Xu Kai | 611a5a80ca | [inference] Add smmoothquant for llama (#4904) | 1 year ago |

Commit message body:

* [inference] add int8 rotary embedding kernel for smoothquant (#4843)
* [inference] add smoothquant llama attention (#4850)
  * add smoothquant llama attention
  * remove uselss code
  * remove useless code
  * fix import error
  * rename file name
* [inference] add silu linear fusion for smoothquant llama mlp (#4853)
  * add silu linear
  * update skip condition
  * catch smoothquant cuda lib exception
  * prcocess exception for tests
* [inference] add llama mlp for smoothquant (#4854)
  * add llama mlp for smoothquant
  * fix down out scale
  * remove duplicate lines
  * add llama mlp check
  * delete useless code
* [inference] add smoothquant llama (#4861)
  * add smoothquant llama
  * fix attention accuracy
  * fix accuracy
  * add kv cache and save pretrained
  * refactor example
  * delete smooth
  * refactor code
* [inference] add smooth function and delete useless code for smoothquant (#4895)
  * add smooth function and delete useless code
  * update datasets
  * remove duplicate import
  * delete useless file
* refactor codes (#4902)
  * rafactor code
  * add license
  * add torch-int and smoothquant license
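The commit above mentions adding a "smooth function" for SmoothQuant. As a rough illustration only, here is a pure-Python sketch of the per-input-channel smoothing step described in the SmoothQuant paper: each channel j gets a factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), activations are divided by s_j and the matching weight rows are multiplied by s_j, so the product X·W is unchanged. This is not ColossalAI's implementation. The helper names `smoothing_scales`, `apply_smoothing` and `matmul`, the `eps` floor, and the default `alpha = 0.5` are assumptions made for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def smoothing_scales(x: Matrix, w: Matrix, alpha: float = 0.5, eps: float = 1e-5) -> List[float]:
    """Hypothetical helper: per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    x has shape (tokens, in_features); w has shape (in_features, out_features).
    eps (an assumption) keeps all-zero channels from producing 0 or division by zero.
    """
    in_features = len(w)
    act_max = [max(abs(row[j]) for row in x) for j in range(in_features)]
    w_max = [max(abs(v) for v in w[j]) for j in range(in_features)]
    return [max(a, eps) ** alpha / max(m, eps) ** (1.0 - alpha) for a, m in zip(act_max, w_max)]


def apply_smoothing(x: Matrix, w: Matrix, s: List[float]) -> Tuple[Matrix, Matrix]:
    """Hypothetical helper: X' = X diag(s)^-1 and W' = diag(s) W, so X' W' == X W."""
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[j] for v in w[j]] for j in range(len(w))]
    return x_s, w_s


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, used only to check that the output is preserved."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    # Channel 1 of the activations is an outlier channel (values around 30 to 40).
    x = [[1.0, 40.0], [-2.0, 30.0]]
    w = [[0.5, -1.0], [0.2, 0.1]]
    s = smoothing_scales(x, w)
    x_s, w_s = apply_smoothing(x, w, s)
    print(s)
    print(matmul(x, w))
    print(matmul(x_s, w_s))
```

With `alpha = 0.5`, the smoothed activation and weight maxima for each channel both become sqrt(max|X_j| · max|W_j|), which is how this smoothing moves quantization difficulty from outlier activation channels into the weights before int8 quantization.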