21 Commits (576a2f7b10711bcdb43b86da6a5afaa98f4ad867)

Author SHA1 Message Date
アマデウス 4e4a10c97d
updated c++17 compiler flags (#4983) 2023-10-27 18:19:56 +08:00
Xu Kai 611a5a80ca
[inference] Add smoothquant for llama (#4904)
* [inference] add int8 rotary embedding kernel for smoothquant (#4843)

* [inference] add smoothquant llama attention (#4850)

* add smoothquant llama attention

* remove useless code

* remove useless code

* fix import error

* rename file name

* [inference] add silu linear fusion for smoothquant llama mlp (#4853)

* add silu linear

* update skip condition

* catch smoothquant cuda lib exception

* process exception for tests

* [inference] add llama mlp for smoothquant (#4854)

* add llama mlp for smoothquant

* fix down out scale

* remove duplicate lines

* add llama mlp check

* delete useless code

* [inference] add smoothquant llama (#4861)

* add smoothquant llama

* fix attention accuracy

* fix accuracy

* add kv cache and save pretrained

* refactor example

* delete smooth

* refactor code

* [inference] add smooth function and delete useless code for smoothquant (#4895)

* add smooth function and delete useless code

* update datasets

* remove duplicate import

* delete useless file

* refactor codes (#4902)

* refactor code

* add license

* add torch-int and smoothquant license
2023-10-16 11:28:44 +08:00
Tong Li bbbcac26e8
fix format (#4815) 2023-09-27 12:50:22 +08:00
Xu Kai 946ab56c48
[feature] add gptq for inference (#4754)
* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inferance/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import
2023-09-22 11:02:50 +08:00
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Mashiro cfa607080f
[Fix] Fix compile error (#4357) 2023-09-01 18:12:58 +08:00
Yanming W 269150b6f4
[Docker] Fix a couple of build issues (#3691) 2023-05-24 10:22:51 +08:00
digger-yu b7141c36dd
[CI] fix some spelling errors (#3707)
* fix spelling error with examples/community/

* fix spelling error with tests/

* fix some spelling error with tests/ colossalai/ etc.
2023-05-10 17:12:03 +08:00
digger-yu 7570d9ae3d
[doc] fix op_builder/README.md (#3597)
Optimization Code
change "requries" to "requires"
2023-04-19 15:56:01 +08:00
digger-yu d96567bb5d
[misc] op_builder/builder.py (#3593)
Optimization Code
The source code has not been modified; only a few spelling errors in the comments have been corrected
2023-04-18 19:14:59 +08:00
Hongxin Liu 173dad0562
[misc] add verbose arg for zero and op builder (#3552)
* [misc] add print verbose

* [gemini] add print verbose

* [zero] add print verbose for low level

* [misc] add print verbose for op builder
2023-04-17 11:25:35 +08:00
digger-yu 77efdfe1dd
[doc] Update README.md (#3549)
Format Optimization, Add [] outside of DeepSpeed
2023-04-13 17:11:55 +08:00
digger-yu 3f760da9f0
Update README.md (#3548)
Delete extra ")"
2023-04-13 16:49:57 +08:00
ver217 823f3b9cf4
[doc] add deepspeed citation and copyright (#2996)
* [doc] add deepspeed citation and copyright

* [doc] add deepspeed citation and copyright

* [doc] add deepspeed citation and copyright
2023-03-04 20:08:11 +08:00
Yasyf Mohamedali 19fa0e57f6
Remove extraneous comma (#2993)
Prevents `TypeError: category must be a Warning subclass, not 'str'`.
2023-03-04 14:44:06 +08:00
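The error quoted in #2993 is what Python's standard `warnings.warn` raises when its second positional argument is a string rather than a `Warning` subclass. The sketch below shows one plausible way a stray comma produces it. The code actually changed by that commit is not shown in this log, so the function names and message text here are hypothetical.

```python
import warnings


def warn_with_stray_comma() -> None:
    # Hypothetical reproduction: the comma after the first literal splits
    # what was meant to be one message into two positional arguments, so
    # the second string is passed as `category`.
    warnings.warn(
        "This message was meant to span ",
        "two adjacent string literals.",
    )


def warn_fixed() -> None:
    # Without the comma, the adjacent literals are concatenated into a
    # single message and the default category (UserWarning) is used.
    warnings.warn(
        "This message was meant to span "
        "two adjacent string literals."
    )


def reproduce() -> str:
    # Returns the TypeError text raised by the broken call, or "" if
    # no error was raised.
    try:
        warn_with_stray_comma()
    except TypeError as exc:
        return str(exc)
    return ""
```

Calling `reproduce()` returns the text of the `TypeError`, while `warn_fixed()` emits an ordinary `UserWarning`.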
Frank Lee 3a5d93bc2c
[kernel] cached the op kernel and fixed version check (#2886)
* [kernel] cached the op kernel and fixed version check

* polish code

* polish code
2023-03-03 21:45:05 +08:00
Frank Lee dd14783f75
[kernel] fixed repeated loading of kernels (#2549)
* [kernel] fixed repeated loading of kernels

* polish code

* polish code
2023-02-03 09:47:13 +08:00
Frank Lee 40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
* [setup] support pre-build and jit-build of cuda kernels

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-06 20:50:26 +08:00
Frank Lee 8711310cda
[setup] remove torch dependency (#2333) 2023-01-05 13:53:28 +08:00
Jiarui Fang db6eea3583
[builder] reconfig op_builder for pypi install (#2314) 2023-01-04 16:32:32 +08:00
Frank Lee 9b765e7a69
[setup] removed the build dependency on colossalai (#2307) 2023-01-04 11:38:42 +08:00