mirror of https://github.com/hpcaitech/ColossalAI
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
CjhHa1
bc9063adf1
|
7 months ago | |
---|---|---|
.. | ||
README.md | 7 months ago | |
__init__.py | ||
api_server.py | 7 months ago | |
chat_service.py | 7 months ago | |
completion_service.py | 7 months ago | |
utils.py | 7 months ago |
README.md
Online Service
Colossal-Inference supports fast-api based online service. Simple completion and chat are both supported. Follow the commands below and
you can simply construct a server with both completion and chat functionalities. For now we only support Llama
model, we will fullfill
the blank quickly.
Usage
# First, Lauch an API locally.
python3 -m colossalai.inference.server.api_server --model path of your llama2 model --chat_template "{% for message in messages %}
{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"
# Second, you can turn to the page `http://127.0.0.1:8000/docs` to check the api
# For completion service, you can invoke it
curl -X POST http://127.0.0.1:8000/completion -H 'Content-Type: application/json' -d '{"prompt":"hello, who are you? ","stream":"False"}'
# For chat service, you can invoke it
curl -X POST http://127.0.0.1:8000/completion -H 'Content-Type: application/json' -d '{"converation":
[{"role": "system", "content": "you are a helpful assistant"},
{"role": "user", "content": "what is 1+1?"},],
"stream": "False",}'
# If you just want to test a simple generation, turn to generate api
curl -X POST http://127.0.0.1:8000/generate -H 'Content-Type: application/json' -d '{"prompt":"hello, who are you? ","stream":"False"}'
We also support streaming output, simply change the stream
to True
in the request body.