小红书文本

我要开发同款
匿名用户2024年07月31日
53阅读

技术信息

开源地址
https://modelscope.cn/models/aitejiu/xhs_createation
授权协议
Apache License 2.0

作品详情

INT4 Weight-oly Quatizatio ad Deploymet (W4A16)

LMDeploy adopts AWQ algorithm for 4bit weight-oly quatizatio. By developed the high-performace cuda kerel, the 4bit quatized model iferece achieves up to 2.4x faster tha FP16.

LMDeploy supports the followig NVIDIA GPU for W4A16 iferece:

  • Turig(sm75): 20 series, T4

  • Ampere(sm80,sm86): 30 series, A10, A16, A30, A100

  • Ada Lovelace(sm90): 40 series

Before proceedig with the quatizatio ad iferece, please esure that lmdeploy is istalled.

pip istall lmdeploy[all]

This article comprises the followig sectios:

Iferece

Please dowload iterlm2-chat-20b-4bit model as follows,

git-lfs istall
git cloe --depth=1 https://www.modelscope.c/Shaghai_AI_Laboratory/iterlm2-chat-20b-4bits.git

Tryig the followig codes, you ca perform the batched offlie iferece with the quatized model:

from lmdeploy import pipelie, TurbomidEgieCofig
egie_cofig = TurbomidEgieCofig(model_format='awq')
pipe = pipelie("./iterlm2-chat-20b-4bits", backed_cofig=egie_cofig)
respose = pipe(["Hi, pls itro yourself", "Shaghai is"])
prit(respose)

For more iformatio about the pipelie parameters, please refer to here.

Evaluatio

Please overview this guide about model evaluatio with LMDeploy.

Service

LMDeploy's api_server eables models to be easily packed ito services with a sigle commad. The provided RESTful APIs are compatible with OpeAI's iterfaces. Below are a example of service startup:

lmdeploy serve api_server ./iterlm2-chat-20b-4bits --backed turbomid --model-format awq

The default port of api_server is 23333. After the server is lauched, you ca commuicate with server o termial through api_cliet:

lmdeploy serve api_cliet http://0.0.0.0:23333

You ca overview ad try out api_server APIs olie by swagger UI at http://0.0.0.0:23333, or you ca also read the API specificatio from here.

功能介绍

INT4 Weight-only Quantization and Deployment (W4A16) LMDeploy adopts AWQ algorithm for 4bit weight-o

声明:本文仅代表作者观点,不代表本站立场。如果侵犯到您的合法权益,请联系我们删除侵权资源!如果遇到资源链接失效,请您通过评论或工单的方式通知管理员。未经允许,不得转载,本站所有资源文章禁止商业使用运营!
下载安装【程序员客栈】APP
实时对接需求、及时收发消息、丰富的开放项目需求、随时随地查看项目状态

评论