通义千问2-7B-Instruct-GPTQ-Int3-量化修复
【模型更新日期】
2024-06-08
【模型大小】
4.5GB
【修复内容】
- 对GPTQ Int3量化的校准做了额外优化;减少
int3
模型的1.乱吐字
、2.无限循环
、3.长文能力丢失
等情况。
【更新日志】
2024-06-08
首次commit
【介绍】
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model.
Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.
【同期量化修复模型】
待工作完成后补充…
【模型下载】
from modelscope import snapshot_download
model_dir = snapshot_download('tclf90/模型名', cache_dir="本地路径")
【vLLM推理(目前仅限Linux)】
1. Python 简易调试
待工作完成后补充…
2. 类ChatGPT RESTFul API Server
>>> python -m vllm.entrypoints.openai.api_server --model 本地路径/tclf90/模型名称
评论