AquilaChat2-34B-Int4-GPTQ

Model page: https://modelscope.cn/models/BAAI/AquilaChat2-34B-Int4-GPTQ
License: other


We open-source the GPTQ-format Int4-quantized AquilaChat2-34B model for quicker download and use.
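For reference, a checkpoint in this format can be produced with AutoGPTQ roughly as sketched below. This is a minimal sketch, not the recipe used for this release: the base-model path, the quantization settings (bits, group_size, desc_act), and the calibration text are all assumptions.

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_dir = "BAAI/AquilaChat2-34B"  # assumed full-precision base checkpoint
save_dir = "./checkpoints/Aquilachat34b-4bit"

# Assumed Int4 settings; the group size actually used for this release is not stated.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

tokenizer = AutoTokenizer.from_pretrained(base_dir, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(base_dir, quantize_config, trust_remote_code=True)

# GPTQ calibrates on sample text; a real run would use a few hundred examples.
examples = [tokenizer("北京是中国的首都。", return_tensors="pt")]
model.quantize(examples)

model.save_quantized(save_dir)
tokenizer.save_pretrained(save_dir)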

Quick Start: AquilaChat2-34B-Int4-GPTQ

1. Environment setup

Follow the instructions at https://github.com/PanQiWei/AutoGPTQ/tree/main#quick-installation to install AutoGPTQ.
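A quick way to confirm the installation worked is the import check below; the version lookup assumes the package was installed under its PyPI distribution name auto-gptq.

import torch
from importlib.metadata import version

import auto_gptq  # raises ImportError here if the installation failed

print("auto-gptq:", version("auto-gptq"))  # assumes the PyPI package name
print("CUDA available:", torch.cuda.is_available())  # GPTQ inference needs a GPU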

2. Inference

import time

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
from predict import predict  # predict.py from the Aquila2 repository

model_dir = "./checkpoints/Aquilachat34b-4bit"  # path to the quantized model
device = "cuda:0"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(model_dir, inject_fused_attention=False,
                                           low_cpu_mem_usage=True, device=device)
model.eval()

texts = ["请给出10个要到北京旅游的理由。",
         "写一个林黛玉倒拔垂杨柳的故事",
         "write a poem about the moon"]

start_time = time.time()
for text in texts:
    out = predict(model, text, tokenizer=tokenizer, max_gen_len=200, top_p=0.95,
                  seed=1234, topk=200, temperature=1.0, sft=True, device=device,
                  model_name="AquilaChat2-34B")
    print(out)  # print each completion, not just the last one
print(f"Elapsed time for generation: {time.time() - start_time} seconds")

License

The Aquila2 series of open-source models is licensed under the BAAI Aquila Model Licence Agreement.
