We open-source an Int4-quantized AquilaChat2-34B model in GPTQ format, which allows faster download and lighter-weight deployment.
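For reference, a checkpoint like this can be produced with Auto-GPTQ roughly as sketched below. This is a minimal sketch only: the base-model path, the calibration text, and the quantization parameters (group_size, desc_act) are illustrative assumptions, not the exact settings used for the released model.

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_dir = "BAAI/AquilaChat2-34B"  # hypothetical path to the full-precision base model
tokenizer = AutoTokenizer.from_pretrained(base_dir, trust_remote_code=True)

# 4-bit GPTQ configuration; group_size and desc_act here are assumptions
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(base_dir, quantize_config,
                                            trust_remote_code=True)

# Calibration samples: tokenized text the quantizer uses to estimate activations;
# a real run would use a larger, more representative set
examples = [tokenizer("Auto-GPTQ quantizes weights using calibration samples.",
                      return_tensors="pt")]

model.quantize(examples)
model.save_quantized("./checkpoints/Aquilachat34b-4bit")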
Quick Start: AquilaChat2-34B-Int4-GPTQ
1. Environment setup
Follow the instructions at https://github.com/PanQiWei/AutoGPTQ/tree/main#quick-installation to install Auto-GPTQ.
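On most setups the quick installation comes down to a single pip command; check the linked page for the wheel matching your CUDA version, as the exact command may differ:

pip install auto-gptq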
2. Inference
import time

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
from predict import predict  # generation helper from the Aquila2 repository

model_dir = "./checkpoints/Aquilachat34b-4bit"  # path to the quantized checkpoint
device = "cuda:0"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(model_dir, inject_fused_attention=False,
                                           low_cpu_mem_usage=True, device=device)
model.eval()

texts = ["请给出10个要到北京旅游的理由。",
         "写一个林黛玉倒拔垂杨柳的故事",
         "write a poem about the moon"]

start_time = time.time()
for text in texts:
    out = predict(model, text, tokenizer=tokenizer, max_gen_len=200, top_p=0.95,
                  seed=1234, topk=200, temperature=1.0, sft=True, device=device,
                  model_name="AquilaChat2-34B")
    print(out)
print(f"Elapsed time for generation: {time.time() - start_time} seconds")
License
The Aquila2 series of open-source models is licensed under the BAAI Aquila Model Licence Agreement.