AquilaChat2-34B-Int4-GPTQ

We open-source the GPTQ-format Int4-quantized version of AquilaChat2-34B.

Quick Start

1. Environment setup

Follow the instructions at https://github.com/PanQiWei/AutoGPTQ/tree/main#quick-installation to install Auto-GPTQ.
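For reference, the quick-installation route in the linked repository boils down to installing the PyPI package; this is a sketch of a typical setup, assuming a CUDA-enabled PyTorch is already present:

```shell
# Install Auto-GPTQ from PyPI (a matching torch/transformers must already
# be installed; see the linked quick-installation notes for CUDA wheels).
pip install auto-gptq
```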
2. Inference
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
import time
from predict import predict  # generation helper script shipped with the model repo

model_dir = "./checkpoints/Aquilachat34b-4bit"  # path to the quantized checkpoint
device = "cuda:0"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(model_dir, inject_fused_attention=False,
                                           low_cpu_mem_usage=True, device=device)
model.eval()

texts = ["请给出10个要到北京旅游的理由。",
         "写一个林黛玉倒拔垂杨柳的故事",
         "write a poem about the moon"]

start_time = time.time()
for text in texts:
    out = predict(model, text, tokenizer=tokenizer, max_gen_len=200, top_p=0.95,
                  seed=1234, topk=200, temperature=1.0, sft=True, device=device,
                  model_name="AquilaChat2-34B")
    print(out)
print(f"Elapsed inference time: {time.time()-start_time} seconds")
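The top_p and topk arguments above control nucleus and top-k sampling of the next token. As an illustration of what top-p filtering does (a toy sketch, not the actual sampling code inside `predict`):

```python
# Toy illustration of top-p (nucleus) filtering: keep the smallest set of
# highest-probability tokens whose cumulative probability reaches top_p,
# then renormalize. Not Aquila's actual implementation.
def top_p_filter(probs, top_p):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept}

# With top_p=0.8, only "the" (0.5) and "a" (0.3) survive and are renormalized.
toy = {"the": 0.5, "a": 0.3, "moon": 0.15, "xyzzy": 0.05}
print(top_p_filter(toy, 0.8))  # → {'the': 0.625, 'a': 0.375}
```

A lower top_p makes generation more conservative by truncating the unlikely tail; temperature=1.0 leaves the underlying distribution unscaled.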
License

The Aquila2 series open-source models are licensed under the BAAI Aquila Model License Agreement.