glm-4-9b-chat-1m-GGUF

Repository: https://modelscope.cn/models/LLM-Research/glm-4-9b-chat-1m-GGUF
License: other

Llama.cpp static quantization of THUDM/glm-4-9b-chat-1m

Original Model: THUDM/glm-4-9b-chat-1m
Original dtype: BF16 (bfloat16)
Quantized with: llama.cpp (https://github.com/ggerganov/llama.cpp/pull/6999)
IMatrix dataset: here

Files

Common Quants

| Filename | Quant type | File Size | Status | Uses IMatrix | Is Split |
| -------- | ---------- | --------- | ------ | ------------ | -------- |
| glm-4-9b-chat-1m.Q8_0.gguf | Q8_0 | 10.08GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q6_K.gguf | Q6_K | 8.33GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q4_K.gguf | Q4_K | 6.31GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q3_K.gguf | Q3_K | 5.11GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q2_K.gguf | Q2_K | 4.02GB | ✅ Available | ⚪ Static | No |

All Quants

| Filename | Quant type | File Size | Status | Uses IMatrix | Is Split |
| -------- | ---------- | --------- | ------ | ------------ | -------- |
| glm-4-9b-chat-1m.BF16.gguf | BF16 | 18.97GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.FP16.gguf | F16 | 18.97GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q8_0.gguf | Q8_0 | 10.08GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q6_K.gguf | Q6_K | 8.33GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q5_K.gguf | Q5_K | 7.21GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q5_K_S.gguf | Q5_K_S | 6.75GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q4_K.gguf | Q4_K | 6.31GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q4_K_S.gguf | Q4_K_S | 5.80GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.IQ4_NL.gguf | IQ4_NL | 5.56GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.IQ4_XS.gguf | IQ4_XS | 5.35GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q3_K.gguf | Q3_K | 5.11GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q3_K_L.gguf | Q3_K_L | 5.33GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q3_K_S.gguf | Q3_K_S | 4.62GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.IQ3_M.gguf | IQ3_M | 4.86GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.IQ3_S.gguf | IQ3_S | 4.62GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.IQ3_XS.gguf | IQ3_XS | 4.47GB | ✅ Available | ⚪ Static | No |
| glm-4-9b-chat-1m.Q2_K.gguf | Q2_K | 4.02GB | ✅ Available | ⚪ Static | No |
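
As a rule of thumb, the chosen GGUF plus some headroom for the KV cache has to fit in RAM (or VRAM when fully offloaded). A quick sketch for checking a downloaded file against available memory; the psutil dependency and the 2 GB headroom figure are assumptions, not part of this release:

# Rough check: does a quant fit in currently available RAM?
import os
import psutil  # pip install psutil

def fits_in_ram(gguf_path: str, headroom_gb: float = 2.0) -> bool:
    """True if the file plus headroom for the context/KV cache fits in free RAM."""
    file_gb = os.path.getsize(gguf_path) / 1024**3
    free_gb = psutil.virtual_memory().available / 1024**3
    print(f"model: {file_gb:.2f} GB, free RAM: {free_gb:.2f} GB")
    return file_gb + headroom_gb <= free_gb

fits_in_ram("glm-4-9b-chat-1m.Q4_K.gguf")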

Downloading using huggingface-cli

If you do not have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Download the specific file you want:

huggingface-cli download legraphista/glm-4-9b-chat-1m-GGUF --include "glm-4-9b-chat-1m.Q8_0.gguf" --local-dir ./
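
The same download can be scripted with the huggingface_hub Python API; a minimal sketch mirroring the CLI call above (the target directory is an illustrative choice):

# Download a single quant file (equivalent to the huggingface-cli call above)
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="legraphista/glm-4-9b-chat-1m-GGUF",
    filename="glm-4-9b-chat-1m.Q8_0.gguf",
    local_dir="./",
)
print(path)  # local path of the downloaded GGUF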

If the model file is large, it has been split into multiple chunks. To download them all into a local folder, run:

huggingface-cli download legraphista/glm-4-9b-chat-1m-GGUF --include "glm-4-9b-chat-1m.Q8_0/*" --local-dir ./
# see the FAQ below for merging split GGUFs
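
In Python, the split chunks can be fetched with snapshot_download and an allow_patterns glob that mirrors the --include pattern above (a sketch; the folder name assumes the Q8_0 variant were split):

# Fetch every chunk of a split quant in one call
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="legraphista/glm-4-9b-chat-1m-GGUF",
    allow_patterns=["glm-4-9b-chat-1m.Q8_0/*"],
    local_dir="./",
)
# the chunks still need to be merged with gguf-split, see the FAQ below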

Inference

Simple chat template

[gMASK]<sop><|user|>
{user_prompt}<|assistant|>
{assistant_response}<|user|>
{next_user_prompt}

Chat template with system prompt

[gMASK]<sop><|system|>
{system_prompt}<|user|>
{user_prompt}<|assistant|>
{assistant_response}<|user|>
{next_user_prompt}
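
A small helper that assembles prompts in this format, covering both the plain and system-prompt variants (the function and argument names are illustrative, not part of the original):

# Build a GLM-4 chat prompt string from (role, text) turns
def build_glm4_prompt(turns, system_prompt=None):
    """turns: list of ("user" | "assistant", text) pairs in conversation order."""
    prompt = "[gMASK]<sop>"
    if system_prompt:
        prompt += f"<|system|>\n{system_prompt}"
    for role, text in turns:
        prompt += f"<|{role}|>\n{text}"
    # end with the assistant tag so the model generates the next reply
    return prompt + "<|assistant|>\n"

print(build_glm4_prompt([("user", "What can you do?")],
                        system_prompt="You are a helpful assistant."))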

Llama.cpp

llama.cpp/main -m glm-4-9b-chat-1m.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
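
The same GGUF can also be driven from Python via the llama-cpp-python bindings; a minimal sketch (the context size, sampling settings, and stop token are illustrative choices):

# Load the quantized model through llama-cpp-python instead of the CLI
from llama_cpp import Llama

llm = Llama(model_path="glm-4-9b-chat-1m.Q8_0.gguf", n_ctx=8192)
prompt = "[gMASK]<sop><|user|>\nHello, who are you?<|assistant|>\n"
out = llm(prompt, max_tokens=256, stop=["<|user|>"])
print(out["choices"][0]["text"])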

FAQ

Why is the IMatrix not applied everywhere?

According to this investigation, only the lower-bit quantizations appear to benefit from the imatrix input (as per HellaSwag results).

How do I merge a split GGUF?

  1. Make sure you have gguf-split available
    • To get hold of gguf-split, navigate to https://github.com/ggerganov/llama.cpp/releases
    • Download the appropriate zip for your system from the latest release
    • Unzip the archive and you should be able to find gguf-split
  2. Locate your GGUF chunks folder (ex: glm-4-9b-chat-1m.Q8_0)
  3. Run gguf-split --merge glm-4-9b-chat-1m.Q8_0/glm-4-9b-chat-1m.Q8_0-00001-of-XXXXX.gguf glm-4-9b-chat-1m.Q8_0.gguf
    • Make sure to point gguf-split to the first chunk of the split.

Got a suggestion? Ping me @legraphista!
