Llama-3-Instruct-8B-SPPO-Iter3-GGUF (open-source AI project, 程序员客栈)

Source repository
https://modelscope.cn/models/AI-ModelScope/Llama-3-Instruct-8B-SPPO-Iter3-GGUF
License
apache-2.0

Llamacpp imatrix Quantizations of Llama-3-Instruct-8B-SPPO-Iter3

Using llama.cpp release b3197 for quantization.

Original model: https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3

All quants made using imatrix option with dataset from here

Prompt format

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
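
The template above can be filled in programmatically. The following is a minimal sketch, not part of the original card: the function name `build_llama3_prompt` is hypothetical, and the trailing blank line after the assistant header is an assumption based on how the other two headers are laid out.

```python
def build_llama3_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the prompt template with a system message and a user message."""
    return (
        # Each header is followed by a blank line, then the message body.
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        # Assumption: the assistant header also ends with a blank line,
        # leaving the model to generate the reply from that point.
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


example = build_llama3_prompt("You are a helpful assistant.", "Hello!")
print(example)
```

Many inference front ends apply this template for you, so building it by hand is mainly useful when sending raw text to the model.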

Download a file (not the whole branch) from below:

Filename	Quant type	File Size	Description
Llama-3-Instruct-8B-SPPO-Iter3-Q8_0_L.gguf	Q8_0_L	9.52GB	Experimental, uses f16 for embed and output weights. Please provide any feedback of differences. Extremely high quality, generally unneeded but max available quant.
Llama-3-Instruct-8B-SPPO-Iter3-Q8_0.gguf	Q8_0	8.54GB	Extremely high quality, generally unneeded but max available quant.
Llama-3-Instruct-8B-SPPO-Iter3-Q6_K_L.gguf	Q6_K_L	7.83GB	Experimental, uses f16 for embed and output weights. Please provide any feedback of differences. Very high quality, near perfect, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q6_K.gguf	Q6_K	6.59GB	Very high quality, near perfect, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q5_K_L.gguf	Q5_K_L	7.04GB	Experimental, uses f16 for embed and output weights. Please provide any feedback of differences. High quality, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q5_K_M.gguf	Q5_K_M	5.73GB	High quality, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q5_K_S.gguf	Q5_K_S	5.59GB	High quality, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q4_K_L.gguf	Q4_K_L	6.29GB	Experimental, uses f16 for embed and output weights. Please provide any feedback of differences. Good quality, uses about 4.83 bits per weight, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q4_K_M.gguf	Q4_K_M	4.92GB	Good quality, uses about 4.83 bits per weight, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q4_K_S.gguf	Q4_K_S	4.69GB	Slightly lower quality with more space savings, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-IQ4_XS.gguf	IQ4_XS	4.44GB	Decent quality, smaller than Q4_K_S with similar performance, recommended.
Llama-3-Instruct-8B-SPPO-Iter3-Q3_K_XL.gguf	Q3_K_XL		Experimental, uses f16 for embed and output weights. Please provide any feedback of differences. Lower quality but usable, good for low RAM availability.
Llama-3-Instruct-8B-SPPO-Iter3-Q3_K_L.gguf	Q3_K_L	4.32GB	Lower quality but usable, good for low RAM availability.
Llama-3-Instruct-8B-SPPO-Iter3-Q3_K_M.gguf	Q3_K_M	4.01GB	Even lower quality.
Llama-3-Instruct-8B-SPPO-Iter3-IQ3_M.gguf	IQ3_M	3.78GB	Medium-low quality, new method with decent performance comparable to Q3_K_M.
Llama-3-Instruct-8B-SPPO-Iter3-Q3_K_S.gguf	Q3_K_S	3.66GB	Low quality, not recommended.
Llama-3-Instruct-8B-SPPO-Iter3-IQ3_XS.gguf	IQ3_XS	3.51GB	Lower quality, new method with decent performance, slightly better than Q3_K_S.
Llama-3-Instruct-8B-SPPO-Iter3-IQ3_XXS.gguf	IQ3_XXS	3.27GB	Lower quality, new method with decent performance, comparable to Q3 quants.
Llama-3-Instruct-8B-SPPO-Iter3-Q2_K.gguf	Q2_K	3.17GB	Very low quality but surprisingly usable.
Llama-3-Instruct-8B-SPPO-Iter3-IQ2_M.gguf	IQ2_M	2.94GB	Very low quality, uses SOTA techniques to also be surprisingly usable.
Llama-3-Instruct-8B-SPPO-Iter3-IQ2_S.gguf	IQ2_S	2.75GB	Very low quality, uses SOTA techniques to be usable.
Llama-3-Instruct-8B-SPPO-Iter3-IQ2_XS.gguf	IQ2_XS	2.60GB	Very low quality, uses SOTA techniques to be usable.
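
The "bits per weight" figure in the table relates to file size through simple arithmetic: parameters times bits per weight, divided by 8 bits per byte. The sketch below is a rough illustration only. The parameter count of about 8.03 billion is an assumption not stated in this card, and the helper name `estimate_file_size_gb` is hypothetical.

```python
def estimate_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size in decimal GB: parameters * bits per weight / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: approximate parameter count for an 8B Llama 3 model.
ASSUMED_PARAMS = 8.03e9

# 4.83 bits per weight is the figure the table gives for the Q4 K-quants.
q4_estimate_gb = estimate_file_size_gb(ASSUMED_PARAMS, 4.83)
print(f"Estimated size at 4.83 bits per weight: {q4_estimate_gb:.2f} GB")
```

The estimate lands slightly under the 4.92GB listed for the medium Q4 K-quant, which is plausible since a GGUF file also carries metadata and tensors stored at other precisions.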

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF --include "Llama-3-Instruct-8B-SPPO-Iter3-Q4_K_M.gguf" --local-dir ./
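
The same single-file download can also be done from Python with the `huggingface_hub` package that the pip command above installs. This is a sketch rather than the card author's method: the helper names `quant_filename` and `download_quant` are hypothetical, and the filename pattern is inferred from the table above.

```python
REPO_ID = "bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF"
MODEL_NAME = "Llama-3-Instruct-8B-SPPO-Iter3"


def quant_filename(quant_type: str) -> str:
    """Build the GGUF filename for a quant type, e.g. 'Q4_K_M'."""
    return f"{MODEL_NAME}-{quant_type}.gguf"


def download_quant(quant_type: str, local_dir: str = "./") -> str:
    """Download one quant file and return its local path (needs network access)."""
    # Imported lazily so quant_filename() works without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=quant_filename(quant_type),
        local_dir=local_dir,
    )


# Example call (commented out because it downloads several GB):
# path = download_quant("Q4_K_M")
```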

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

huggingface-cli download bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF --include "Llama-3-Instruct-8B-SPPO-Iter3-Q8_0.gguf/*" --local-dir Llama-3-Instruct-8B-SPPO-Iter3-Q8_0

You can either specify a new local-dir (Llama-3-Instruct-8B-SPPO-Iter3-Q8_0) or download them all in place (./)
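
To see what the folder-style `--include` pattern would select, the sketch below applies shell-style glob matching to a made-up file listing. It assumes the pattern behaves like a standard glob, which Python's `fnmatch` approximates. The shard names are hypothetical and only illustrate a split upload; every file in this repository is well under 50GB.

```python
from fnmatch import fnmatchcase

pattern = "Llama-3-Instruct-8B-SPPO-Iter3-Q8_0.gguf/*"

# Hypothetical repository listing: one single-file quant, one quant split
# into shards inside a folder, and one unrelated quant.
repo_files = [
    "Llama-3-Instruct-8B-SPPO-Iter3-Q8_0.gguf",
    "Llama-3-Instruct-8B-SPPO-Iter3-Q8_0.gguf/part-00001-of-00002.gguf",
    "Llama-3-Instruct-8B-SPPO-Iter3-Q8_0.gguf/part-00002-of-00002.gguf",
    "Llama-3-Instruct-8B-SPPO-Iter3-Q4_K_M.gguf",
]

# Only paths inside the folder match; the plain single file does not.
matched = [name for name in repo_files if fnmatchcase(name, pattern)]
for name in matched:
    print(name)
```

Under that assumption, the trailing `/*` form only picks up files inside a folder, so a quant stored as a single file would need the plain filename pattern instead.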

Which file should I choose?

A great write up with charts showing various performances is provided by Artefact2 here

The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.

If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.

If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
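
The sizing rule above can be sketched as a small lookup. The sizes are copied from the table for the non-experimental quants; the function name `largest_fitting_quant` and the default 2GB headroom (the upper end of the 1-2GB guideline) are illustrative choices, not part of the original card.

```python
# File sizes in GB, taken from the table above (experimental variants omitted).
QUANT_SIZES_GB = {
    "Q8_0": 8.54, "Q6_K": 6.59, "Q5_K_M": 5.73, "Q5_K_S": 5.59,
    "Q4_K_M": 4.92, "Q4_K_S": 4.69, "IQ4_XS": 4.44, "Q3_K_L": 4.32,
    "Q3_K_M": 4.01, "IQ3_M": 3.78, "Q3_K_S": 3.66, "IQ3_XS": 3.51,
    "IQ3_XXS": 3.27, "Q2_K": 3.17, "IQ2_M": 2.94, "IQ2_S": 2.75,
    "IQ2_XS": 2.60,
}


def largest_fitting_quant(memory_gb: float, headroom_gb: float = 2.0):
    """Return the largest quant whose file fits in memory minus headroom, or None."""
    budget = memory_gb - headroom_gb
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Speed-first: pass VRAM only. Quality-first: pass RAM + VRAM combined.
print(largest_fitting_quant(8.0))        # 8GB GPU, 2GB headroom
print(largest_fitting_quant(6.0 + 16.0)) # 6GB VRAM plus 16GB system RAM
```

File size is only a proxy here: context length and the inference engine's own overhead also consume memory, which is what the headroom is meant to absorb.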

Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.

If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.

If you want to get more into the weeds, you can check out this extremely useful feature chart:

llama.cpp feature matrix

But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.

These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.

The I-quants are not compatible with Vulkan, which is also AMD, so if you have an AMD card double check if you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
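
The guidance in the last three paragraphs can be condensed into a small decision helper. This is one reading of the advice, not a rule from llama.cpp itself: the function name `suggest_quant_family` and the backend labels are hypothetical, and defaulting to K-quants on CPU and Metal reflects a preference for speed that you may not share.

```python
def suggest_quant_family(backend: str, below_q4: bool) -> str:
    """Suggest 'I-quant' or 'K-quant' from the backend and target size."""
    backend = backend.lower()
    if backend == "vulkan":
        # The text states I-quants are not compatible with Vulkan.
        return "K-quant"
    if below_q4 and backend in {"cublas", "rocblas"}:
        # Below Q4 on Nvidia or AMD ROCm builds, I-quants are the suggestion.
        return "I-quant"
    # CPU and Metal can run I-quants, but more slowly than the K-quant
    # equivalent, so this sketch favors speed and falls back to K-quants.
    return "K-quant"


print(suggest_quant_family("cuBLAS", below_q4=True))
print(suggest_quant_family("Vulkan", below_q4=True))
```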

Wat to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
