Llama-2-7b-Chat-GGUF

About GGUF

This repo contains GGUF format model files for Llama-2-7b-Chat. GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Supported quantization methods include Q4_K_M (used in the example below). We will add more methods in the future; you can contact us if support for other quantizations is needed. If you want to run with GPU acceleration, refer to the installation instructions.

Xinference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you are empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
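To illustrate the "single line of code" claim, here is a minimal sketch that points the official openai client at Xinference's OpenAI-compatible endpoint instead of api.openai.com. It assumes the server from the example below is running on port 9997 and that a model has already been launched; the model value and the placeholder api_key are illustrative.

```python
# Sketch: swap OpenAI for a locally served model by changing the base URL.
import openai

client = openai.OpenAI(
    base_url="http://localhost:9997/v1",  # the single-line change
    api_key="not-needed-locally",         # placeholder; no key is used here
)

response = client.chat.completions.create(
    model="llama-2-chat",  # use the UID of the model you launched
    messages=[{"role": "user", "content": "What is the largest animal?"}],
)
print(response.choices[0].message.content)
```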
Example code
Install packages

pip install "xinference[ggml]>=0.4.3"
Start a local instance of Xinference

xinference -p 9997
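Before launching a model, you can optionally confirm the server is reachable. This is a small sketch assuming the default host and the port chosen above; it uses the client's list_models call, which should return an empty result on a fresh instance.

```python
# Sketch: sanity-check that the Xinference server is up.
from xinference.client import Client

client = Client("http://localhost:9997")
print(client.list_models())  # no models yet on a freshly started server
```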
Launch and inference

from xinference.client import Client

# Connect to the local Xinference instance started above
client = Client("http://localhost:9997")

# Launch the GGUF model; this returns a UID used to reference the model
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="ggufv2",
    model_size_in_billions=7,
    quantization="Q4_K_M",
)
model = client.get_model(model_uid)

# Run a single chat turn; chat_history can carry earlier turns for context
chat_history = []
prompt = "What is the largest animal?"
model.chat(
    prompt,
    chat_history=chat_history,
    generate_config={"max_tokens": 1024},
)
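The chat call returns an OpenAI-style ChatCompletion payload, so the reply lives under choices. Below is a hedged sketch of reading the answer and carrying the conversation into a second turn, assuming that response shape and reusing model, prompt, and chat_history from above.

```python
# Sketch: capture the reply and keep the conversation going.
response = model.chat(
    prompt,
    chat_history=chat_history,
    generate_config={"max_tokens": 1024},
)
answer = response["choices"][0]["message"]["content"]
print(answer)

# Append both turns so the next question has the full context
chat_history.append({"role": "user", "content": prompt})
chat_history.append({"role": "assistant", "content": answer})
```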
More information

For more details about Xinference, see the project on GitHub: https://github.com/xorbitsai/inference