mixtral-7b-8expert

Categories: ai, mistral, PyTorch
Repository: https://modelscope.cn/models/AI-ModelScope/mixtral-7b-8expert
License: apache-2.0

Project details

Mixtral 7b 8 Expert


This is a preliminary Hugging Face implementation of the newly released mixture-of-experts (MoE) model by Mistral AI. Make sure to load it with trust_remote_code=True.

Thanks to @dzhulgakov for his early implementation (https://github.com/dzhulgakov/llama-mistral) that helped me find a working setup.

Also many thanks to our friends at LAION and HessianAI for the compute used for these projects!

Benchmark scores:

hellaswag: 0.8661
winogrande: 0.8240
truthfulqa_mc2: 0.4855
arc_challenge: 0.6638
gsm8k: 0.5709
mmlu: 0.7173
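
Scores in this format are typically produced with EleutherAI's lm-evaluation-harness. Below is a minimal reproduction sketch using that harness's Python API; the task names, batch size, dtype, and default few-shot settings are assumptions, and exact numbers will vary with harness version, hardware, and prompt formatting.

# pip install lm-eval  (EleutherAI lm-evaluation-harness, v0.4+)
import lm_eval
from modelscope import snapshot_download

# Download the checkpoint locally first so the harness's "hf" loader can read it.
local_dir = snapshot_download('AI-ModelScope/mixtral-7b-8expert')

# Assumption: per-task default few-shot settings; adjust batch_size to your GPU memory.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={local_dir},trust_remote_code=True,dtype=bfloat16",
    tasks=["hellaswag", "winogrande", "truthfulqa_mc2", "arc_challenge", "gsm8k", "mmlu"],
    batch_size=4,
)
for task, metrics in results["results"].items():
    print(task, metrics)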

Basic Inference setup

# transformers>=4.36 (build from source)
from modelscope import AutoModelForCausalLM, AutoTokenizer

# Load the model with the custom MoE code shipped in the repo (trust_remote_code=True);
# device_map="auto" spreads the weights across the available GPUs.
model = AutoModelForCausalLM.from_pretrained('AI-ModelScope/mixtral-7b-8expert', low_cpu_mem_usage=True,
                                             device_map="auto", trust_remote_code=True)
tok = AutoTokenizer.from_pretrained('AI-ModelScope/mixtral-7b-8expert')

# Encode a prompt, generate up to 128 new tokens, and decode the result.
x = tok.encode("The mistral wind is a phenomenon ", return_tensors="pt").cuda()
x = model.generate(x, max_new_tokens=128).cpu()
print(tok.batch_decode(x))
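
The full set of eight experts is large, so in practice you will usually want half-precision weights. A minimal sketch, assuming bfloat16 is supported on your GPU; the dtype and sampling parameters here are illustrative choices, not part of the original instructions.

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer

# Assumption: bfloat16 halves the memory footprint versus float32; fall back to
# torch.float16 on GPUs without bfloat16 support.
model = AutoModelForCausalLM.from_pretrained('AI-ModelScope/mixtral-7b-8expert',
                                             torch_dtype=torch.bfloat16,
                                             low_cpu_mem_usage=True,
                                             device_map="auto",
                                             trust_remote_code=True)
tok = AutoTokenizer.from_pretrained('AI-ModelScope/mixtral-7b-8expert')

x = tok.encode("The mistral wind is a phenomenon ", return_tensors="pt").to(model.device)
# Sampling parameters are illustrative; adjust to taste.
y = model.generate(x, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tok.batch_decode(y))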

Conversion

Use convert_mistral_moe_weights_to_hf.py --input_dir ./input_dir --model_size 7B --output_dir ./output to convert the original consolidated weights to this HF setup.
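
Once converted, the ./output directory behaves like a local checkpoint. A minimal loading sketch, assuming the script writes the weights and config there; whether the tokenizer is exported alongside depends on the script, so it is loaded from the hub repo here, and plain transformers is used since the path is local.

from transformers import AutoModelForCausalLM
from modelscope import AutoTokenizer

# Load the converted weights from the local output directory produced above.
model = AutoModelForCausalLM.from_pretrained('./output',
                                             device_map="auto",
                                             trust_remote_code=True)
# Assumption: the tokenizer was not exported by the conversion script, so reuse the hub repo's.
tok = AutoTokenizer.from_pretrained('AI-ModelScope/mixtral-7b-8expert')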

Come chat about this in our Disco(rd)! :)
