DeepSeek-V2-Chat


Technical Information

Official website
https://www.deepseek.com/
Open-source repository
https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat

Project Details


Model Download | Evaluation Results | Model Architecture | API Platform | License | Citation

Paper Link

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

1. Introduction

Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.

We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.

2. Model Downloads

| **Model** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: |
| DeepSeek-V2 | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2) |
| DeepSeek-V2-Chat (RL) | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat) |

Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance.
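The README does not show the vLLM invocation at this point, but with a vLLM build that supports DeepSeek-V2's custom architecture (per the dedicated solution mentioned above), serving would look roughly like the sketch below; the tensor_parallel_size and max_model_len values are illustrative assumptions, not prescribed settings:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Shard the model across 8 GPUs; cap the context length for memory headroom.
llm = LLM(model=model_name, trust_remote_code=True,
          tensor_parallel_size=8, max_model_len=8192)

# Render the chat template to a plain prompt string, then generate.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a piece of quicksort code in C++"}],
    add_generation_prompt=True, tokenize=False)
outputs = llm.generate([prompt], SamplingParams(temperature=0.3, max_tokens=256))
print(outputs[0].outputs[0].text)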

3. Evaluation Results

Base Model

Standard Benchmark

| **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek V1 (Dense-67B)** | **DeepSeek V2 (MoE-236B)** |
|:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|
| **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |
| **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |
| **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| **HumanEval** | Code | 52.4 | 39.0 | 42.7 | 40.9 |
| **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |

For more evaluation details, such as few-shot settings and prompts, please check our paper.

Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.

Chat Model

Standard Benchmark

| Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek V1 Chat (SFT) | DeepSeek V2 Chat (SFT) | DeepSeek V2 Chat (RL) |
|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
| **MMLU** | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| **BBH** | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| **C-Eval** | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| **CMMLU** | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| **HumanEval** | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| **MBPP** | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| **LiveCodeBench (0901-0401)** | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| **GSM8K** | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| **Math** | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

English Open Ended Generation Evaluation

We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.

Chinese Open Ended Generation Evaluation

AlignBench (https://arxiv.org/abs/2311.18743)

| **Model** | **Open/Closed Source** | **Overall** | **Chinese Reasoning** | **Chinese Language** |
| :---: | :---: | :---: | :---: | :---: |
| gpt-4-1106-preview | Closed | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat (RL) | Open | 7.91 | 7.45 | 8.35 |
| erniebot-4.0-202404 (文心一言) | Closed | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat (SFT) | Open | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | Closed | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (文心一言) | Closed | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (月之暗面) | Closed | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (通义千问) | Open | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | Open | 6.43 | 5.75 | 7.11 |
| Yi-34B-Chat (零一万物) | Open | 6.12 | 4.86 | 7.38 |
| gpt-3.5-turbo-0613 | Closed | 6.08 | 5.35 | 6.71 |

Coding Benchmarks

We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.

4. Model Architecture

DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:

  • For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference (a minimal sketch of the idea follows this list).
  • For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
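To make the MLA point concrete, here is a minimal, self-contained sketch of low-rank key-value compression. The dimensions are illustrative only, and this is not DeepSeek's actual MLA implementation (which also handles rotary embeddings and per-head decoupling):

import torch
from torch import nn

class LowRankKV(nn.Module):
    # Sketch: cache one small latent per token instead of full per-head K/V.
    def __init__(self, d_model=5120, d_latent=512, n_heads=16, d_head=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand values

    def forward(self, h):                  # h: [batch, seq, d_model]
        c = self.down(h)                   # [batch, seq, d_latent] -- this is all that gets cached
        return self.up_k(c), self.up_v(c)  # full K/V re-expanded at attention time

In this toy configuration the cache holds 512 values per token instead of 2 * 16 * 128 = 4096, an 8x reduction; the same idea, at scale, is what drives the 93.3% KV-cache saving quoted above.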

5. Chat Website

You can chat with DeepSeek-V2 on DeepSeek's official website: chat.deepseek.com

6. API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com. Sign up to get millions of free tokens, and you can also pay as you go at an unbeatable price.
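Because the API is OpenAI-compatible, the standard openai client works against it. A minimal sketch follows; the base URL and model name below are the commonly documented values for the platform, so verify them on platform.deepseek.com before relying on them:

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY",            # issued on platform.deepseek.com
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",                         # the hosted DeepSeek-V2 chat model
    messages=[{"role": "user", "content": "Write a piece of quicksort code in C++"}],
)
print(resp.choices[0].message.content)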

7. How to run locally

To utilize DeepSeek-V2 in BF16 format for inference, 80GB*8 GPUs are required.
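As a rough sanity check on that requirement (our own back-of-the-envelope arithmetic, not from the original text): in BF16 every parameter takes 2 bytes, so the weights alone need roughly 472 GB, and a single 8x80 GB node is the smallest standard setup that also leaves headroom for activations and the KV cache:

params = 236e9                  # total parameters, from the model card
weights_gb = params * 2 / 1e9   # BF16 = 2 bytes per parameter -> ~472 GB
print(f"weights ~= {weights_gb:.0f} GB vs 8 x 80 GB = 640 GB of HBM")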

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference.

Text Completion

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Chat Completion

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

The complete chat template can be found within tokenizer_config.json located in the huggingface model repository.

An example of the chat template is as follows:

<|begin▁of▁sentence|>User: {user_message_1}

Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}

Assistant:

You can also add an optional system message:

<|begin▁of▁sentence|>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}

Assistant:
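Rather than hand-assembling these strings, you can render the exact prompt the model expects directly from the tokenizer; a minimal sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V2-Chat",
                                          trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
# tokenize=False returns the formatted prompt string instead of token IDs,
# so you can inspect exactly what the chat template produces.
print(tokenizer.apply_chat_template(messages, tokenize=False,
                                    add_generation_prompt=True))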

8. License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. DeepSeek-V2 series (including Base and Chat) supports commercial use.

9. Citation

@misc{deepseek-v2,
  author = {DeepSeek-AI},
  title  = {DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
  year   = {2024},
  note   = {GitHub repository},
  url    = {https://github.com/deepseek-ai/deepseek-v2},
}

10. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.
