Official website: https://www.deepseek.com/
Open-source repository: https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

1. Introduction

Today, we're introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.
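To put these headline figures in proportion, the short calculation below derives a few ratios from the numbers quoted above. It is a back-of-the-envelope sketch: the variable names are illustrative and nothing here comes from the DeepSeek codebase.

```python
# Figures quoted in the introduction above.
total_params_b = 236.0        # total parameters, in billions
active_params_b = 21.0        # parameters activated per token, in billions
kv_cache_reduction = 0.933    # KV cache reduced by 93.3% vs. DeepSeek 67B
training_cost_saving = 0.425  # training cost saved vs. DeepSeek 67B

# Share of the model that is active for any single token.
active_fraction = active_params_b / total_params_b

# Fraction of the DeepSeek 67B KV cache that remains, and the implied shrink factor.
kv_cache_remaining = 1.0 - kv_cache_reduction
kv_cache_shrink_factor = 1.0 / kv_cache_remaining

# Training cost relative to DeepSeek 67B.
relative_training_cost = 1.0 - training_cost_saving

print(f"active parameters per token: {active_fraction:.1%}")
print(f"KV cache remaining: {kv_cache_remaining:.1%} (about {kv_cache_shrink_factor:.1f}x smaller)")
print(f"training cost relative to DeepSeek 67B: {relative_training_cost:.1%}")
```

In other words, fewer than one in ten parameters participates in any single token's forward pass, which is where the MoE design saves compute.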

We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.

2. Model Downloads

| **Model** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: |
| DeepSeek-V2 | 128k | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2) |
| DeepSeek-V2-Chat(RL) | 128k | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat) |

Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance.

3. Evaluation Results

Base Model

Standard Benchmark

| **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek V1 (Dense-67B)** | **DeepSeek V2 (MoE-236B)** |
|:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|
| **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |
| **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |
| **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| **HumanEval** | Code | 52.4 | 39.0 | 42.7 | 40.9 |
| **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |

For more evaluation details, such as few-shot settings and prompts, please check our paper.

Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.

Chat Model

Standard Benchmark

| Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek V1 Chat (SFT) | DeepSeek V2 Chat(SFT) | DeepSeek V2 Chat(RL) |
|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
| **MMLU** | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| **BBH** | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| **C-Eval** | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| **CMMLU** | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| **HumanEval** | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| **MBPP** | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| **LiveCodeBench (0901-0401)** | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| **GSM8K** | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| **Math** | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

English Open Ended Generation Evaluation

We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.

Chinese Open Ended Generation Evaluation

Alignbench (https://arxiv.org/abs/2311.18743)

| **Model** | **Open/Closed Source** | **Overall** | **Chinese Reasoning** | **Chinese Language** |
| :---: | :---: | :---: | :---: | :---: |
| gpt-4-1106-preview | Closed | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat(RL) | Open | 7.91 | 7.45 | 8.35 |
| erniebot-4.0-202404 (文心一言) | Closed | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat(SFT) | Open | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | Closed | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (文心一言) | Closed | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (月之暗面) | Closed | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (通义千问) | Open | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | Open | 6.43 | 5.75 | 7.11 |
| Yi-34B-Chat (零一万物) | Open | 6.12 | 4.86 | 7.38 |
| gpt-3.5-turbo-0613 | Closed | 6.08 | 5.35 | 6.71 |

Coding Benchmarks

We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency on LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
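For readers unfamiliar with the metric, Pass@1 is the probability that a single sampled solution passes all tests. The sketch below uses the widely used unbiased pass@k estimator; whether the evaluation above used this exact estimator is not stated here, and the per-problem results in the example are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains no correct one.
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (samples drawn, samples correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical per-problem results, not taken from the LiveCodeBench evaluation.
example = [(10, 3), (10, 0), (10, 10), (10, 5)]
print(mean_pass_at_k(example, k=1))
```

For k = 1 the estimator reduces to the fraction of correct samples per problem, averaged over all problems.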

4. Model Architecture

DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:

For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of the inference-time key-value cache, thus supporting efficient inference.
For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
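The idea behind low-rank key-value compression can be illustrated with a toy example: instead of caching full keys and values for every token, cache one small latent vector and expand it when attention is computed. This is a minimal sketch, assuming a single shared down-projection and separate up-projections for keys and values. The matrix names, toy sizes, and random weights are hypothetical, and details of the actual MLA design (such as how positional encoding is handled) are left to the paper.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of uniform random values (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes, chosen only for illustration.
d_model, n_heads, head_dim, d_latent = 16, 4, 4, 2
rng = random.Random(0)

w_down = random_matrix(d_model, d_latent, rng)             # compress hidden state
w_up_k = random_matrix(d_latent, n_heads * head_dim, rng)  # expand latent to keys
w_up_v = random_matrix(d_latent, n_heads * head_dim, rng)  # expand latent to values

latent_cache = []  # one d_latent vector per processed token

def step(hidden_state):
    """Cache the compressed latent for one token, then rebuild all keys and values."""
    latent = matmul([hidden_state], w_down)[0]
    latent_cache.append(latent)
    keys = matmul(latent_cache, w_up_k)
    values = matmul(latent_cache, w_up_v)
    return keys, values

for _ in range(3):
    keys, values = step([rng.uniform(-1.0, 1.0) for _ in range(d_model)])

# Cached numbers per token: full keys and values vs. the shared latent.
standard_cache_per_token = 2 * n_heads * head_dim
latent_cache_per_token = d_latent
print(len(keys), len(keys[0]), standard_cache_per_token, latent_cache_per_token)
```

With these toy sizes the cache holds 2 numbers per token instead of 32; the trade-off is the extra up-projection work at attention time.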

5. Chat Website

You can chat with DeepSeek-V2 on DeepSeek's official website: chat.deepseek.com

6. API Platform

We also provide an OpenAI-compatible API at DeepSeek Platform: platform.deepseek.com. Sign up to receive millions of free tokens. You can also pay as you go at an unbeatable price.
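Because the API is OpenAI-compatible, a chat request is a JSON POST with a bearer token. The sketch below only builds the request with the Python standard library and does not send it. The endpoint URL and model name are assumptions that do not appear in this document, so confirm both on platform.deepseek.com before use.

```python
import json
import urllib.request

# Assumed endpoint and model name; confirm both on platform.deepseek.com.
API_URL = "https://api.deepseek.com/chat/completions"
MODEL = "deepseek-chat"

def build_chat_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_chat_request("YOUR_API_KEY", "Hello")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then JSON-decode the response body.
```

Existing OpenAI client libraries can usually be pointed at a compatible service by overriding their base URL, which avoids hand-building requests.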

7. How to run locally

To run DeepSeek-V2 in BF16 format for inference, 8 GPUs with 80GB of memory each are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference.

Text Completion

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
# Reuse the end-of-sequence token for padding
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

# Decode the generated token ids (prompt included) back into text
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Chat Completion

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
# Reuse the end-of-sequence token for padding
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

# Skip the prompt tokens so that only the newly generated reply is decoded
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

The complete chat template can be found in tokenizer_config.json, located in the Hugging Face model repository.

An example of the chat template is shown below:

<｜begin▁of▁sentence｜>User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

A:

You can also add an optional system message:

<｜begin▁of▁sentence｜>{system_message}

User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

A:
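The format shown above can be reproduced with a few lines of string handling, which helps when inspecting exactly what the model receives. This is a minimal sketch written against the two examples above, not the authoritative template: `render_chat` is a hypothetical helper, and the template shipped in tokenizer_config.json takes precedence wherever the two differ.

```python
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"

def render_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the format shown above."""
    parts = [BOS]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            # The system message is placed right after the BOS token.
            parts.append(f"{content}\n\n")
        elif role == "user":
            parts.append(f"User: {content}\n\n")
        elif role == "assistant":
            # Each completed assistant turn is closed with the EOS token.
            parts.append(f"Assistant: {content}{EOS}")
        else:
            raise ValueError(f"unknown role: {role!r}")
    if add_generation_prompt:
        # Leave the final assistant turn open for the model to complete.
        parts.append("Assistant:")
    return "".join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

For actual inference, `tokenizer.apply_chat_template` remains the safer choice, since it also ensures the special tokens are mapped to their dedicated token ids.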

8. License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.

9. Citation

@misc{deepseek-v2,
  author = {DeepSeek-AI},
  title  = {DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
  year   = {2024},
  note   = {GitHub repository},
  url    = {https://github.com/deepseek-ai/deepseek-v2}
}

10. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.
