deepseek-vl-1.3b-chat

Anonymous user, July 31, 2024

Technical Information

Official website
https://www.deepseek.com/
Open-source repository
https://modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat
License
other

Project Details

1. Introduction

Introducing DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. DeepSeek-VL has general multimodal understanding capabilities and can process logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

DeepSeek-VL: Towards Real-World Vision-Language Understanding

GitHub Repository

Haoyu Lu, Wen Liu, Bo Zhang*, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (Equal Contribution, **Project Leader)

2. Model Summary

DeepSeek-VL-1.3b-chat is a tiny vision-language model. It uses SigLIP-L as the vision encoder, which supports 384 x 384 image input, and is built on DeepSeek-LLM-1.3b-base, which is trained on a corpus of approximately 500B text tokens. The whole DeepSeek-VL-1.3b-base model is then trained on around 400B vision-language tokens. DeepSeek-VL-1.3b-chat is an instructed version based on DeepSeek-VL-1.3b-base.
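
As a rough illustration of what a 384 x 384 input implies for the number of image tokens, the sketch below counts the non-overlapping patches a ViT-style encoder would produce. The patch size of 16 is an assumption (this summary does not state it), and `num_image_patches` is a hypothetical helper, not part of the DeepSeek-VL code.

```python
def num_image_patches(image_size: int = 384, patch_size: int = 16) -> int:
    """Count non-overlapping square patches in a square image.

    patch_size=16 is an assumption for illustration only; check the
    model configuration for the value actually used.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


if __name__ == "__main__":
    # 384 // 16 = 24 patches per side, so 24 * 24 = 576 patches
    print(num_image_patches())
```

Under that assumed patch size, each image would contribute 576 patch positions before any further processing.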

3. Quick Start

Installation

In a Python >= 3.8 environment, install the necessary dependencies by running the following commands:

git clone https://github.com/deepseek-ai/DeepSeek-VL
cd DeepSeek-VL

pip install -e .
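
After installing, a quick sanity check can confirm the interpreter version and that the package is importable. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and the package name `deepseek_vl` is taken from the import statements in the inference example.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8), package="deepseek_vl"):
    """Return (python_ok, package_found) for the current interpreter."""
    # sys.version_info compares element-wise against a plain tuple
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package cannot be located
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_environment()
    print(f"Python >= 3.8: {python_ok}")
    print(f"deepseek_vl importable: {package_found}")
```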

Simple Inference Example

import torch
from transformers import AutoModelForCausalLM

from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
from deepseek_vl.utils.io import load_pil_images

from modelscope import snapshot_download

# specify the path to the model
model_path = snapshot_download("deepseek-ai/deepseek-vl-1.3b-chat")
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe each stage of this image.",
        "images": ["./images/training_pipelines.png"]
    },
    {
        "role": "Assistant",
        "content": ""
    }
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
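
The conversation list in the example above has a fixed shape: a "User" turn carrying the prompt and image paths, followed by an empty "Assistant" turn for the model to fill. The sketch below wraps that shape in a small function. `build_conversation` is a hypothetical helper, not part of `deepseek_vl`, and prepending one `<image_placeholder>` tag per image is an assumption extrapolated from the single-image example.

```python
from typing import Dict, List, Sequence


def build_conversation(prompt: str, image_paths: Sequence[str]) -> List[Dict]:
    """Build a single-turn conversation in the shape used by the example.

    Assumption: one "<image_placeholder>" tag is prepended per image.
    """
    placeholders = "<image_placeholder>" * len(image_paths)
    return [
        {
            "role": "User",
            "content": placeholders + prompt,
            "images": list(image_paths),
        },
        # Empty assistant turn: the model's response is generated for it
        {"role": "Assistant", "content": ""},
    ]


if __name__ == "__main__":
    conv = build_conversation(
        "Describe each stage of this image.",
        ["./images/training_pipelines.png"],
    )
    print(conv[0]["content"])
```

The returned list could be passed in place of the hand-written `conversation` variable in the example.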

CLI Chat

python cli_chat.py --model_path "deepseek-ai/deepseek-vl-1.3b-chat"

# or local path
python cli_chat.py --model_path "local model path"

4. License

This code repository is licensed under the MIT License. The use of DeepSeek-VL Base/Chat models is subject to the DeepSeek Model License. The DeepSeek-VL series (including Base and Chat) supports commercial use.

5. Citation

@misc{lu2024deepseekvl,
      title={DeepSeek-VL: Towards Real-World Vision-Language Understanding},
      author={Haoyu Lu and Wen Liu and Bo Zhang and Bingxuan Wang and Kai Dong and Bo Liu and Jingxiang Sun and Tongzheng Ren and Zhuoshu Li and Yaofeng Sun and Chengqi Deng and Hanwei Xu and Zhenda Xie and Chong Ruan},
      year={2024},
      eprint={2403.05525},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

6. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.
