Shikra: A Visual Question Answering Model with Coordinate References

Categories: ai, shikra, pytorch, visual-question-answ, multimodal
Open-source repository: https://modelscope.cn/models/haolan/shikra
License: Apache License 2.0

Project Details

Model Description

Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic

Shikra is an MLLM designed to kick off referential dialogue by excelling at spatial coordinate inputs and outputs in natural language, without additional vocabularies, position encoders, pre-/post-detection stages, or external plug-in models.

[Project Page] [Paper]
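For illustration, here is a minimal sketch of how a bounding box can be expressed directly in prompt text. The normalized [x1,y1,x2,y2] notation with three decimals follows the paper's approach, but the helper name box_to_text and the 640x512 image size are assumptions for this example, not part of the released API:

# Hypothetical sketch: embedding a bounding box directly in the prompt text.
# Shikra-style models read and write boxes as plain-text coordinates, so no
# special tokens or position encoders are needed.
def box_to_text(box, width, height):
    """Normalize a pixel box (x1, y1, x2, y2) to [0, 1] and render it as text."""
    x1, y1, x2, y2 = box
    return f"[{x1/width:.3f},{y1/height:.3f},{x2/width:.3f},{y2/height:.3f}]"

prompt = f"What is the person {box_to_text((148, 99, 576, 497), 640, 512)} scared of?"
print(prompt)  # What is the person [0.231,0.193,0.900,0.971] scared of?

Because the coordinates are ordinary text, the same tokenizer and decoding loop handle grounding with no architectural changes.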

Operating Environment

Install

1. Clone the repository (a blob-less partial clone that fetches only the files needed for inference):
git clone --depth=1 --filter=blob:none --no-checkout https://www.modelscope.cn/haolan/shikra.git
cd shikra
git checkout master -- ms_wrapper.py
git checkout master -- mllm
git checkout master -- requirements.txt
2. Install the packages (a quick environment check follows after this step):
conda create -n shikra python=3.10 -y
conda activate shikra
pip install -r requirements.txt
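Before running inference, a quick sanity check of the new environment can save time. This is a minimal sketch; it assumes requirements.txt installs PyTorch, which the inference code needs:

# Verify the freshly created conda environment.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # a GPU is strongly recommended for inference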

Code Example

from modelscope.pipelines import pipeline

# Build the Shikra pipeline; weights are downloaded from ModelScope on first use.
inference = pipeline('shikra-task', model='haolan/shikra')

data = {
    'image_path': "mllm/demo/assets/man.jpg",              # demo image shipped with the repo
    'user_input': "What is the person<boxes> scared of?",  # <boxes> marks a referenced region
    'boxes_value': [[148, 99, 576, 497]],                  # bounding boxes as [x1, y1, x2, y2] in pixels
    'boxes_seq': [[0]]                                     # maps each <boxes> tag to indices in boxes_value
}
output = inference(data)

print(output)

# Note: loading the model may take a few minutes.
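To double-check which image region the <boxes> placeholder refers to, the input box can be drawn onto the image before it is sent to the pipeline. This is a hedged sketch using Pillow; the output filename man_with_box.jpg is arbitrary, and the box is taken from boxes_value above, which appears to be in pixel coordinates:

# Visualize boxes_value[0] from the example above (assumed pixel x1, y1, x2, y2).
from PIL import Image, ImageDraw

image = Image.open("mllm/demo/assets/man.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
draw.rectangle([148, 99, 576, 497], outline="red", width=3)
image.save("man_with_box.jpg")  # hypothetical output path for inspection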

Citation

@article{chen2023shikra,
  title={Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic},
  author={Chen, Keqin and Zhang, Zhao and Zeng, Weili and Zhang, Richong and Zhu, Feng and Zhao, Rui},
  journal={arXiv preprint arXiv:2306.15195},
  year={2023}
}