Model Description
Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic
Shikra is an MLLM designed for referential dialogue: it takes and produces spatial coordinates directly in natural language, with no additional vocabularies, position encoders, pre-/post-detection stages, or external plug-in models.
[Project Page] [Paper]
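Because the coordinates appear as ordinary text in Shikra's replies, no detection head is needed to consume them. Below is a minimal sketch of pulling boxes back out of a response string; the bracketed `[x1,y1,x2,y2]` format is an assumption based on the demo outputs, so adjust the pattern to what your checkpoint actually emits:

```python
import re

# Hedged sketch: recover [x1,y1,x2,y2]-style boxes from a text reply.
# The bracket format is an assumption, not a documented output contract.
BOX_PATTERN = re.compile(r"\[\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*\]")

def extract_boxes(response: str) -> list[list[float]]:
    return [[float(v) for v in m.groups()] for m in BOX_PATTERN.finditer(response)]

print(extract_boxes("The person[0.12,0.19,0.82,0.95] looks scared."))
# -> [[0.12, 0.19, 0.82, 0.95]]
```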
Operating Environment
Install
- Clone the repository. The sparse checkout below fetches only the inference wrapper (`ms_wrapper.py`), the `mllm` package, and `requirements.txt`:
```shell
git clone --depth=1 --filter=blob:none --no-checkout https://www.modelscope.cn/haolan/shikra.git
cd shikra
git checkout master -- ms_wrapper.py
git checkout master -- mllm
git checkout master -- requirements.txt
```
- Install the dependencies:
```shell
conda create -n shikra python=3.10 -y
conda activate shikra
pip install -r requirements.txt
```
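Before running inference, it can be worth confirming the environment imports cleanly. A quick hedged check (this assumes `torch` and `modelscope` are pulled in by `requirements.txt`):

```python
# Quick sanity check: the two core dependencies import and report versions.
import torch
import modelscope

print("torch", torch.__version__, "| modelscope", modelscope.__version__)
print("CUDA available:", torch.cuda.is_available())
```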
Code Example
```python
from modelscope.pipelines import pipeline

# Note: loading the model may take a few minutes.
inference = pipeline('shikra-task', model='haolan/shikra')

data = {
    # Image that the question refers to.
    'image_path': "mllm/demo/assets/man.jpg",
    # Each <boxes> placeholder marks where a referenced region appears.
    'user_input': "What is the person<boxes> scared of?",
    # Pixel-space boxes, one per region, as [x1, y1, x2, y2].
    'boxes_value': [[148, 99, 576, 497]],
    # Maps each <boxes> placeholder, in order, to indices into boxes_value.
    'boxes_seq': [[0]],
}

output = inference(data)
print(output)
```
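To see exactly which region the `<boxes>` placeholder refers to, it can help to render `boxes_value` onto the image. A small sketch with Pillow (the `draw_boxes` helper and output path are hypothetical, not part of the Shikra API); the coordinates are pixel-space `[x1, y1, x2, y2]`, matching the example above:

```python
from PIL import Image, ImageDraw

# Hypothetical helper: draw the query boxes so you can eyeball what
# region the <boxes> placeholder actually refers to.
def draw_boxes(image_path, boxes, out_path="boxes_preview.jpg"):
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for x1, y1, x2, y2 in boxes:
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
    image.save(out_path)
    return out_path

draw_boxes("mllm/demo/assets/man.jpg", [[148, 99, 576, 497]])
```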
Citation
```bibtex
@article{chen2023shikra,
  title={Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic},
  author={Chen, Keqin and Zhang, Zhao and Zeng, Weili and Zhang, Richong and Zhu, Feng and Zhao, Rui},
  journal={arXiv preprint arXiv:2306.15195},
  year={2023}
}
```