# Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
## Quick Start

### LLaVA-v1.5
Install LLaVA:

```shell
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
```
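To confirm the editable install succeeded, a quick import check can help (a minimal sanity check, not part of the upstream instructions):

```shell
# Verify that the llava package is importable from the editable install.
python -c "import llava; print(llava.__file__)"
```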
#### Simple Interactive Demos

See the example code and scripts below.
#### Example Code (Single Query)
```python
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."
image_file = "fig/sausage.jpg"

# Build the lightweight args object expected by eval_model.
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
})()

eval_model(args)
```
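To score several images with the same prompt, the same entry point can be wrapped in a loop (a minimal sketch; the second image path is a placeholder, and note that `eval_model` loads the model inside the call, so a loop like this reloads it each time):

```python
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."

# Placeholder list of images; replace with your own files.
for image_file in ["fig/sausage.jpg", "fig/another_image.jpg"]:
    args = type('Args', (), {
        "model_path": model_path,
        "model_base": None,
        "model_name": get_model_name_from_path(model_path),
        "query": prompt,
        "conv_mode": None,
        "image_file": image_file,
        "sep": ",",
    })()
    eval_model(args)  # prints the model's answer for each image
```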
#### Example Code (CLI Demo for Multi-turn Conversation)
```shell
python -m llava.serve.cli \
    --model-path teowu/llava_v1.5_7b_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg"
```
Note: outputs may vary between runs, since `do_sample=True` is enabled in conversation mode.
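For more repeatable outputs, the upstream LLaVA CLI also accepts a `--temperature` flag; setting it to 0 disables sampling (a usage sketch assuming the standard LLaVA CLI arguments):

```shell
python -m llava.serve.cli \
    --model-path teowu/llava_v1.5_7b_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg" \
    --temperature 0
```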
#### Quantitative Evaluations

Multi-choice questions (MCQ) in Q-Bench:

```shell
python eval_scripts/llava_v1.5/eval_qbench_mcq.py
```
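For a quick spot check outside the benchmark script, an MCQ-style query can also be issued through the single-query API shown above (the prompt below is illustrative; the exact template used for Q-Bench lives in `eval_scripts/llava_v1.5/eval_qbench_mcq.py`):

```python
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
# Illustrative multi-choice prompt, not necessarily the official template.
prompt = (
    "How is the clarity of this image?\n"
    "A. High\nB. Medium\nC. Low\n"
    "Answer with the option's letter from the given choices directly."
)

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": "fig/sausage.jpg",
    "sep": ",",
})()
eval_model(args)
```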
Image/Video Quality Assessment:

For image quality assessment:

```shell
python eval_scripts/llava_v1.5/eval_image_quality.py
```

For video quality assessment:

```shell
python eval_scripts/llava_v1.5/eval_video_quality.py
```
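Both scripts reduce the model's answer to a scalar quality score. A common recipe in Q-Bench-style evaluation is a softmax over the logits of opposing quality tokens such as "good" and "poor"; the sketch below shows only that final mapping with dummy logits (the full pipeline, including the exact token set, is defined in the eval scripts above):

```python
import torch

def quality_score(logit_good: torch.Tensor, logit_poor: torch.Tensor) -> torch.Tensor:
    """Map a pair of opposing-token logits to a scalar score in (0, 1)."""
    return torch.softmax(torch.stack([logit_good, logit_poor]), dim=0)[0]

# Dummy logits for illustration only.
print(quality_score(torch.tensor(2.3), torch.tensor(0.4)))  # ~0.87
```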
### mPLUG-Owl-2

Coming soon.

### InternLM-XComposer-VL

Coming soon.
## Model Zoo
All weights are converted into Hugging Face format and are fully compatible with the base repositories (LLaVA, mPLUG-Owl, InternLM-XComposer). After installing a base repository, replace the HF model path in its original evaluation scripts with one of the paths below to automatically download the Q-Instruct-tuned weights (see the sketch after the lists below).
Released:

- LLaVA-v1.5-7B (mix), HF-path: `teowu/llava_v1.5_7b_qinstruct_preview_v0.1`
- LLaVA-v1.5-13B (mix), HF-path: `teowu/llava_v1.5_13b_qinstruct_preview_v0.1`
Coming Soon:
- mPLUG-Owl-2 (mix)
- InternLM-XComposer-VL (mix)
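For example, switching the Quick Start demo to the 13B weights only requires changing the model path; everything else stays as in the earlier example (a minimal sketch):

```python
from llava.mm_utils import get_model_name_from_path

# Point the demo at the 13B Q-Instruct weights instead of the 7B ones.
model_path = "teowu/llava_v1.5_13b_qinstruct_preview_v0.1"
print(get_model_name_from_path(model_path))
```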
## Training

At present, we only provide training scripts for LLaVA-v1.5. Please see the Training Docs for more details.
## License

Researchers and open-source developers are free to use the Q-Instruct dataset and the fine-tuned weights provided for the four MLLMs. Commercial use is also allowed, but it must be approved by our team in advance. Please email haoning001@e.ntu.edu.sg to request permission for commercial use.