# Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
## Quick Start

### LLaVA-v1.5
Install LLaVA:

```shell
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
```
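To confirm the editable install succeeded, a quick import check can help (a minimal sanity check, not part of the upstream instructions):

```shell
# Verify that the llava package is importable from the editable install.
python -c "import llava; print(llava.__file__)"
```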
#### Simple Interactive Demos

See the example code and scripts below.
#### Example Code (Single Query)
```python
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."
image_file = "fig/sausage.jpg"

# Build the lightweight args object expected by eval_model.
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
})()

eval_model(args)
```
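To score several images with the same prompt, the same entry point can be wrapped in a loop (a minimal sketch; the second image path is a placeholder, and note that `eval_model` loads the model inside the call, so a loop like this reloads it each time):

```python
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."

# Placeholder list of images; replace with your own files.
for image_file in ["fig/sausage.jpg", "fig/another_image.jpg"]:
    args = type('Args', (), {
        "model_path": model_path,
        "model_base": None,
        "model_name": get_model_name_from_path(model_path),
        "query": prompt,
        "conv_mode": None,
        "image_file": image_file,
        "sep": ",",
    })()
    eval_model(args)  # prints the model's answer for each image
```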
#### Example Code (CLI Demo for Multi-turn Conversation)
```shell
python -m llava.serve.cli \
    --model-path teowu/llava_v1.5_7b_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg"
```
Note: outputs may vary between runs, since `do_sample=True` is enabled in conversation mode.
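For more repeatable outputs, the upstream LLaVA CLI also accepts a `--temperature` flag; setting it to 0 disables sampling (a usage sketch assuming the standard LLaVA CLI arguments):

```shell
python -m llava.serve.cli \
    --model-path teowu/llava_v1.5_7b_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg" \
    --temperature 0
```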
#### Quantitative Evaluations

Multi-choice questions (MCQ) in Q-Bench:

```shell
python eval_scripts/llava_v1.5/eval_qbench_mcq.py
```
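For a quick spot check outside the benchmark script, an MCQ-style query can also be issued through the single-query API shown above (the prompt below is illustrative; the exact template used for Q-Bench lives in `eval_scripts/llava_v1.5/eval_qbench_mcq.py`):

```python
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
# Illustrative multi-choice prompt, not necessarily the official template.
prompt = (
    "How is the clarity of this image?\n"
    "A. High\nB. Medium\nC. Low\n"
    "Answer with the option's letter from the given choices directly."
)

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": "fig/sausage.jpg",
    "sep": ",",
})()
eval_model(args)
```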
Image/Video Quality Assessment:

For image quality assessment:

```shell
python eval_scripts/llava_v1.5/eval_image_quality.py
```

For video quality assessment:

```shell
python eval_scripts/llava_v1.5/eval_video_quality.py
```
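Both scripts reduce the model's answer to a scalar quality score. A common recipe in Q-Bench-style evaluation is a softmax over the logits of opposing quality tokens such as "good" and "poor"; the sketch below shows only that final mapping with dummy logits (the full pipeline, including the exact token set, is defined in the eval scripts above):

```python
import torch

def quality_score(logit_good: torch.Tensor, logit_poor: torch.Tensor) -> torch.Tensor:
    """Map a pair of opposing-token logits to a scalar score in (0, 1)."""
    return torch.softmax(torch.stack([logit_good, logit_poor]), dim=0)[0]

# Dummy logits for illustration only.
print(quality_score(torch.tensor(2.3), torch.tensor(0.4)))  # ~0.87
```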
### mPLUG-Owl-2

Coming soon.

### InternLM-XComposer-VL

Coming soon.
## Model Zoo
All weights are converted into Hugging Face format and are fully compatible with the base repositories (LLaVA, mPLUG-Owl, InternLM-XComposer). After installing a base repository, replace the HF model path in its original evaluation scripts with one of the paths below to automatically download the Q-Instruct-tuned weights (see the sketch after the lists below).
Released:

- LLaVA-v1.5-7B (mix), HF-path: `teowu/llava_v1.5_7b_qinstruct_preview_v0.1`
- LLaVA-v1.5-13B (mix), HF-path: `teowu/llava_v1.5_13b_qinstruct_preview_v0.1`
Coming Soon:
- mPLUG-Owl-2 (mix)
- InternLM-XComposer-VL (mix)
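For example, switching the Quick Start demo to the 13B weights only requires changing the model path; everything else stays as in the earlier example (a minimal sketch):

```python
from llava.mm_utils import get_model_name_from_path

# Point the demo at the 13B Q-Instruct weights instead of the 7B ones.
model_path = "teowu/llava_v1.5_13b_qinstruct_preview_v0.1"
print(get_model_name_from_path(model_path))
```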
## Training

At present, we only provide training scripts for LLaVA-v1.5. Please see the Training Docs for more details.
## License

Researchers and open-source developers are free to use the Q-Instruct dataset and the fine-tuned weights provided for the four MLLMs. Commercial use is also allowed, but it must be approved by our team in advance. Please email haoning001@e.ntu.edu.sg to request permission for commercial use.