Dataset Card for RLAIF-V-Dataset

News:

[2024.05.28] Our paper is now accessible on arXiv!
[2024.05.20] Our data is used in MiniCPM-Llama3-V 2.5, which represents the first end-side MLLM achieving GPT-4V level performance!

Dataset Summary

RLAIF-V-Dataset is a large-scale multimodal feedback dataset. It provides high-quality feedback with 83,132 preference pairs in total, where the instructions are collected from a diverse range of datasets including MSCOCO, ShareGPT-4V, MovieNet, Google Landmark v2, VQA v2, OKVQA, and TextVQA. In addition, we adopt the image description prompts introduced in RLHF-V as long-form image-captioning instructions.

By training on these data, our models can reach superior trustworthiness compared to both open-source and proprietary models.

fig1

More experimental results are in the following table. By applying RLAIF-V, we present the RLAIF-V 7B (the most trustworthy variant of LLaVA 1.5) and RLAIF-V 12B (the most trustworthy MLLM), with outstanding trustworthiness and competitive general performance:

fig1

Our data also exhibits good generalizability to improve the trustworthiness of a diverse set of MLLMs.

fig2

Related Sources

Models Trained on RLAIF-V:
MiniCPM-V Series: MiniCPM-V is a series of end-side MLLMs with performance comparable to GPT-4V.
RLAIF-V: RLAIF-V is a series of MLLMs that are far more trustworthy than GPT-4V.

Usage

from datasets import load_dataset

data = load_dataset("openbmb/RLAIF-V-Dataset")
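Because every record carries an image, it can be convenient to inspect a few records before downloading everything. The following is a minimal sketch, assuming the dataset exposes a `train` split (an assumption, not stated in this card) and using the `streaming=True` option of `load_dataset`. The helper names `load_rlaif_v_stream` and `peek` are hypothetical.

```python
import itertools


def load_rlaif_v_stream(split: str = "train"):
    """Return a streaming view of the dataset (hypothetical helper).

    The default split name "train" is an assumption; adjust it if the
    repository uses a different split.
    """
    # Imported lazily so that defining this helper does not require the
    # `datasets` package to be installed.
    from datasets import load_dataset

    return load_dataset("openbmb/RLAIF-V-Dataset", split=split, streaming=True)


def peek(records, n: int = 3):
    """Return the first n items of any iterable without consuming the rest."""
    return list(itertools.islice(records, n))
```

For example, `peek(load_rlaif_v_stream(), 1)[0]` should return a single record as a dict, which makes it easy to check the field names and types described under "Data fields".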

Data fields

Key	Description
`ds_name`	Dataset name.
`image`	Dict containing the image path and bytes. If loaded by `load_dataset`, it can be automatically converted into a PIL Image.
`question`	Input query for MLLMs.
`chosen`	Chosen response for the question.
`rejected`	Rejected response for the question.
`origin_dataset`	Original dataset for the image or question.
`origin_split`	Meta information for each data item, including the name of the model we use to generate the chosen and rejected answer pair, the labeling model that provides feedback, and the question type ("detailed description" or "question answering").
`idx`	Data index.
`image_path`	Image path.
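As a rough illustration of how these fields fit together, the sketch below maps one record to a generic preference pair and tallies records by `origin_dataset`. Only the field names come from the table above. The helper names, the output keys (`prompt`, `chosen`, `rejected`, a common convention rather than something this dataset prescribes), and the sample record are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def to_preference_pair(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one dataset record to a generic preference-pair dict (hypothetical helper)."""
    return {
        "image": record["image"],
        "prompt": record["question"],
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


def count_by_origin(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many records come from each original dataset."""
    return Counter(r["origin_dataset"] for r in records)


# A made-up record for illustration only; real values will differ.
sample = {
    "ds_name": "RLAIF-V",
    "image": None,  # placeholder; a loaded record holds the image here
    "question": "What is the person holding?",
    "chosen": "The person is holding an umbrella.",
    "rejected": "The person is holding a tennis racket.",
    "origin_dataset": "VQA v2",
    "origin_split": "placeholder metadata",
    "idx": 0,
    "image_path": "placeholder.jpg",
}

pair = to_preference_pair(sample)
counts = count_by_origin([sample, sample])
```

Keeping the conversion in a small function like this makes it straightforward to apply with `data.map(...)` or inside a training collator, depending on what the downstream trainer expects.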

Citation

If you find our model/code/paper helpful, please consider citing our papers:

@article{yu2023rlhf,
  title={RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
  author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
  journal={arXiv preprint arXiv:2312.00849},
  year={2023}
}

@article{yu2024rlaifv,
  title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness}, 
  author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
  journal={arXiv preprint arXiv:2405.17220},
  year={2024},
}
