Official website: https://github.com/opengvlab
Open-source repository: https://modelscope.cn/models/OpenGVLab/InternVL-14B-FlickrCN-FT-364px
License: MIT

Model Card for InternVL-14B-FlickrCN-FT-364px

What is InternVL?

InternVL scales up the ViT to 6B parameters and aligns it with an LLM.

It is the largest open-source vision/vision-language foundation model (14B) to date, achieving 32 state-of-the-art results across a wide range of tasks such as visual perception, cross-modal retrieval, and multimodal dialogue.

Model Details

Model Type: fine-tuned retrieval model
Support Tasks: image-text retrieval
Model Stats:
  Params: 14B
  Image size: 364 x 364
  Fine-tune Dataset: FlickrCN

Setting

[Image: fine-tuning settings]

Performance

See this document for more details about the evaluation.

[Image: evaluation results]
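
Image-text retrieval results are commonly reported as Recall@K. The card does not name the metric here, so the following is only a sketch under that assumption. It computes Recall@K from a score matrix in which each query has exactly one correct candidate, which is a simplification for datasets with several captions per image. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from the InternVL evaluation code.

```python
def recall_at_k(scores, targets, k):
    """Fraction of queries whose correct candidate ranks in the top k.

    scores[i][j] is the similarity of query i to candidate j;
    targets[i] is the index of the correct candidate for query i.
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Candidate indices sorted by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy scores for 3 queries x 4 candidates (illustrative only, not model output).
scores = [
    [0.9, 0.1, 0.3, 0.2],  # correct candidate 0 ranks 1st
    [0.2, 0.4, 0.8, 0.1],  # correct candidate 1 ranks 2nd
    [0.5, 0.6, 0.1, 0.7],  # correct candidate 2 ranks 4th
]
targets = [0, 1, 2]
r1 = recall_at_k(scores, targets, 1)
r2 = recall_at_k(scores, targets, 2)
print(r1, r2)
```

For image-to-text retrieval the rows would be images and the columns captions. For text-to-image retrieval the same function can be applied to the transposed matrix.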

Model Usage

Note: the prefix 'summarize:' and tokenizer.pad_token_id = 0 are necessary. Their absence will lead to abnormal results.
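
Because a missing prefix degrades results without raising an error, one option is to normalize the texts before tokenizing them. The following is a minimal sketch. `with_prefix` is a hypothetical helper name and is not part of the InternVL code.

```python
PREFIX = 'summarize:'  # the prefix required by the note above


def with_prefix(texts, prefix=PREFIX):
    """Return a copy of texts in which every entry starts with prefix.

    Entries that already start with the prefix are left unchanged, so the
    helper can be applied more than once without doubling the prefix.
    """
    return [t if t.startswith(prefix) else prefix + t for t in texts]


texts = with_prefix(['a photo of a red panda', 'summarize:一张熊猫的照片'])
print(texts)
```

The helper only handles the prefix. Setting the tokenizer's pad token id to 0 still has to be done on the tokenizer object itself.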

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

# Load the model in bfloat16 on the GPU and switch to inference mode.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-FlickrCN-FT-364px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-FlickrCN-FT-364px')

tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-FlickrCN-FT-364px', use_fast=False, add_eos_token=True)
tokenizer.pad_token_id = 0  # set pad_token_id to 0

images = [
    Image.open('./examples/image1.jpg').convert('RGB'),
    Image.open('./examples/image2.jpg').convert('RGB'),
    Image.open('./examples/image3.jpg').convert('RGB')
]
prefix = 'summarize:'
texts = [
    prefix + 'a photo of a red panda',  # English
    prefix + '一张熊猫的照片',  # Chinese
    prefix + '二匹の猫の写真'  # Japanese
]

# Preprocess the images and tokenize the texts (padded to 80 tokens).
pixel_values = image_processor(images=images, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# InternVL-C
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)

# InternVL-G
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-G')
probs = logits_per_image.softmax(dim=-1)
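
Both modes return a pair of score matrices, and the usage code applies a softmax over the last dimension of `logits_per_image`. Assuming the CLIP-style convention that rows index images and columns index texts, each row of `probs` can then be read as a distribution over the candidate texts for one image. The sketch below illustrates that step in plain Python on made-up scores (the numbers are illustrative, not model output). `softmax_rows` and `best_match` are hypothetical helper names.

```python
import math


def softmax_rows(logits):
    """Apply a numerically stable softmax to each row of a 2-D list."""
    out = []
    for row in logits:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def best_match(probs):
    """Return, for each row, the index of the highest-probability column."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Made-up scores for 3 images x 3 texts (illustrative only).
logits_per_image = [
    [12.0, 11.0, 3.0],
    [4.0, 9.5, 2.0],
    [1.0, 2.0, 8.0],
]
probs = softmax_rows(logits_per_image)
matches = best_match(probs)
print(matches)
```

With the real tensors, the same reading is available through `probs.argmax(dim=-1)`. Under the same assumption, applying the softmax to `logits_per_text` instead would rank the images for each text.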

Citation

If you find this project useful in your research, please consider citing:

@article{chen2023internvl,
  title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
}

Acknowledgement

InternVL is built with reference to the code of the following projects: OpenAI CLIP, Open CLIP, CLIP Benchmark, EVA, InternImage, ViT-Adapter, MMSegmentation, Transformers, DINOv2, BLIP-2, Qwen-VL, and LLaVA-1.5. Thanks for their awesome work!
