Visual Foundation Models - Classification Task
DINOv2 is a self-supervised training method for vision models. For more details, see DINOv2.
Model Description
This model card provides a ViT-g backbone trained with the DINOv2 recipe together with a matching linear classifier; combining the two modules enables image classification. The model also supports a feature-extraction mode that outputs only the features produced by the backbone, for use in downstream tasks.
Intended Use and Scope
The model is intended for natural-scene images and achieves SOTA performance on the ImageNet-1k dataset.
How to Use the Model
- Input a natural-scene image and run inference following the code example below.
Code Example
```python
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

## using cuda
# dinov2_pipe = pipeline(
#     Tasks.image_classification,
#     model="jp_lan/cv_vitg_classification_dinov2",
#     model_revision="v1.0.1",
#     device="cuda",
# )

## using cpu
dinov2_pipe = pipeline(
    Tasks.image_classification,
    model="jp_lan/cv_vitg_classification_dinov2",
    model_revision="v1.0.1",
    device="cpu",
)

input_image_file = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/bird.JPEG'

## using the DINOv2 model to predict image labels
result = dinov2_pipe(input_image_file)
print("result is : ", result)

## using the DINOv2 model to extract features
# output = dinov2_pipe(input_image_file, output_features_only=True)
# print("feature length = ", len(output["feature"]))
```
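The feature-extraction mode returns a backbone embedding that can be reused for downstream tasks such as image retrieval. As a minimal, self-contained sketch (using NumPy, with hypothetical feature vectors standing in for the `output["feature"]` values returned by the pipeline), two images could be compared via cosine similarity of their features:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D feature vectors.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors; in practice these would come from
# dinov2_pipe(image, output_features_only=True)["feature"].
feat_a = [0.1, 0.3, 0.5]
feat_b = [0.2, 0.1, 0.4]

score = cosine_similarity(feat_a, feat_b)
print("similarity =", score)
```

A higher score indicates more similar images in the backbone's feature space; this is one common way to use self-supervised features without the linear classifier.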
Model Limitations and Possible Bias
- Testing on a machine with a GPU is recommended; due to hardware precision differences, results on CPU may differ slightly from those on GPU.
Evaluation and Results
Objective metrics of the model on the ImageNet-1k validation set are as follows:
| Method | Backbone | Classifier | ImageNet-1k top-1 accuracy |
|---|---|---|---|
| DINOv2 | ViT-g/14 | linear | 86.5% |
Related Papers and Citation
The main reference paper for this model is:
```
@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}
```
Clone with HTTP
```
git clone https://www.modelscope.cn/jp_lan/cv_vitg_classification_dinov2.git
```