Introduction to the Normal-Depth Diffusion Model (ND)

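A diffusion model that generates normal and depth maps from text. Given a text prompt, it can return one of the following, depending on the model parameters chosen:

1. A normal map and a depth map of a single scene

2. Normal maps and depth maps of an object from multiple viewpoints

A depth-conditioned variant is also provided: given a text prompt and a depth map, it returns an albedo map that matches the prompt and is controlled by the depth map.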

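Model Description

ND is a general-purpose diffusion model that generates normal and depth maps from text. It follows the standard diffusion-model framework and is trained in stages. First, it is pre-trained on the large-scale LAION 2B¹ dataset (roughly 2 billion image-text pairs), which yields a model that generates a normal map and a depth map of a single scene from text; the depth and normal maps used for this stage are predicted by Midas-3.1² and Normal-Bae³, respectively. Next, it is trained on the 3D object dataset Objaverse⁴ with ground-truth multi-view normal and depth maps, which yields a model that generates normal and depth maps of an object from multiple viewpoints. Finally, ground-truth multi-view depth and albedo maps from Objaverse are used to train a model that generates albedo maps controlled jointly by text and a depth map. For details, see: RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D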
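To make the joint normal-and-depth output more concrete, the following is a minimal sketch of how one normal map and one depth map could be packed into a single 4-channel sample. It is an illustration only: the function name `pack_normal_depth`, the min-max depth normalization, and the 4-channel layout are assumptions made for this example, not the preprocessing used in the normal-depth-diffusion repository. Nested lists are used so the sketch has no dependencies; a real pipeline would use an array library.

```python
import math


def pack_normal_depth(normals, depths, eps=1e-8):
    """Pack H x W normals (3-tuples) and H x W depths into H x W 4-tuples.

    Hypothetical helper: each output pixel is (nx, ny, nz, d), where the
    normal is rescaled to unit length and the depth is min-max normalized
    to [0, 1] over the whole map.
    """
    flat_depths = [d for row in depths for d in row]
    d_min, d_max = min(flat_depths), max(flat_depths)
    # Guard against a constant depth map (zero range).
    scale = max(d_max - d_min, eps)

    packed = []
    for normal_row, depth_row in zip(normals, depths):
        out_row = []
        for (nx, ny, nz), d in zip(normal_row, depth_row):
            # Guard against a zero-length normal vector.
            length = max(math.sqrt(nx * nx + ny * ny + nz * nz), eps)
            out_row.append(
                (nx / length, ny / length, nz / length, (d - d_min) / scale)
            )
        packed.append(out_row)
    return packed


# A 1 x 2 toy example: two pixels with unnormalized normals and raw depths.
normals = [[(0.0, 0.0, 2.0), (3.0, 0.0, 4.0)]]
depths = [[1.0, 3.0]]
sample = pack_normal_depth(normals, depths)
```

Each pixel of `sample` carries a unit normal followed by a relative depth, which is one simple way to treat the two maps as a single image-like tensor.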

[Figure: nd-arch (model architecture)]

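Sample results are shown below:

Text-to-normal-and-depth results for a single scene

[Figure: text-to-nd-laion]

Text-to-normal-and-depth results for multiple views of an object

[Figure: text-to-nd-mv]

Albedo generation results controlled jointly by text and a depth map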

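Intended Use and Scope

How to Use

Follow the deployment instructions in the normal-depth-diffusion repository to install the environment and its dependencies, then use the code example below.

Code Example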

# Download the ND models
python tools/download_models/download_nd_models.py
# Run inference (demo_inference.sh is a shell script, so run it with bash)
bash demo_inference.sh

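Training Data

LAION 2B: contains 2.3 billion CLIP⁵-filtered pairs of images and English text. For details, see: Laion-5b: An open large-scale dataset for training next generation image-text models

Objaverse: contains about 800,000 3D models. For details, see: Objaverse: A universe of annotated 3d objects.

Training Procedure

See the training instructions in the normal-depth-diffusion repository.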

References

[1] Schuhmann C, Beaumont R, Vencu R, et al. Laion-5b: An open large-scale dataset for training next generation image-text models[J]. Advances in Neural Information Processing Systems, 2022, 35: 25278-25294.

[2] Birkl R, Wofk D, Müller M. MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation[J]. arXiv preprint arXiv:2307.14460, 2023.

[3] Bae G, Budvytis I, Cipolla R. Estimating and exploiting the aleatoric uncertainty in surface normal estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021.

[4] Deitke M, Schwenk D, Salvador J, et al. Objaverse: A universe of annotated 3d objects[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023.

[5] Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision[C]//International conference on machine learning. PMLR, 2021: 8748-8763.

A diffusion model that generates normal and depth maps from text; it can be used to generate 3D models.
