Video-to-Video: Large Model for High-Definition Video-to-Video Generation

Anonymous user, July 31, 2024

Technical information

Open-source repository
https://modelscope.cn/models/iic/Video-to-Video
License
CC-BY-NC-ND

Project details


MS-Vid2Vid-XL aims to improve the spatiotemporal continuity and resolution of generated video. It serves as the second stage of Video-to-Video and produces 720P video. It can also be used for other tasks such as text-to-video synthesis and high-definition video conversion. Its training data is a large, curated collection of high-definition videos and images (shortest side ≥ 720 pixels). The model upscales low-resolution video to a higher resolution (1280 × 720) and can handle input of almost any resolution, although wide 16:9 video is recommended.
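The card recommends 16:9 input and targets a 1280 × 720 output. One way to prepare frames of arbitrary size is to take the largest centered 16:9 crop and then scale it to 720P. The sketch below assumes that approach. The helpers `center_crop_16_9` and `scale_to_720p` are hypothetical. Center-cropping is this sketch's own choice, not a preprocessing step the model card prescribes, and the ModelScope pipeline may already resize its input internally.

```python
from typing import Tuple

TARGET_W, TARGET_H = 1280, 720  # 720P output size stated in the model card


def center_crop_16_9(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return the largest centered 16:9 crop of a frame as (x0, y0, w, h).

    Hypothetical preprocessing helper; not part of ModelScope.
    """
    if width <= 0 or height <= 0:
        raise ValueError("frame size must be positive")
    if width * 9 >= height * 16:
        # Frame is at least as wide as 16:9: keep full height, trim the sides.
        h = height
        w = height * 16 // 9
    else:
        # Frame is taller than 16:9: keep full width, trim top and bottom.
        w = width
        h = width * 9 // 16
    x0 = (width - w) // 2
    y0 = (height - h) // 2
    return x0, y0, w, h


def scale_to_720p(crop_height: int) -> float:
    """Scale factor that brings a 16:9 crop of the given height to 720 rows."""
    return TARGET_H / crop_height


if __name__ == "__main__":
    for size in [(640, 360), (480, 480), (2560, 1080)]:
        x0, y0, w, h = center_crop_16_9(*size)
        print(size, "->", (x0, y0, w, h), "scale", scale_to_720p(h))
```

A square 480 × 480 frame, for example, loses 105 rows at the top and bottom. An ultra-wide 2560 × 1080 frame loses 320 columns on each side. Padding instead of cropping is an equally valid choice if no content may be discarded.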


Fig.1 MS-Vid2Vid-XL

Introduction

Like the first stage of Video-to-Video, MS-Vid2Vid-XL is a video latent diffusion model (VLDM), and both stages use a spatiotemporal UNet (ST-UNet) with the same structure. Its design follows our in-house VideoComposer; for more details, please refer to the VideoComposer technical report.
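Because the model runs diffusion in a latent space rather than on pixels, its workload depends on the latent grid size. The Limitations section gives that size as 160 × 90 for 720P output, which corresponds to 8× spatial downsampling of 1280 × 720 (1280 / 8 = 160, 720 / 8 = 90). The sketch below assumes the same factor of 8 holds at other resolutions. That is an inference from those two numbers, not something the card states. The helper names are hypothetical.

```python
from typing import Tuple

# Assumed spatial downsampling factor, inferred from 1280/160 = 720/90 = 8.
DOWNSAMPLE = 8


def latent_size(width: int, height: int, factor: int = DOWNSAMPLE) -> Tuple[int, int]:
    """Spatial size (w, h) of the latent grid for one frame."""
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not divisible by {factor}")
    return width // factor, height // factor


def positions_per_frame(width: int, height: int) -> int:
    """Number of latent positions the ST-UNet processes per frame."""
    w, h = latent_size(width, height)
    return w * h


if __name__ == "__main__":
    for w, h in [(640, 360), (1280, 720)]:
        print(f"{w}x{h} -> latent {latent_size(w, h)}, "
              f"{positions_per_frame(w, h)} positions per frame")
```

Under this assumption, going from 640 × 360 to 1280 × 720 quadruples the latent positions per frame (3,600 to 14,400). That is consistent with the long per-video runtime reported under Limitations.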

Dependencies

The model requires the following dependencies to run:

pip install modelscope
pip install xformers==0.0.21 torchsde
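A small pre-flight check can confirm that these packages are installed before the model is loaded. The sketch below uses the standard-library `importlib.metadata` module. The `REQUIRED` table and the `check_dependencies` helper are hypothetical; only the package names and the xformers 0.0.21 pin come from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Hypothetical requirement table mirroring the pip commands above:
# None means "any version", a string means an exact version pin.
REQUIRED: Dict[str, Optional[str]] = {
    "modelscope": None,
    "xformers": "0.0.21",
    "torchsde": None,
}


def check_dependencies(required: Dict[str, Optional[str]] = REQUIRED) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: List[str] = []
    for name, pinned in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if pinned is not None and installed != pinned:
            problems.append(f"{name}: found {installed}, expected {pinned}")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print(problem)
```

The check only compares installed distribution versions. It does not verify that xformers was built for the installed PyTorch or CUDA version, which can still cause import errors.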

Code example

```python
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Placeholders: replace with your input video path and an English text description.
VID_PATH = 'path/to/your_video.mp4'
TEXT = 'your text description'

pipe = pipeline(task="video-to-video", model='damo/Video-to-Video', model_revision='v1.1.0', device='cuda:0')

# Pipeline input: the source video plus a text prompt describing it.
p_input = {
    'video_path': VID_PATH,
    'text': TEXT
}

# Run the pipeline, write the result to ./output.mp4, and get the output path.
output_video_path = pipe(p_input, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
```

### Limitations

**MS-Vid2Vid-XL** may have the following limitations:

- Distant objects may appear somewhat blurry. Providing input text can resolve or mitigate this.
- Computation is slow. Generating 720P video requires a latent space of 160 × 90, and a single video takes more than 2 minutes to process.
- Only English is currently supported, because the training data is limited to English text input.

### References

```bibtex
@article{videocomposer2023,
  title={VideoComposer: Compositional Video Synthesis with Motion Controllability},
  author={Wang, Xiang* and Yuan, Hangjie* and Zhang, Shiwei* and Chen, Dayou* and Wang, Jiuniu and Zhang, Yingya and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
  journal={arXiv preprint arXiv:2306.02018},
  year={2023}
}

@inproceedings{videofusion2023,
  title={VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation},
  author={Luo, Zhengxiong and Chen, Dayou and Zhang, Yingya and Huang, Yan and Wang, Liang and Shen, Yujun and Zhao, Deli and Zhou, Jingren and Tan, Tieniu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}
```

License Agreement

Our code and model weights are available only for personal/academic research use. Commercial use is not currently supported.
