Open-source address: https://modelscope.cn/models/iic/VideoComposer
License: CC-BY-NC-ND


VideoComposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

Please see Project Page for more examples.

We are searching for talented, motivated, and imaginative researchers to join our team. If you are interested, please don't hesitate to send us your resume via email at yingya.zyy@alibaba-inc.com.


VideoComposer is a controllable video diffusion model that allows users to flexibly control the spatial and temporal patterns of a synthesized video simultaneously, using conditions in various forms such as a text description, a sketch sequence, a reference video, or even simple hand-crafted motions and hand drawings.

TODO

  • [x] Release our technical papers and webpage.
  • [x] Release code and pretrained model.
  • [ ] Release Gradio UI on ModelScope and Hugging Face.
  • [ ] Release pretrained model that can generate 8s videos without watermark.

Method

[Figure: method overview]

Running by Yourself

1. Installation

Requirements:

  • Python==3.8
  • ffmpeg (for motion vector extraction)
  • torch==1.12.0+cu113
  • torchvision==0.13.0+cu113
  • open-clip-torch==2.0.2
  • transformers==4.18.0
  • flash-attn==0.2
  • xformers==0.0.13
  • motion-vector-extractor==1.0.6 (for motion vector extraction)

You can also create an environment identical to ours with the following command:

conda env create -f environment.yaml
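
After setting up the environment, a quick sanity check like the sketch below can confirm that the pinned packages and CUDA are visible to Python (the distribution names are assumptions taken from the requirement list above):

```python
# Environment sanity check: a sketch, assuming the distribution names below match the
# requirement list above; adjust them if a package reports as missing.
from importlib.metadata import version, PackageNotFoundError

import torch

print("CUDA available:", torch.cuda.is_available())
for pkg in ["torch", "torchvision", "open-clip-torch", "transformers",
            "flash-attn", "xformers", "motion-vector-extractor"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is NOT installed")
```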

2. Download model weights

Download all the model weights (for example, with the ModelScope snippet below), then place them in the following folder layout:

```python
!pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('damo/VideoComposer', cache_dir='model_weights/', revision='v1.0.0')
```

|--model_weights/
|  |--non_ema_228000.pth
|  |--midas_v3_dpt_large.pth
|  |--open_clip_pytorch_model.bin
|  |--sketch_simplification_gan.pth
|  |--table5_pidinet.pth
|  |--v2-1_512-ema-pruned.ckpt

You can also download some of them from their original projects:
- "midas_v3_dpt_large.pth" in [MiDaS](https://github.com/isl-org/MiDaS)
- "open_clip_pytorch_model.bin" in [Open Clip](https://github.com/mlfoundations/open_clip) 
- "sketch_simplification_gan.pth" and "table5_pidinet.pth" in [Pidinet](https://github.com/zhuoinoulu/pidinet)
- "v2-1_512-ema-pruned.ckpt" in [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-2-1-base/blob/main/v2-1_512-ema-pruned.ckpt).

For convenience, we provide a download link in this repo.
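
Before running inference, an existence check such as the sketch below helps catch missing checkpoints early (the file names follow the folder layout above; `weights_dir` is an assumption and should point at wherever you placed the weights):

```python
# Check that every expected checkpoint is present; a sketch based on the folder layout above.
from pathlib import Path

# snapshot_download may nest the files under a model-ID subdirectory, so search recursively.
weights_dir = Path("model_weights")
expected = [
    "non_ema_228000.pth",
    "midas_v3_dpt_large.pth",
    "open_clip_pytorch_model.bin",
    "sketch_simplification_gan.pth",
    "table5_pidinet.pth",
    "v2-1_512-ema-pruned.ckpt",
]
missing = [name for name in expected if not list(weights_dir.rglob(name))]
print("All weights found." if not missing else f"Missing: {missing}")
```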


### 3. Running

In this project, we provide two implementations that can help you better understand our method.


#### 3.1 Inference with Customized Inputs

You can run the code with the following command:

python run_net.py \
    --cfg configs/exp02_motion_transfer.yaml \
    --seed 9999 \
    --input_video "demo_video/motion_transfer.mp4" \
    --image_path "demo_video/moon_on_water.jpg" \
    --input_text_desc "A beautiful big moon on the water at night"

The results are saved in the `outputs/exp02_motion_transfer-S09999` folder:

![case1](source/results/exp02_motion_transfer-S00009.gif "case2")
![case2](source/results/exp02_motion_transfer-S09999.gif "case2")


If you notice a significant color shift in the result, you can use the style condition to adjust the color distribution with the following command; this can be helpful in certain cases.

python run_net.py \
    --cfg configs/exp02_motion_transfer_vs_style.yaml \
    --seed 9999 \
    --input_video "demo_video/motion_transfer.mp4" \
    --image_path "demo_video/sunflower.png" \
    --style_image "demo_video/sunflower.png" \
    --input_text_desc "A sunflower in a field of flowers"


python run_net.py \
    --cfg configs/exp03_sketch2video_style.yaml \
    --seed 8888 \
    --sketch_path "demo_video/src_single_sketch.png" \
    --style_image "demo_video/style/qibaishi_01.png" \
    --input_text_desc "Red-backed Shrike lanius collurio"

![case2](source/results/exp03_sketch2video_style-S09999.gif "case2")

python run_net.py \
    --cfg configs/exp04_sketch2video_wo_style.yaml \
    --seed 144 \
    --sketch_path "demo_video/src_single_sketch.png" \
    --input_text_desc "A Red-backed Shrike lanius collurio is on the branch"

![case2](source/results/exp04_sketch2video_wo_style-S00144.gif "case2")
![case2](source/results/exp04_sketch2video_wo_style-S00144-1.gif "case2")

python run_net.py \
    --cfg configs/exp05_text_depths_wo_style.yaml \
    --seed 9999 \
    --input_video demo_video/video_8800.mp4 \
    --input_text_desc "A glittering and translucent fish swimming in a small glass bowl with multicolored piece of stone, like a glass fish"

![case2](source/results/exp05_text_depths_wo_style-S09999-0.gif "case2")
![case2](source/results/exp05_text_depths_wo_style-S09999-2.gif "case2")

python run_net.py \
    --cfg configs/exp06_text_depths_vs_style.yaml \
    --seed 9999 \
    --input_video demo_video/video_8800.mp4 \
    --style_image "demo_video/style/qibaishi_01.png" \
    --input_text_desc "A glittering and translucent fish swimming in a small glass bowl with multicolored piece of stone, like a glass fish"

![case2](source/results/exp06_text_depths_vs_style-S09999-0.gif "case2")
![case2](source/results/exp06_text_depths_vs_style-S09999-1.gif "case2")
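
If you prefer to script these examples from Python rather than the shell, a thin wrapper along the lines of the sketch below can build the same `run_net.py` calls (the helper `run_videocomposer` is hypothetical; the flag names mirror the commands above):

```python
# Hypothetical convenience wrapper around run_net.py; flag names mirror the shell commands above.
import subprocess


def run_videocomposer(cfg, seed, **conditions):
    """Invoke run_net.py with a config, a seed, and extra --<flag> <value> conditions."""
    cmd = ["python", "run_net.py", "--cfg", cfg, "--seed", str(seed)]
    for flag, value in conditions.items():
        cmd += [f"--{flag}", str(value)]
    subprocess.run(cmd, check=True)


# Reproduces the exp02 motion-transfer example above.
run_videocomposer(
    "configs/exp02_motion_transfer.yaml",
    seed=9999,
    input_video="demo_video/motion_transfer.mp4",
    image_path="demo_video/moon_on_water.jpg",
    input_text_desc="A beautiful big moon on the water at night",
)
```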


#### 3.2 Inference on a Video

You can run the code with the following command:

python run_net.py \
    --cfg configs/exp01_vidcomposer_full.yaml \
    --input_video "demo_video/blackswan.mp4" \
    --input_text_desc "A black swan swam in the water" \
    --seed 9999

This command will extract the different conditions of the input video (e.g., depth, sketch, and motion vectors) for the subsequent video generation; the results are saved in the `outputs` folder. The task list is predefined in `inference_multi.py`.



In addition to the above use cases, you can explore further possibilities with this code and model. Please note that, because the samples generated by the diffusion model are diverse, you can explore different seeds to obtain better results.
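
For example, a small sweep like the sketch below (the seed values are arbitrary) reruns one of the sketch-to-video commands above with several seeds so you can pick the best sample:

```python
# Sweep a few arbitrary seeds for the exp04 sketch-to-video example; each run writes
# its results into a seed-specific folder under outputs/ (see the -S09999 suffix above).
import subprocess

for seed in [144, 888, 2023, 9999]:  # arbitrary seeds chosen for illustration
    subprocess.run(
        [
            "python", "run_net.py",
            "--cfg", "configs/exp04_sketch2video_wo_style.yaml",
            "--seed", str(seed),
            "--sketch_path", "demo_video/src_single_sketch.png",
            "--input_text_desc", "A Red-backed Shrike lanius collurio is on the branch",
        ],
        check=True,
    )
```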

We hope you enjoy using it! 😀



## BibTeX

If this repo is useful to you, please cite our technical paper:

```bibtex
@article{2023videocomposer,
  title={VideoComposer: Compositional Video Synthesis with Motion Controllability},
  author={Wang, Xiang* and Yuan, Hangjie* and Zhang, Shiwei* and Chen, Dayou* and Wang, Jiuniu and Zhang, Yingya and Shen, Yujun and Zhao, Deli and Zhou, Jingren},
  booktitle={arXiv preprint arXiv:2306.02018},
  year={2023}
}
```

Acknowledgement

We would like to express our gratitude for the contributions of several previous works to the development of VideoComposer. This includes, but is not limited to, Composer, ModelScopeT2V, Stable Diffusion, OpenCLIP, WebVid-10M, LAION-400M, Pidinet, and MiDaS. We are committed to building upon these foundations in a way that respects their original contributions.

Disclaimer

Note: This open-source model is trained on the WebVid-10M and LAION-400M datasets and is intended for PERSONAL/RESEARCH/NON-COMMERCIAL USE ONLY. We have also trained more powerful models using internal video data, which may be used in the future.
