DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

DreamTalk is a diffusion-based, audio-driven expressive talking head generation framework that can produce high-quality talking head videos across diverse speaking styles. DreamTalk exhibits robust performance with a diverse array of inputs, including songs, speech in multiple languages, noisy audio, and out-of-domain portraits.
News
Install Dependencies
pip install dlib
Installation
Some pre-generated files have already been placed in the output_video folder. You can run the script below and compare your results against them.
from modelscope.utils.constant import Tasks
from modelscope.pipelines import pipeline
import os

# Build the DreamTalk pipeline with a speaking-style reference and a head-pose reference.
pipe = pipeline(
    task=Tasks.text_to_video_synthesis,
    model='damo/dreamtalk',
    style_clip_path="data/style_clip/3DMM/M030_front_surprised_level3_001.mat",
    pose_path="data/pose/RichardShelby_front_neutral_level1_001.mat",
    model_revision='master',
)
# Per-run inputs: audio, portrait, cropping switch, length cap, and output name.
inputs = {
    "output_name": "songbie_yk_male",
    "wav_path": "data/audio/acknowledgement_english.m4a",
    "img_crop": True,
    "image_path": "data/src_img/uncropped/male_face.png",
    "max_gen_len": 20,
}
pipe(input=inputs)
print("end")
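wav_path is the path to the input audio.
style_clip_path is the expression reference file, extracted from a video with emotional expression. It can be used to control the expression in the generated video.
pose_path is the head-motion reference file, extracted from a video. It can be used to control the head motion in the generated video.
image_path is the speaker portrait, preferably a frontal face. In theory any input resolution is supported; the image will be cropped to $256\times256$.
max_gen_len is the maximum duration of the generated video, in seconds. Input audio longer than this is truncated.
output_name is the output name. The final video is written to the output_video folder, and intermediate results go to the tmp folder.
If the input image is already $256\times256$ and properly framed so that no cropping is needed, you can use disable_img_crop to skip the cropping step.

Download Checkpoints

Download the checkpoint of the denoising network and the checkpoint of the renderer, then put the downloaded checkpoints into the checkpoints folder.

Inference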
Run the script:
python inference_for_demo_video.py \
--wav_path data/audio/acknowledgement_english.m4a \
--style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat \
--pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
--image_path data/src_img/uncropped/male_face.png \
--cfg_scale 1.0 \
--max_gen_len 30 \
--output_name acknowledgement_english@M030_front_neutral_level1_001@male_face
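When several clips need to be generated, the command above can be assembled programmatically instead of typed by hand. The following is a minimal sketch, not part of the DreamTalk codebase: the helper name `build_inference_command` is hypothetical, and it only uses the script name and flags that appear in this README.

```python
import shlex


def build_inference_command(wav_path, style_clip_path, pose_path, image_path,
                            output_name, cfg_scale=1.0, max_gen_len=30,
                            disable_img_crop=False):
    """Return the argument list for inference_for_demo_video.py.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    cmd = [
        "python", "inference_for_demo_video.py",
        "--wav_path", str(wav_path),
        "--style_clip_path", str(style_clip_path),
        "--pose_path", str(pose_path),
        "--image_path", str(image_path),
        "--cfg_scale", str(cfg_scale),
        "--max_gen_len", str(max_gen_len),
        "--output_name", output_name,
    ]
    if disable_img_crop:
        # Only meaningful for portraits that are already cropped to 256x256.
        cmd.append("--disable_img_crop")
    return cmd


cmd = build_inference_command(
    wav_path="data/audio/acknowledgement_english.m4a",
    style_clip_path="data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    pose_path="data/pose/RichardShelby_front_neutral_level1_001.mat",
    image_path="data/src_img/uncropped/male_face.png",
    output_name="acknowledgement_english@M030_front_neutral_level1_001@male_face",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To actually launch inference, the list could be passed to `subprocess.run(cmd, check=True)` from the repository root; that step is left out here so the sketch runs without the checkpoints.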
wav_path specifies the input audio. Input audio files with extensions such as wav, mp3, m4a, and mp4 (video with sound) should all be compatible.
style_clip_path specifies the reference speaking style and pose_path specifies the head pose. Both are 3DMM parameter sequences extracted from reference videos. You can follow PIRenderer to extract 3DMM parameters from your own videos. Note that the video frame rate should be 25 FPS. In addition, videos used for head pose reference should first be cropped to $256\times256$ using the scripts in FOMM video preprocessing.
image_path specifies the input portrait. Its resolution should be larger than $256\times256$. Frontal portraits, with the face directly facing forward and not tilted to one side, usually achieve satisfactory results. The input portrait will be cropped to $256\times256$. If your portrait is already cropped to $256\times256$ and you want to disable cropping, use the option --disable_img_crop
like this:
python inference_for_demo_video.py \
--wav_path data/audio/acknowledgement_chinese.m4a \
--style_clip_path data/style_clip/3DMM/M030_front_surprised_level3_001.mat \
--pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
--image_path data/src_img/cropped/zp1.png \
--disable_img_crop \
--cfg_scale 1.0 \
--max_gen_len 30 \
--output_name acknowledgement_chinese@M030_front_surprised_level3_001@zp1
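Both commands above pass --cfg_scale 1.0, the scale of classifier-free guidance. As a rough illustration of how such a scale typically acts, the sketch below uses the common formulation in which the guided prediction is the unconditional prediction plus the scale times the difference between the conditional and unconditional predictions. This README does not spell out DreamTalk's exact formula, so the function `apply_cfg` and the toy values are assumptions for illustration only.

```python
def apply_cfg(eps_uncond, eps_cond, cfg_scale):
    """Blend unconditional and style-conditioned predictions.

    Assumed (common) classifier-free guidance form:
        guided = uncond + cfg_scale * (cond - uncond)
    DreamTalk's actual implementation may differ.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for the denoising network's two predictions.
uncond = [0.0, 1.0, -1.0]
cond = [0.5, 1.5, -2.0]

for scale in (0.0, 1.0, 2.0):
    # A larger scale pushes the result further toward (and past) the conditional prediction.
    print(scale, apply_cfg(uncond, cond, scale))
```

Under this assumed formulation, a scale of 0 ignores the style condition, a scale of 1 reproduces the conditional prediction, and larger values exaggerate the difference between the two predictions.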
cfg_scale controls the scale of classifier-free guidance. It can adjust the intensity of speaking styles.
max_gen_len is the maximum video generation duration, measured in seconds. If the input audio exceeds this length, it will be truncated.
The generated video will be named $(output_name).mp4 and put in the output_video folder. Intermediate results, including the cropped portrait, will be in the tmp/$(output_name) folder.
Sample inputs are presented in the data folder. Due to copyright issues, we are unable to include the songs we have used in this folder.

Acknowledgements
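We extend our heartfelt thanks for the invaluable contributions made by preceding works to the development of DreamTalk. This includes, but is not limited to: PIRenderer, AVCT, StyleTalk, Deep3DFaceRecon_pytorch, Wav2vec2.0, diffusion-point-cloud, and FOMM video preprocessing. We are dedicated to advancing upon these foundational works with the utmost respect for their original contributions.

Citation

If you find this codebase useful for your research, please use the following entry.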
@article{ma2023dreamtalk,
  title={DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models},
  author={Ma, Yifeng and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Zhang, Yingya and Deng, Zhidong},
  journal={arXiv preprint arXiv:2312.09767},
  year={2023}
}