EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
*Equal Contribution.
Terminal Technology Department, Alipay, Ant Group.
Model Files
./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
Some models in this hub can also be downloaded directly from their original hubs:
- sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
- sd-image-variations-diffusers
- audio_processor
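Before running inference, it can help to confirm that the local directory matches the layout listed under Model Files. The following is a minimal sketch, not part of the EchoMimic codebase: the helper name `find_missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical, and the script assumes the weights live under `./pretrained_models/` exactly as shown in the tree. It only checks that each entry exists; it does not validate file contents.

```python
from pathlib import Path

# Entries copied from the Model Files tree; paths are relative to the model root.
# The contents of the two diffusers folders are elided in the tree, so only the
# folders themselves are checked.
EXPECTED_ENTRIES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
    "audio_processor/whisper_tiny.pt",
]


def find_missing_entries(root="./pretrained_models"):
    """Return the expected entries that do not exist under root (hypothetical helper)."""
    root_path = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root_path / entry).exists()]


if __name__ == "__main__":
    missing = find_missing_entries()
    if missing:
        print("Missing entries:")
        for entry in missing:
            print("  " + entry)
    else:
        print("All expected model files are present.")
```

Run the script from the repository root. Any path it prints still needs to be downloaded, either from this hub or from the original hub of that model.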
Gallery
Audio Driven (Sing)
Audio Driven (English)
Audio Driven (Chinese)
Landmark Driven
Audio + Selected Landmark Driven
(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)
Citation
If you find our work useful for your research, please consider citing the paper:
@misc{chen2024echomimic,
title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
year={2024},
archivePrefix={arXiv},
primaryClass={cs.CV}
}