vae-ft-mse-840000-ema-pruned

Anonymous user, July 31, 2024
Categories: ai, text-to-image, stable-diffusion-dif, stable-diffusion
Source: https://modelscope.cn/models/lixxxxxx/vae-ft-mse-840000-ema-pruned
License: MIT

Details

Improved Autoencoders

Utilizing

These weights are intended for use with the original CompVis Stable Diffusion codebase. If you are looking for the model to use with the 🤗 diffusers library, use the diffusers-format release of these weights instead.
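As a rough illustration of using a standalone autoencoder checkpoint with a CompVis-style model, the sketch below overwrites the autoencoder entries of a full Stable Diffusion state dict with the tensors from a VAE state dict. The `first_stage_model.` prefix, the optional `"state_dict"` wrapper, and the `loss.` entries to skip are assumptions about the checkpoint layout that this card does not state, and the helper names are hypothetical.

```python
def merge_vae_into_sd(sd_state, vae_state, prefix="first_stage_model."):
    """Return a copy of sd_state with its autoencoder entries overwritten.

    Assumption: the full model stores the autoencoder under `prefix`, and
    the VAE checkpoint may carry training-only `loss.*` entries.
    """
    merged = dict(sd_state)  # shallow copy, the input dict is left untouched
    for key, tensor in vae_state.items():
        if key.startswith("loss."):
            continue  # training-only weights are not needed for inference
        merged[prefix + key] = tensor
    return merged


def load_state_dict_from_ckpt(path):
    """Load a .ckpt file and unwrap a top-level "state_dict" key if present."""
    import torch  # imported lazily so the remapping above runs without PyTorch

    # .ckpt files are pickles, so only load checkpoints you trust.
    ckpt = torch.load(path, map_location="cpu")
    return ckpt.get("state_dict", ckpt)
```

The merged dict could then be passed to the model's `load_state_dict`. Inspecting the key names of both checkpoints first is a cheap way to confirm that the assumed prefix matches.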

Decoder Finetuning

We publish two kl-f8 autoencoder versions, both finetuned from the original kl-f8 autoencoder on a 1:1 mix of LAION-Aesthetics and LAION-Humans, an unreleased subset containing only SFW images of humans. The intent was to finetune on the Stable Diffusion training set (the autoencoder was originally trained on OpenImages) while also enriching the dataset with images of humans to improve the reconstruction of faces.

The first, ft-EMA, was resumed from the original checkpoint, trained for 313198 steps, and uses EMA weights. It uses the same loss configuration as the original checkpoint (L1 + LPIPS).

The second, ft-MSE, was resumed from ft-EMA, also uses EMA weights, and was trained for another 280k steps with a different loss that puts more emphasis on MSE reconstruction (MSE + 0.1 * LPIPS). It produces somewhat "smoother" outputs.

The batch size for both versions was 192 (16 A100s, batch size 12 per GPU). To keep compatibility with existing models, only the decoder part was finetuned; the checkpoints can be used as a drop-in replacement for the existing autoencoder.
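The two loss configurations named above differ only in the pixel term and in the weight on the perceptual term. The sketch below makes that explicit on flat lists of pixel values. It is not the training code: `perceptual_fn` is a hypothetical stand-in for LPIPS (which needs a pretrained network), and any other terms used in training are out of scope here.

```python
def mean_abs_error(target, recon):
    """L1 pixel term: mean absolute difference."""
    return sum(abs(a - b) for a, b in zip(target, recon)) / len(target)


def mean_squared_error(target, recon):
    """MSE pixel term: mean squared difference."""
    return sum((a - b) ** 2 for a, b in zip(target, recon)) / len(target)


def reconstruction_loss(target, recon, pixel_fn, perceptual_fn, perceptual_weight):
    """Pixel term plus a weighted perceptual term."""
    return pixel_fn(target, recon) + perceptual_weight * perceptual_fn(target, recon)


def ft_ema_rec_loss(target, recon, perceptual_fn):
    # Same configuration as the original checkpoint: L1 + LPIPS.
    return reconstruction_loss(target, recon, mean_abs_error, perceptual_fn, 1.0)


def ft_mse_rec_loss(target, recon, perceptual_fn):
    # ft-MSE configuration: MSE + 0.1 * LPIPS.
    return reconstruction_loss(target, recon, mean_squared_error, perceptual_fn, 0.1)
```

With the perceptual weight reduced to 0.1, the pixel-wise MSE term dominates, which is consistent with the smoother outputs described above.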

Original kl-f8 VAE vs f8-ft-EMA vs f8-ft-MSE

Evaluation

COCO 2017 (256x256, val, 5000 images)

| Model | Train steps | rFID | PSNR | SSIM | PSIM | Link | Comments |
|---|---|---|---|---|---|---|---|
| original | 246803 | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
| ft-EMA | 560001 | 4.42 | 23.8 +/- 3.9 | 0.69 +/- 0.13 | 0.96 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt | slightly better overall, with EMA |
| ft-MSE | 840001 | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |
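For readers unfamiliar with the PSNR column, the sketch below computes the standard peak signal-to-noise ratio between a reference image and its reconstruction, both given as flat lists of pixel values. The card does not describe the exact evaluation protocol, so the 0 to 255 value range is an assumption, and the function is an illustration of the metric rather than the script that produced these numbers.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is a monotone function of MSE, a loss that emphasizes MSE would be expected to raise it, which matches ft-MSE having the highest PSNR in the table.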

LAION-Aesthetics 5+ (256x256, subset, 10000 images)

| Model | Train steps | rFID | PSNR | SSIM | PSIM | Link | Comments |
|---|---|---|---|---|---|---|---|
| original | 246803 | 2.61 | 26.0 +/- 4.4 | 0.81 +/- 0.12 | 0.75 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
| ft-EMA | 560001 | 1.77 | 26.7 +/- 4.8 | 0.82 +/- 0.12 | 0.67 +/- 0.34 | https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt | slightly better overall, with EMA |
| ft-MSE | 840001 | 1.88 | 27.3 +/- 4.7 | 0.83 +/- 0.11 | 0.65 +/- 0.34 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |
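The train-steps column can be cross-checked against the finetuning description. The short calculation below uses only figures quoted in this card, reading "another 280k steps" as exactly 280000, which is an assumption.

```python
# Figures quoted in this card.
NUM_GPUS = 16
BATCH_PER_GPU = 12
ORIGINAL_STEPS = 246_803      # original kl-f8 checkpoint
FT_EMA_EXTRA_STEPS = 313_198  # ft-EMA, resumed from the original checkpoint
FT_MSE_EXTRA_STEPS = 280_000  # ft-MSE, "another 280k steps" (assumed exact)

global_batch = NUM_GPUS * BATCH_PER_GPU
ft_ema_steps = ORIGINAL_STEPS + FT_EMA_EXTRA_STEPS
ft_mse_steps = ft_ema_steps + FT_MSE_EXTRA_STEPS

print(global_batch)  # 192
print(ft_ema_steps)  # 560001
print(ft_mse_steps)  # 840001
```

The totals match the 560001 and 840001 steps listed for ft-EMA and ft-MSE, and the global batch size matches the stated 192.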

Visual

Visualization of reconstructions on 256x256 images from the COCO2017 validation dataset.


256x256: ft-EMA (left), ft-MSE (middle), original (right)
