playground-v2-1024px-aesthetic

Anonymous user, July 31, 2024
Categories: ai, Pytorch, playground, text-to-image
Source: https://modelscope.cn/models/AI-ModelScope/playground-v2-1024px-aesthetic
License: other

Project details

Playground v2 – 1024px Aesthetic Model

This repository contains a model that generates highly aesthetic images at a resolution of 1024x1024. You can use the model with Hugging Face Diffusers.


Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground.

Images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL, according to Playground’s user study.

We are thrilled to release intermediate checkpoints at different training stages, including evaluation metrics, to the community. We hope this will encourage further research into foundational models for image generation.

Lastly, we introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality.

Please see our blog for more details.

Model Description

Using the model with Diffusers

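Install diffusers >= 0.24.0 and its dependencies, plus modelscope, which the snippet below uses to download the weights:

pip install "diffusers>=0.24.0" transformers accelerate safetensors modelscope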

To use the model, run the following snippet.

Note: It is recommended to use guidance_scale=3.0.

import torch
from diffusers import DiffusionPipeline
from modelscope import snapshot_download

# Download the model files from ModelScope; returns the local directory.
local_model = snapshot_download(
    "AI-ModelScope/playground-v2-1024px-aesthetic", revision="master"
)

# Load the fp16 weight variant in half precision.
pipe = DiffusionPipeline.from_pretrained(
    local_model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,
    variant="fp16",
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, guidance_scale=3.0).images[0]

# In a notebook, a bare expression displays the image.
image
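
Outside a notebook, the generated image has to be written to disk, and fixing the random seed makes a run repeatable. The helper below is a minimal sketch of both, not part of the original instructions: the name generate_and_save and its parameters are hypothetical, and it assumes the pipeline accepts a generator argument, as Diffusers text-to-image pipelines generally do.

```python
def generate_and_save(pipe, prompt, out_path, guidance_scale=3.0, seed=None):
    """Run a text-to-image pipeline once and save the first image.

    `pipe` is assumed to be a loaded Diffusers pipeline such as the one
    created in the snippet above. This helper is a hypothetical convenience
    wrapper, not an API of Diffusers or of the model.
    """
    kwargs = {"prompt": prompt, "guidance_scale": guidance_scale}
    if seed is not None:
        # Imported lazily so the helper can be defined without torch installed.
        import torch

        # A seeded CPU generator makes the initial noise reproducible.
        kwargs["generator"] = torch.Generator(device="cpu").manual_seed(seed)
    image = pipe(**kwargs).images[0]
    image.save(out_path)  # PIL images expose .save(path)
    return image


# Example call (requires the `pipe` object from the snippet above):
# generate_and_save(pipe, "Astronaut in a jungle", "astronaut.png", seed=42)
```

The default guidance_scale=3.0 follows the recommendation above; the seed value in the example call is arbitrary.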

Using the model with Automatic1111/ComfyUI

To use the model with software such as Automatic1111 or ComfyUI, you can use the playground-v2.fp16.safetensors file.

User Study


According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, the images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL.
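
The source does not define how "2.5 times" is computed. As a rough reading aid, the sketch below assumes it is the ratio of pairwise votes for Playground v2 to votes for Stable Diffusion XL, with ties excluded, and converts that ratio into a win rate. The function name is hypothetical.

```python
def win_rate_from_ratio(ratio):
    """Convert a preference ratio (wins : losses) into a win rate in [0, 1].

    Assumes every comparison yields exactly one winner (no ties); this is an
    interpretation, not a definition given by the source.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1.0 + ratio)


rate = win_rate_from_ratio(2.5)
print(f"{rate:.1%}")  # prints 71.4%
```

Under that assumption, a 2.5:1 ratio means Playground v2 was preferred in roughly 5 of every 7 comparisons.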

We report user preference metrics on PartiPrompts, following standard practice, and on an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse and covers various categories and tasks.

During the user study, we give users instructions to evaluate image pairs based on both (1) their aesthetic preference and (2) the image-text alignment.

MJHQ-30K Benchmark


| Model | Overall FID |
| --- | --- |
| SDXL-1-0-refiner | 9.55 |
| playground-v2-1024px-aesthetic | 7.07 |

We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
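
For reference, FID (Fréchet Inception Distance) is usually defined as the Fréchet distance between two Gaussians fitted to Inception-network features of the reference images and of the generated images. The formula below is that standard definition; the source does not state which feature extractor or implementation was used, so the symbols are generic.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $\mu_r, \Sigma_r$ are the mean and covariance of the features of the reference set (for this benchmark, the MJHQ-30K images) and $\mu_g, \Sigma_g$ those of the generated images. Lower values mean the two feature distributions are closer.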

We have curated a high-quality dataset from Midjourney, featuring 10 common categories, with each category containing 3,000 samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
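
The curation recipe is described only at a high level (10 categories × 3,000 samples = 30,000 images, filtered by aesthetic score and CLIP score). The sketch below illustrates one way such a filter could look. The thresholds, the field names, and the choice to rank by aesthetic score are all assumptions made for illustration, and the diversity step mentioned above is not modeled.

```python
from collections import defaultdict


def curate(candidates, per_category=3000, min_aesthetic=6.0, min_clip=0.28):
    """Keep up to `per_category` images per category that pass both thresholds.

    Each candidate is a dict with hypothetical keys "category", "aesthetic",
    and "clip". The threshold defaults are placeholders, not values from the
    source.
    """
    by_category = defaultdict(list)
    for item in candidates:
        if item["aesthetic"] >= min_aesthetic and item["clip"] >= min_clip:
            by_category[item["category"]].append(item)

    curated = {}
    for category, items in by_category.items():
        # Assumption: prefer the highest aesthetic scores within a category.
        ranked = sorted(items, key=lambda x: x["aesthetic"], reverse=True)
        curated[category] = ranked[:per_category]
    return curated


# Tiny usage example with made-up scores.
sample = [
    {"id": 1, "category": "people", "aesthetic": 7.0, "clip": 0.30},
    {"id": 2, "category": "people", "aesthetic": 5.0, "clip": 0.35},
    {"id": 3, "category": "fashion", "aesthetic": 6.5, "clip": 0.29},
]
result = curate(sample, per_category=1)
print({category: [item["id"] for item in items] for category, items in result.items()})
```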

For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ-30K benchmark.
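
To put the overall numbers in proportion, the short calculation below expresses the reported overall FID values (9.55 for SDXL-1-0-refiner, 7.07 for playground-v2-1024px-aesthetic) as a relative reduction. The helper name is hypothetical, and the result is simple arithmetic on the reported figures, not an additional measurement.

```python
def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - value) / baseline


fid_sdxl_refiner = 9.55  # overall FID reported for SDXL-1-0-refiner
fid_playground_v2 = 7.07  # overall FID reported for playground-v2-1024px-aesthetic

print(f"{relative_reduction(fid_sdxl_refiner, fid_playground_v2):.1%}")  # prints 26.0%
```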

We release this benchmark to the public and encourage the community to adopt it for benchmarking their models’ aesthetic quality.

Intermediate Base Models

| Model | FID | CLIP Score |
| --- | --- | --- |
| SDXL-1-0-refiner | 13.04 | 32.62 |
| playground-v2-256px-base | 9.83 | 31.90 |
| playground-v2-512px-base | 9.55 | 32.08 |

Apart from playground-v2-1024px-aesthetic, we release intermediate checkpoints at different training stages to the community in order to foster foundation model research in pixels. Here, we report the FID score and CLIP score on the MSCOCO14 evaluation set for reference. (Note that our reported numbers may differ from those in SDXL’s published results, as our prompt list may be different.)
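
The source does not say how its CLIP score is computed. A common convention, assumed here, is 100 times the cosine similarity between the CLIP image embedding and the CLIP text embedding, clipped at zero, which would put scores on the same scale as the 31 to 33 range reported above. The sketch below shows only that final step on plain vectors; producing the embeddings requires a CLIP model and is out of scope.

```python
import math


def clip_score(image_emb, text_emb):
    """100 * max(cosine similarity, 0) between two embedding vectors.

    Assumed convention; the exact formula and CLIP variant used for the
    reported numbers are not stated in the source.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(
        sum(b * b for b in text_emb)
    )
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return 100.0 * max(dot / norm, 0.0)


# Toy vectors at a 45-degree angle, not real CLIP embeddings.
print(round(clip_score([1.0, 0.0], [1.0, 1.0]), 2))  # prints 70.71
```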
