stable-diffusion-3-medium-tensorrt

Posted by an anonymous user on July 31, 2024
Categories: ai, Pytorch, onnx, text-to-image, sd3-medium, sd3, tensorrt
Source: https://modelscope.cn/models/cjc1887415157/stable-diffusion-3-medium-tensorrt
License: other


Stable Diffusion 3 Medium TensorRT

Introduction

This repository hosts the TensorRT version of Stable Diffusion 3 Medium created in collaboration with NVIDIA. The optimized versions give substantial improvements in speed and efficiency.

Stable Diffusion 3 Medium is a fast generative text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

Model Details

Model Description

Stable Diffusion 3 Medium combines a diffusion transformer architecture and flow matching.

  • Developed by: Stability AI
  • Model type: MMDiT text-to-image model
  • Model Description: This is a conversion of the Stable Diffusion 3 Medium model
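
To make the flow-matching half of this description concrete, here is a toy sketch of Euler sampling along a straight noise-to-data path. It is not the SD3 sampler. The names `euler_flow_sample`, `ideal_velocity`, and `target` are hypothetical, and the convention that t=1 is pure noise and t=0 is data is an assumption. The hand-written velocity function stands in for the MMDiT, which in the real model would predict the velocity from the noisy latent, the timestep, and the text embeddings.

```python
import random


def euler_flow_sample(velocity_fn, noise, num_steps):
    """Integrate dz/dt = v(z, t) from t=1 (noise) down to t=0 with Euler steps."""
    z = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        # Step backwards in time: z(t - dt) = z(t) - dt * v(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


# Toy "data" point and a reproducible Gaussian noise sample of the same size.
target = [0.5, -1.0, 2.0]
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in target]


def ideal_velocity(z, t):
    # On the straight path z_t = (1 - t) * target + t * noise the velocity
    # is constant: dz/dt = noise - target. A trained model approximates this.
    return [n - x for n, x in zip(noise, target)]


sample = euler_flow_sample(ideal_velocity, noise, num_steps=50)
max_error = max(abs(s - x) for s, x in zip(sample, target))
print(f"max abs error after 50 steps: {max_error:.2e}")
```

With the exact straight-line velocity, the Euler steps land on the target up to floating-point rounding. A learned velocity field is only approximate, which is one reason the step count matters in practice.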

Performance using TensorRT 10.1

Timings for 50 steps at 1024x1024

Accelerator   CLIP-G     CLIP-L    T5XXL      MMDiT        VAE Decoder   Total
A100          11.95 ms   5.04 ms   21.39 ms   5468.17 ms   72.25 ms      5622.47 ms

Timings for 30 steps at 1024x1024 with input image conditioning

Accelerator   VAE Encoder   CLIP-G     CLIP-L    T5XXL      MMDiT        VAE Decoder   Total
A100          37.04 ms      12.07 ms   5.07 ms   21.49 ms   3340.69 ms   72.02 ms      3531.49 ms
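
The two timing tables are easier to compare after normalizing by step count. The sketch below copies the A100 figures from the tables and derives the MMDiT cost per denoising step, the MMDiT share of the reported total, and the part of the total not attributed to any listed stage. The dictionary layout and the `summarize` helper are illustrative, not part of the demo code.

```python
# Timings in milliseconds, copied from the two A100 tables above.
TXT2IMG = {
    "steps": 50,
    "stages": {"CLIP-G": 11.95, "CLIP-L": 5.04, "T5XXL": 21.39,
               "MMDiT": 5468.17, "VAE Decoder": 72.25},
    "total": 5622.47,
}
IMG2IMG = {
    "steps": 30,
    "stages": {"VAE Encoder": 37.04, "CLIP-G": 12.07, "CLIP-L": 5.07,
               "T5XXL": 21.49, "MMDiT": 3340.69, "VAE Decoder": 72.02},
    "total": 3531.49,
}


def summarize(run):
    """Derive per-step and share figures from one row of the timing tables."""
    stages = run["stages"]
    mmdit = stages["MMDiT"]
    return {
        "mmdit_ms_per_step": mmdit / run["steps"],
        "mmdit_share_of_total": mmdit / run["total"],
        # Reported total minus the sum of the listed stages.
        "unlisted_ms": run["total"] - sum(stages.values()),
    }


for name, run in (("text-to-image", TXT2IMG), ("image-conditioned", IMG2IMG)):
    s = summarize(run)
    print(f"{name}: {s['mmdit_ms_per_step']:.2f} ms/step, "
          f"{s['mmdit_share_of_total']:.1%} of total in MMDiT, "
          f"{s['unlisted_ms']:.2f} ms not attributed to a listed stage")
```

In both runs the MMDiT costs roughly 110 ms per step and accounts for nearly all of the runtime. Each reported total is about 43 ms larger than the sum of its listed stages. The source does not say what that remainder covers.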

INT8 quantization with TensorRT Model Optimizer

The MMDiT in Stable Diffusion 3 Medium can be further optimized with INT8 quantization using TensorRT Model Optimizer. The estimated end-to-end speedup of TensorRT INT8 over TensorRT FP16 is 1.2x to 1.4x across various NVIDIA GPUs. The INT8 MMDiT engine uses about half the memory of its FP16 counterpart, with minimal to negligible degradation in image quality.
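
As a rough illustration of what those figures would mean, the sketch below applies the 1.2x to 1.4x range to the 50-step FP16 total reported for the A100. Applying the general estimate to that specific measurement is an assumption, so treat the result as a projection, not a benchmark. The bytes-per-weight ratio is included because it is consistent with the roughly 2x memory saving, but it ignores activations and workspace memory.

```python
# FP16 end-to-end total for 50 steps at 1024x1024 on A100 (from the table above).
FP16_TOTAL_MS = 5622.47
# Estimated INT8-over-FP16 end-to-end speedup range quoted in the text.
SPEEDUP_RANGE = (1.2, 1.4)
# Storage per weight for each precision, in bytes.
BYTES_PER_WEIGHT = {"fp16": 2, "int8": 1}


def projected_latency_ms(fp16_ms, speedup):
    """Latency implied by a given speedup factor over the FP16 baseline."""
    return fp16_ms / speedup


slowest = projected_latency_ms(FP16_TOTAL_MS, SPEEDUP_RANGE[0])
fastest = projected_latency_ms(FP16_TOTAL_MS, SPEEDUP_RANGE[1])
weight_ratio = BYTES_PER_WEIGHT["fp16"] / BYTES_PER_WEIGHT["int8"]

print(f"projected INT8 total: {fastest:.0f} ms to {slowest:.0f} ms")
print(f"weight storage ratio fp16/int8: {weight_ratio:.1f}x")
```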

Usage Example

  1. Follow the setup instructions on launching a TensorRT NGC container.

    git clone https://github.com/NVIDIA/TensorRT.git
    cd TensorRT
    git checkout release/sd3
    docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:24.05-py3 /bin/bash

  2. Download the Stable Diffusion 3 Medium TensorRT files from this repo.

    git lfs install
    git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt
    cd stable-diffusion-3-medium-tensorrt
    git lfs pull
    cd ..

  3. Install libraries and requirements.

    cd demo/Diffusion
    python3 -m pip install --upgrade pip
    pip3 install -r requirements.txt
    python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12

  4. Perform TensorRT optimized inference:
  • Stable Diffusion 3 Medium

    Works best for 1024x1024 images. The first invocation builds plan files in --engine-dir that are specific to the accelerator in use; later invocations reuse them.

    python3 demo_txt2img_sd3.py \
      "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
      --version=sd3 \
      --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
      --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
      --seed 42 \
      --width 1024 \
      --height 1024 \
      --build-static-batch \
      --use-cuda-graph
    
  • Stable Diffusion 3 Medium with input image conditioning

    Provide an input image for conditioning with --input-image, as in the commands below. Works best for 1024x1024 but may also work at 512x512.

    wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png -O dog-on-bench.png
    
    python3 demo_txt2img_sd3.py \
      "dog wearing a sweater and a blue collar" \
      --version=sd3 \
      --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
      --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
      --seed 42 \
      --width 1024 \
      --height 1024 \
      --input-image dog-on-bench.png \
      --build-static-batch \
      --use-cuda-graph
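
When scripting several runs, it can help to assemble the command lines above programmatically. The sketch below only builds and prints the argument list, using the flags shown in the two examples. `build_demo_command` and `engines_cached` are hypothetical helpers, and the `.plan` file suffix used to detect cached engines is an assumption, not something the source states.

```python
import shlex
from pathlib import Path

# Location of the cloned model repo inside the container, as used above.
REPO_DIR = Path("/workspace/stable-diffusion-3-medium-tensorrt")


def build_demo_command(prompt, width=1024, height=1024, seed=42,
                       input_image=None, repo_dir=REPO_DIR):
    """Return the demo_txt2img_sd3.py invocation as an argument list."""
    cmd = [
        "python3", "demo_txt2img_sd3.py", prompt,
        "--version=sd3",
        "--onnx-dir", f"{repo_dir}/",
        "--engine-dir", str(repo_dir / "engine"),
        "--seed", str(seed),
        "--width", str(width),
        "--height", str(height),
    ]
    if input_image is not None:
        # Optional input image conditioning.
        cmd += ["--input-image", str(input_image)]
    cmd += ["--build-static-batch", "--use-cuda-graph"]
    return cmd


def engines_cached(engine_dir):
    """Heuristic: True if the engine directory already holds *.plan files.

    The ".plan" suffix is an assumption about how the demo names its engines.
    """
    engine_dir = Path(engine_dir)
    return engine_dir.is_dir() and any(engine_dir.glob("*.plan"))


cmd = build_demo_command("dog wearing a sweater and a blue collar",
                         input_image="dog-on-bench.png")
print(shlex.join(cmd))
print("engines cached:", engines_cached(REPO_DIR / "engine"))
```

To actually launch a run from the TensorRT checkout, the list can be passed to `subprocess.run(cmd, check=True, cwd="demo/Diffusion")`.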
    