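Model Description
PixArt-α is a Transformer-based text-to-image (T2I) diffusion model whose image generation quality is competitive with state-of-the-art image generators such as Imagen, SDXL, and even Midjourney. For more details, see the project homepage.
The model architecture and sample results are shown in the figure below.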
Operating Environment
Dependencies and Installation
# Create a conda environment and activate it
conda create -n pixart python==3.9.0
conda activate pixart
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
# git clone the original repository
git clone https://github.com/PixArt-alpha/PixArt-alpha.git
cd PixArt-alpha
# Install from requirements.txt
pip install -r requirements.txt
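After installation, it can help to confirm that the active interpreter and the PyTorch packages match the setup above. The following is a minimal sketch, not part of the original repository: the helper name `check_environment` and the layout of its report are assumptions made for illustration. It only uses the standard library plus `torch` when that package is present.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 9)):
    """Report whether the interpreter and key packages match the setup above.

    Hypothetical helper for illustration; it is not part of PixArt-alpha.
    """
    report = {
        "python_version": sys.version_info[:3],
        "python_matches": sys.version_info[:2] == expected_python,
    }
    # find_spec returns None when a top-level package is not installed;
    # nothing is imported at this point.
    for name in ("torch", "torchvision", "torchaudio"):
        report[f"{name}_installed"] = importlib.util.find_spec(name) is not None
    # Only query the version and CUDA status if torch is actually present.
    if report["torch_installed"]:
        try:
            import torch
            report["torch_version"] = torch.__version__
            report["cuda_available"] = torch.cuda.is_available()
        except ImportError:
            report["torch_installed"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running the script inside the activated `pixart` environment prints one line per check. If `cuda_available` is reported as False, the CUDA build selected by the `--index-url` above may not match the local driver.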
Code Example
Parameters: only a prompt is required to run the image generation task.
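from modelscope.pipelines import pipeline
# The prompt is the only required input.
inputs = {'prompt': 'A small cactus with a happy face in the Sahara desert.'}
# Build the pipeline from the task name and model ID, then run it on the prompt.
inference = pipeline('my-pixart-task', model='aojie1997/cv_PixArt-alpha_text-to-image')
output = inference(inputs)
# Save the generated image to disk.
output.save('./result.png')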
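To generate images for several prompts, the single call above can be wrapped in a small loop. The sketch below is an illustration only: `generate_images` is a hypothetical helper, and it assumes that `inference` behaves as in the example above, that is, it accepts a dict with a `prompt` key and returns an object exposing `save(path)`. The output file naming is also an assumption.

```python
from pathlib import Path


def generate_images(inference, prompts, out_dir="./results"):
    """Run a text-to-image callable over several prompts and save each result.

    Hypothetical helper: `inference` is assumed to take {'prompt': str}
    and return an object with a .save(path) method, as in the example above.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        # Reject empty prompts early instead of sending them to the model.
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError(f"prompt {index} must be a non-empty string")
        output = inference({"prompt": prompt})
        target = out_path / f"result_{index:03d}.png"
        output.save(str(target))
        saved.append(target)
    return saved
```

With the pipeline object from the example, a call would look like `generate_images(inference, ['A small cactus with a happy face in the Sahara desert.'])`, which writes `result_000.png` into `./results`.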
Citation
If you find our work helpful for your research, please consider citing the following BibTeX entry.
@misc{chen2023pixartalpha,
title={PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis},
author={Junsong Chen and Jincheng Yu and Chongjian Ge and Lewei Yao and Enze Xie and Yue Wu and Zhongdao Wang and James Kwok and Ping Luo and Huchuan Lu and Zhenguo Li},
year={2023},
eprint={2310.00426},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{chen2024pixartdelta,
title={PIXART-{\delta}: Fast and Controllable Image Generation with Latent Consistency Models},
author={Junsong Chen and Yue Wu and Simian Luo and Enze Xie and Sayak Paul and Ping Luo and Hang Zhao and Zhenguo Li},
year={2024},
eprint={2401.05252},
archivePrefix={arXiv},
primaryClass={cs.CV}
}