Posted by an anonymous user on July 31, 2024
Categories: ai, pytorch, kolors, stable-diffusion, text-to-image
Source: https://modelscope.cn/models/AI-ModelScope/Kolors
License: apache-2.0

Project Details

Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis


Introduction

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and proprietary models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this technical report.

Quick Start

Requirements

  • Python 3.8 or later
  • PyTorch 1.13.1 or later
  • Transformers 4.26.1 or later
  • Recommended: CUDA 11.7 or later
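
The minimum versions listed above can be checked before installing anything else. The following is a minimal sketch, not part of the Kolors repository: the helper names (`parse_version`, `find_problems`, `installed_versions`) are hypothetical, and it assumes PyTorch and Transformers are installed under the distribution names `torch` and `transformers`. It compares only the numeric part of each version string and does not check CUDA.

```python
import re
import sys

# Minimum versions copied from the Requirements list.
MINIMUMS = {"torch": "1.13.1", "transformers": "4.26.1"}
MIN_PYTHON = (3, 8)


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); anything after the numbers is ignored."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def find_problems(installed, python_version=sys.version_info[:2]):
    """Return human-readable problems; an empty list means all minimums are met."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python is older than 3.8")
    for name, minimum in MINIMUMS.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    return problems


def installed_versions():
    """Look up the installed versions of the packages named in MINIMUMS."""
    from importlib import metadata  # standard library since Python 3.8

    versions = {}
    for name in MINIMUMS:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # reported later by find_problems as "not installed"
    return versions
```

Calling `find_problems(installed_versions())` in the target environment returns a list of unmet requirements, which is empty when the environment satisfies the minimums.
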
  1. Repository cloning and dependency installation
git clone https://github.com/Kwai-Kolors/Kolors
cd Kolors
pip install -r requirements.txt
python setup.py install
  2. Weights download (link):
modelscope download --model=AI-ModelScope/Kolors --local_dir weights/Kolors

or

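# Requires Git LFS for the weight files; the target directory matches --local_dir above
git clone https://www.modelscope.cn/AI-ModelScope/Kolors.git weights/Kolors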
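  3. Inference:
# Prompt in English: "A photo of a ladybug, macro, zoom, high quality, cinematic, holding a sign that reads '可图'"
python scripts/sample.py "一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着“可图”"
# The image will be saved to "scripts/outputs/sample_test.jpg"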
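
For scripted use, the inference step can be wrapped in a small Python helper. This is a sketch under stated assumptions rather than part of the Kolors codebase: `build_sample_command` and `run_sample` are hypothetical names, the weights are assumed to live in `weights/Kolors` inside the cloned repository (the `--local_dir` used in the download step), and the output path is the one given in the comment above.

```python
import subprocess
import sys
from pathlib import Path

# Both paths are relative to the cloned Kolors repository and come from the
# Quick Start steps; they are assumptions about the layout, not a documented API.
WEIGHTS_DIR = Path("weights/Kolors")
OUTPUT_IMAGE = Path("scripts/outputs/sample_test.jpg")


def build_sample_command(prompt):
    """Return the argv list equivalent to: python scripts/sample.py "<prompt>"."""
    return [sys.executable, "scripts/sample.py", prompt]


def run_sample(prompt, repo_dir):
    """Run scripts/sample.py inside repo_dir and return the path of the saved image."""
    repo_dir = Path(repo_dir)
    weights = repo_dir / WEIGHTS_DIR
    if not weights.is_dir():
        raise FileNotFoundError(f"weights not found at {weights}; run the download step first")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_sample_command(prompt), cwd=repo_dir, check=True)
    output = repo_dir / OUTPUT_IMAGE
    if not output.is_file():
        raise RuntimeError(f"sample script finished but {output} was not created")
    return output


# Example call, assuming the current directory is the cloned repository:
#   image_path = run_sample("一张瓢虫的照片,微距,变焦,高质量,电影", ".")
```

Passing the prompt as a separate list element avoids shell quoting issues with the Chinese punctuation and nested quotation marks in the prompt.
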

License & Citation

License

Kolors is fully open-sourced for academic research. For commercial use, please fill out this questionnaire and send it to kwai-kolors@kuaishou.com for registration.

We open-source Kolors to promote the development of large text-to-image models in collaboration with the open-source community. The code of this project is open-sourced under the Apache-2.0 license. We sincerely urge all developers and users to strictly adhere to the open-source license and to avoid using the open-source model, code, and their derivatives for any purpose that may harm the country or society, or for any service that has not been evaluated and registered for safety. Note that despite our best efforts to ensure the compliance, accuracy, and safety of the training data, the diversity and combinability of generated content and the probabilistic randomness of the model mean that we cannot guarantee the accuracy and safety of the output, and the model can be misled. This project does not assume any legal responsibility for data security issues, public opinion risks, or any risks and liabilities arising from the model being misled, abused, misused, or improperly utilized in connection with the use of the open-source model and code.

Citation

If you find our work helpful, please cite it!

@article{kolors,
  title={Kolors Technical Report},
  author={},
  journal={arXiv preprint arXiv:},
  year={2024}
}

Acknowledgments

  • Thanks to Diffusers for providing the codebase.
  • Thanks to ChatGLM3 for providing the powerful Chinese language model.

Contact Us

If you want to leave a message for our R&D team and product team, feel free to join our WeChat group. You can also contact us via email (kwai-kolors@kuaishou.com).
