The first open-source Chinese Stable Diffusion model, trained on 20 million filtered Chinese image-text pairs.
You can try out our model at Taiyi-Stable-Diffusion-Chinese.
We provide a Gradio Web UI for running Taiyi-Stable-Diffusion-1B-Chinese-v0.1: Taiyi-Stable-Diffusion-Chinese
We use the Noah-Wukong dataset (100M) and the Zero dataset (23M) as pre-training data. We first score the similarity of the image-text pairs in both datasets with IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese and keep the pairs with a CLIP Score greater than 0.2 as our training set. We initialize the text encoder from IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese, freeze all other parts of the stable-diffusion-v1-4 model, and train only the text encoder, so as to preserve the generative ability of the original model while aligning it with Chinese concepts. The model has currently been trained for one epoch on the 20 million filtered image-text pairs, which took about 100 hours on 32 x A100 GPUs. This is a preliminary version; we will continue to optimize it and open-source subsequent models. We welcome your feedback.
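The two key steps above, CLIP-score filtering and training only the text encoder, can be sketched as follows. This is a simplified illustration, not the actual training code: `clip_score` is a caller-supplied scoring function (in practice wrapping Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese), and the `nn.Linear` modules are hypothetical stand-ins for the real stable-diffusion-v1-4 components:

```python
import torch
from torch import nn

# Threshold used to filter the Wukong/Zero image-text pairs.
CLIP_SCORE_THRESHOLD = 0.2

def filter_pairs(pairs, clip_score):
    """Keep only image-text pairs whose CLIP similarity exceeds the threshold."""
    return [(img, txt) for img, txt in pairs
            if clip_score(img, txt) > CLIP_SCORE_THRESHOLD]

# Hypothetical stand-ins for the pipeline components; in practice these
# come from the stable-diffusion-v1-4 checkpoint, with the text encoder
# initialized from Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese.
text_encoder = nn.Linear(8, 8)
unet = nn.Linear(8, 8)
vae = nn.Linear(8, 8)

# Freeze everything except the text encoder, matching the recipe above.
for frozen in (unet, vae):
    for p in frozen.parameters():
        p.requires_grad_(False)

trainable = [p for p in text_encoder.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```

Training only the text encoder keeps the UNet's and VAE's generative behavior intact while the encoder learns to map Chinese text into the embedding space the frozen UNet already understands.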
Visit Official Website