The first open-source Chinese Stable Diffusion model, trained on 20 million filtered Chinese image-text pairs.

Try the Gradio Web UI online

You can try our model at Taiyi-Stable-Diffusion-Chinese.

We use the Noah-Wukong dataset (100M pairs) and the Zero dataset (23M pairs) as pre-training data. We first score the similarity of every image-text pair in both datasets with IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese, and keep the pairs with a CLIP Score greater than 0.2 as our training set. We then use IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese to initialize the text encoder, freeze the other parts of the stable-diffusion-v1-4 model, and train only the text encoder, so that the model retains the generation ability of the original model while aligning with Chinese concepts. The model has so far been trained for one epoch on the 20 million filtered image-text pairs, which took about 100 hours on 32 x A100 GPUs. This is only a preliminary version; we will continue to optimize it and open-source subsequent models. Feedback and discussion are welcome.
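The filtering step described above can be illustrated with a minimal sketch. It assumes the CLIP Score is the cosine similarity between an image embedding and a text embedding; the exact scoring formula is not stated here. The function names (`cosine_similarity`, `filter_pairs`) and the toy two-dimensional embeddings are hypothetical stand-ins. In practice the embeddings would come from the Taiyi-CLIP text encoder and its paired image encoder, which are not loaded in this example.

```python
import math

# Threshold stated in the text; everything else below is illustrative.
CLIP_SCORE_THRESHOLD = 0.2


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat degenerate embeddings as unrelated
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold=CLIP_SCORE_THRESHOLD):
    """Keep (pair_id, score) for pairs scoring strictly above the threshold.

    `pairs` is an iterable of (pair_id, image_embedding, text_embedding).
    """
    kept = []
    for pair_id, image_emb, text_emb in pairs:
        score = cosine_similarity(image_emb, text_emb)
        if score > threshold:
            kept.append((pair_id, score))
    return kept


# Toy embeddings standing in for real encoder outputs.
toy_pairs = [
    ("aligned", [1.0, 0.0], [0.8, 0.6]),    # similarity about 0.8, kept
    ("weak", [1.0, 0.0], [0.1, 0.995]),     # similarity about 0.1, dropped
    ("opposed", [1.0, 0.0], [-1.0, 0.0]),   # similarity -1.0, dropped
]

kept = filter_pairs(toy_pairs)
print(kept)
```

Only the "aligned" pair passes the 0.2 threshold here. On a full-size dataset the same rule would normally be applied in batches on a GPU rather than in a Python loop.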

