The AI drawing tool developed by the Netease Fuxi platform is based on a diffusion model and is trained on a broad corpus of about 800 million image-text samples to achieve better generation results.
Unlike common diffusion-based image generation methods, Fuxi's self-developed model also has the following characteristics:
1. Model innovation: The semantic quality of text-to-image generation depends heavily on how well the user's input text is represented. By building on the representation ability of Fuxi's self-developed "Yuzhi" model in the Chinese context, the self-developed generation model achieves strong semantic representation in Chinese scenarios. In addition, Fuxi's model focuses on the interaction between text and image and strengthens the role of the parameters in the text-image guidance part, so that text guides image generation more effectively and the results are closer to the user's intent.
2. Multi-scale image training: Across its broad dataset, the self-developed model accounts for the differing sizes and sharpness of images, and groups images of different sizes and resolutions into buckets for multi-scale training. This keeps training images undistorted while retaining as much of their information as possible, and lets the model adapt to generating images at different resolutions.
3. Data strategy: Multi-stage training ensures both the breadth of the model and the quality of its output. In the initial stage, widely distributed data at the hundred-million scale gives the model broad semantic understanding, including a good grasp of idiomatic expressions and classical poetry, such as "husband and wife lung slices" (the dish fuqi feipian) and "famous flowers and beautiful countries". The same data also gives it stylistic diversity, so it can generate images in a variety of painting styles. In the later stage, the data is filtered along several dimensions, including image-text relevance, image sharpness, and image aesthetics, to refine the model's generation ability and produce high-quality images.
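
The bucketing idea in point 2 can be illustrated with a minimal Python sketch. It is not Fuxi's implementation: the bucket resolutions, the function names, and the choice of matching by aspect ratio are all assumptions made for illustration, since the description above does not give these details.

```python
import math
from collections import defaultdict

# Hypothetical bucket resolutions as (width, height).
# The actual buckets used by the model are not published in this description.
BUCKETS = [(512, 512), (576, 448), (448, 576), (640, 384), (384, 640)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's.

    Ratios are compared in log space so that wide and tall images
    are treated symmetrically.
    """
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


def group_into_buckets(image_sizes, buckets=BUCKETS):
    """Group image names by their nearest bucket.

    image_sizes maps an image name to its (width, height).
    """
    grouped = defaultdict(list)
    for name, (width, height) in image_sizes.items():
        grouped[nearest_bucket(width, height, buckets)].append(name)
    return dict(grouped)
```

Under this assumption, each training batch would be drawn from a single bucket, so every image in the batch is resized to a resolution close to its own aspect ratio instead of being stretched or heavily cropped to a single square size.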
Strong semantic understanding in Chinese contexts: the model understands the user's input and returns what the user wants. It has a particular advantage in understanding and generating images from idioms, sayings, and verses.