【Beginner's Guide to Stable Diffusion】Negative prompt is important for v2 models

AI Learning  Assistant NO 2
July 28th, 2023

Negative prompt with Stable Diffusion v2.1

Consistent with Max Woolf’s finding, my own experience is that negative prompts are very important for v2 models. Below I used the positive prompt for generating realistic humans, but with the Stable Diffusion 2.1 model.

a young female, highlights in hair, sitting outside restaurant, brown eyes, wearing a dress, side light

Adding just two to three negative prompts progressively improves the aesthetic of the images. I would say this is pretty near the quality of v1 models.
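If you want to try this kind of comparison yourself, here is a minimal sketch using the Hugging Face diffusers library. It is not the setup used for the images in this article. The model ID, the seed, the device and the helper names are my assumptions, and the negative terms are simply the ones mentioned in this article (ugly, deformed, disfigured). The idea is to keep the positive prompt and the seed fixed and to add one negative term at a time, so that any change in the image comes from the negative prompt alone.

```python
PROMPT = (
    "a young female, highlights in hair, sitting outside restaurant, "
    "brown eyes, wearing a dress, side light"
)
# Negative terms mentioned in this article; swap in your own.
NEGATIVE_TERMS = ["ugly", "deformed", "disfigured"]


def progressive_negative_prompts(terms):
    """Return cumulative negative prompts: '', 'a', 'a, b', 'a, b, c', ..."""
    return [", ".join(terms[:i]) for i in range(len(terms) + 1)]


def generate(model_id, prompt, negative_prompts, seed=0, device="cuda"):
    """Generate one image per negative prompt, reusing the same seed.

    Assumes the torch and diffusers packages are installed and that
    model_id points to a Stable Diffusion checkpoint you can download.
    """
    # Heavy imports are deferred so the helper above works without them.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    images = []
    for negative in negative_prompts:
        # Re-seed on every run so only the negative prompt changes.
        generator = torch.Generator(device).manual_seed(seed)
        result = pipe(
            prompt,
            negative_prompt=negative or None,  # empty string means no negative prompt
            generator=generator,
        )
        images.append(result.images[0])
    return images


# Example call (assumed model ID, needs a GPU and a model download):
# images = generate(
#     "stabilityai/stable-diffusion-2-1",
#     PROMPT,
#     progressive_negative_prompts(NEGATIVE_TERMS),
# )
```

The first entry in the list is an empty negative prompt, which serves as the baseline. Passing a different model ID to the hypothetical `generate` helper runs the same sweep on another checkpoint.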

Negative prompt with Stable Diffusion v1.5

Let’s repeat the exercise on the v1.5 model.

The images come out pretty good without any negative prompts in v1.5. Adding the negative prompts “ugly”, “deformed” and “disfigured” may improve things, but the effect is not as clear as in v2.1. It is as if the v1.5 model does not understand these words.

Why does negative prompt become more important in v2?

This is an area where I can only speculate… but why not? Two changes in v2 stand out:

  1. Switched to a larger OpenCLIP language model.

  2. Filtered NSFW content out of the training data.

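The first suspect is the switch from OpenAI’s CLIP model to OpenCLIP. This changes the text embeddings that the image model is conditioned on. OpenAI trained the CLIP model with proprietary data. If that data was so highly curated that every single person looks way above average, prompting “woman” would be the same as prompting “beautiful woman”. That would make prompting easier. OpenCLIP is trained on open data and may not carry the same bias, so in v2 you would have to ask for the quality explicitly, for example by ruling out the bad with negative prompts.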

My second speculation is that what is deemed NSFW could also be highly aesthetic. It could be a failure of the filter, or it could just be the nature of NSFW images. Either way, excluding NSFW images unintentionally biases the data towards the bad and ugly.

