
【Stable Diffusion Mode】How to generate realistic people in Stable Diffusion

AI Learning Assistant NO 2
July 31st, 2023

One of the most popular uses of Stable Diffusion is to generate realistic people. They can look as real as photos taken with a camera. In this post, you will learn the mechanics of generating photo-style portrait images, covering prompts, models, and upscalers for generating realistic people.

Software

We will use AUTOMATIC1111 Stable Diffusion GUI to generate realistic people. You can use this GUI on Windows, Mac, or Google Colab.

Prompt

In this section, you will learn how to build a high-quality prompt for realistic photo styles step-by-step.

Let’s start with a simple prompt of a woman sitting outside a restaurant, using the v1.5 base model.

Prompt:

photo of young woman, highlight hair, sitting outside restaurant, wearing dress

Model: Stable Diffusion v1.5

Sampling method: DPM++ 2M Karras

Sampling steps: 20

CFG Scale: 7

Size: 512×768

Well, that didn’t go so well…
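
If you prefer scripting to the GUI, these settings map onto the diffusers library roughly as follows. This is a minimal sketch, not AUTOMATIC1111’s own code; it assumes the runwayml/stable-diffusion-v1-5 checkpoint and a CUDA GPU, and uses DPMSolverMultistepScheduler with Karras sigmas, the diffusers counterpart of DPM++ 2M Karras:

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Load the v1.5 base model (assumed checkpoint ID)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# DPM++ 2M Karras = multistep DPM-Solver with Karras sigmas
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="photo of young woman, highlight hair, sitting outside restaurant, wearing dress",
    num_inference_steps=20,  # Sampling steps
    guidance_scale=7,        # CFG Scale
    width=512,
    height=768,              # Size 512x768
).images[0]
image.save("portrait.png")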

Negative prompt

Let’s add a negative prompt. This negative prompt is quite minimalistic. It is intended to generate better anatomy and steer away from non-realistic styles.

Negative prompt:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

It’s doing something: the women look better, and the upper bodies look pretty good.

But the anatomy of the lower bodies is still problematic. There’s still a lot of room for improvement.
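
In a script, the negative prompt is simply another argument. Continuing the diffusers sketch above:

image = pipe(
    prompt="photo of young woman, highlight hair, sitting outside restaurant, wearing dress",
    negative_prompt="disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w",
    num_inference_steps=20,
    guidance_scale=7,
    width=512,
    height=768,
).images[0]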

Lighting keywords

A large part of a photographer’s job is to set up good lighting. A good photo has interesting lights. The same applies to Stable Diffusion. Let’s add some lighting keywords and a keyword that controls the viewing angle.

  • rim lighting

  • studio lighting

  • looking at the camera

Prompt:

photo of young woman, highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera

Negative prompt:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

The photos instantly look more interesting. You may notice the anatomy is not quite right. Don’t worry; there are many ways to fix it, which I will cover later in this article.

Camera keywords

Keywords like dslr, ultra quality, 8K, UHD can improve the quality of images.

Prompt:

photo of young woman, highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD

Negative prompt:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

I cannot say they are definitely better, but it certainly doesn’t hurt to include them…

Facial details

Finally, some keywords can be used as sweeteners to describe eyes and skin. These keywords are intended to render a more realistic face.

  • highly detailed glossy eyes

  • high detailed skin

  • skin pores

A side effect of using these keywords is drawing the subject closer to the camera.

Putting them together, we have the following final prompt.

Prompt:

photo of young woman, highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores

Negative prompt:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

Are you surprised that the base model can generate these high-quality realistic images? We haven’t even used special photo-realistic models yet. It will only get better.

Controlling faces

Blending two names

Do you want to generate the same look across multiple images? One trick is to take advantage of celebrities. Their looks are their most recognizable feature, so they are guaranteed to be rendered consistently.

But we usually don’t want to use their actual faces; they are just too recognizable. You want a new face with a certain look.

The trick is to blend two faces using prompt scheduling. The syntax in AUTOMATIC1111 is

[person 1:person 2:factor]

factor is a number between 0 and 1. It indicates the fraction of the total number of steps when the keyword switches from person 1 to person 2. For example, [Ana de Armas:Emma Watson:0.5] with 20 steps means the prompt uses Ana de Armas in steps 1 – 10, and uses Emma Watson in steps 11-20.
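
Prompt scheduling happens inside AUTOMATIC1111, but the switching rule is simple enough to sketch in plain Python. This is only a conceptual illustration of which name is active at each step, not actual WebUI code:

def scheduled_keyword(step, total_steps, before, after, factor):
    # Use `before` while step/total_steps <= factor, then switch to `after`
    switch_step = int(total_steps * factor)
    return before if step <= switch_step else after

# [Ana de Armas:Emma Watson:0.5] with 20 steps:
for step in range(1, 21):
    name = scheduled_keyword(step, 20, "Ana de Armas", "Emma Watson", 0.5)
    # steps 1-10 -> "Ana de Armas", steps 11-20 -> "Emma Watson"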

You can simply throw that into the prompt like below.

Prompt:

photo of young woman, [Ana de Armas:Emma Watson:0.5], highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores

Negative prompt:

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

By carefully adjusting the factor, you can dial in the proportion of the two faces.

Blending one name

Did you notice that the background and composition changed drastically when using two names? That’s the association effect: photos of actresses are often associated with certain settings, such as award ceremonies.

The overall composition is set by the first keyword because the sampler does most of the denoising in the first few steps.

Taking advantage of this idea, we can still use woman in the first few steps and only swap in a celebrity name later on. This keeps the composition while blending a generic face with the celebrity’s.

The prompt is something like this:

photo of young [woman:Ana de Armas:0.4], highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores

The negative prompt can stay the same.

disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w

Using this technique, we can keep the composition while controlling the face to some extent.

Inpainting faces

Inpainting is a technique that keeps the composition while giving you total control over the face.

After generating an image in the txt2img tab, click Send to inpaint.

In the inpainting canvas, draw a mask covering the face.

Now modify the prompt to include a blend of two faces, e.g.

photo of young [Emma Watson:Ana de Armas:0.4], highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, high detailed skin, skin pores

Set denoising strength to 0.75 and batch size to 8. Hit Generate and cherry-pick the one that works best.
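
For reference, this GUI workflow corresponds roughly to the diffusers inpainting pipeline below. The model ID and file names are assumptions, and since the [A:B:factor] scheduling syntax is specific to AUTOMATIC1111, a plain prompt stands in here:

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("portrait.png").convert("RGB")
mask = Image.open("face_mask.png").convert("RGB")  # white = area to repaint

images = pipe(
    prompt="photo of young woman, highlight hair, sitting outside restaurant, wearing dress",
    image=init_image,
    mask_image=mask,
    strength=0.75,            # denoising strength
    num_images_per_prompt=8,  # batch size: generate 8 and cherry-pick
).images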

Fixing defects

You don’t need to generate realistic people with correct anatomy in one shot. It is fairly easy to re-generate part of the image.

Let’s go through an example. The image below looks good, except the arms are deformed.

To fix it, first click on Send to inpaint to send the image and the parameters to the inpainting section of the img2img tab.

In the inpainting canvas of the img2img tab, draw a mask over the problematic area.

Set Seed to -1 (random), denoising strength to 1, and batch size to 8.

You can experiment with the Inpaint area setting – Whole picture or Only masked.

Hit Generate.

You will have some bad ones. But by sheer chance, you should see a decent one. If not, press Generate again.

You don’t need to get the perfect inpainting in one go; you can refine an image iteratively. When you see an image moving in the right direction, press Send to inpaint.

Now you are acting on the new image. Reduce the denoising strength gradually so that it preserves the content of the image. Below is an example of doing a second round of inpainting. The denoising strength was set to 0.6.
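
As a sketch of that iterative loop, reusing the inpainting pipeline from the previous section (pick_best is a hypothetical stand-in for your manual cherry-picking):

full_prompt = "photo of young woman, ..."  # your full positive prompt
current = Image.open("portrait.png").convert("RGB")

for strength in (1.0, 0.6):  # aggressive first pass, gentler second pass
    candidates = pipe(
        prompt=full_prompt,
        image=current,
        mask_image=mask,      # redraw the mask for each round in practice
        strength=strength,
        num_images_per_prompt=8,
    ).images
    current = pick_best(candidates)  # hypothetical helper: manual selection

current.save("fixed.png")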

To see more Stable Diffusion content, from the basics up, click: https://www.hayo.com/article/64c21001ef669957a0d21e63
