WebUI plug-in: ControlNet tutorial
By carefully studying and repeatedly testing each model, we have tried to reproduce the situations you are likely to encounter in practice. With nearly a hundred pictures, we hope this tutorial is helpful to everyone.
Prerequisites
Before starting this tutorial, please make sure you have installed Stable Diffusion and the ControlNet extension.
If you can see the ControlNet panel in the main Stable Diffusion interface, as shown on the right, you are ready to continue.
If it does not display properly, please try reinstalling ControlNet. For installation instructions, refer to the ControlNet section of this article: https://www.hayo.com/article/640f115c28c84d3daa01ed4f .

Common functions
The following introduces the common functions of the ControlNet menu when no model is selected; most of them are self-explanatory.

Enable
When checked, clicking the Generate button will run the generation through ControlNet; otherwise ControlNet has no effect.
Invert Input Color
Inverts the color of the area you painted with the brush.
RGB to BGR
Swaps the red and blue color channels (RGB channel order to BGR).
Low VRAM
Low-memory mode: if your graphics card has less than 4 GB of VRAM, it is recommended to check this option.
Guess Mode
Guess (blind-box) mode: no positive or negative prompts are required, and the result is random. Note: after testing on this site, the blind-box results are excellent and very likely to produce unexpected, pleasant surprises!
Preprocessor
This list selects the preprocessor. Each ControlNet model works differently, and they will be introduced one by one later.
Model
The model selected in this list must match the preprocessor selected above.
If the preprocessor and model do not match, an image can still be produced, but the effect is unpredictable and not ideal.
Weight
The weight, i.e. how strongly the ControlNet guidance influences the generated image.
Guidance Strength (T)
Before understanding this setting, you should first understand the step count used to generate a picture: the number of steps is how many denoising iterations are used to build the image. If the step count is 20 and the guidance strength is 1, ControlNet guides every one of those 20 steps. Personally, I find a guidance strength of 1 gives the best results.
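As a rough illustration of how guidance strength interacts with the step count, here is a minimal sketch that assumes the strength value is interpreted as the fraction of sampling steps that receive ControlNet guidance (an interpretation for intuition, not the extension's actual code):

```python
def guided_step_count(total_steps: int, guidance_strength: float) -> int:
    """Steps that receive ControlNet guidance, assuming the strength is the
    fraction of sampling steps guided (illustrative interpretation only)."""
    clamped = max(0.0, min(1.0, guidance_strength))
    return round(total_steps * clamped)

print(guided_step_count(20, 1.0))  # 20 -> every one of the 20 steps is guided
print(guided_step_count(20, 0.5))  # 10 -> only the first half of the steps are guided
```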
Resize Mode
Adjusts how the input image is resized: the default scales it to a suitable size, and the image is adapted automatically.
Canvas Width and Canvas Height
Canvas width and height: note that this is not the resolution of the image generated by SD.
It is the resolution ControlNet uses to guide the image. If the image you generate with SD is 1000x2000, guiding it through ControlNet at that size consumes a great deal of video memory; we can instead set the guidance resolution to 500x1000, i.e. half the resolution of the original image, which saves video memory.
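A minimal sketch of that scaling, assuming the guidance resolution should also be rounded down to a multiple of 8 (a common latent-size constraint; the exact rule the extension uses may differ):

```python
def guidance_resolution(width: int, height: int, scale: float = 0.5) -> tuple[int, int]:
    """Scale the SD output size down for ControlNet guidance to save VRAM,
    rounding each side down to a multiple of 8 (assumed constraint)."""
    def snap(value: float) -> int:
        return max(8, int(value * scale) // 8 * 8)
    return snap(width), snap(height)

print(guidance_resolution(1000, 2000))  # (496, 1000): roughly half the original size
```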
Create blank canvas
If you have used the ControlNet function before, previous images will remain in the ControlNet image area. Click this button to clear that history, i.e. create a blank canvas.
Preview annotator result
Click this button to preview the annotator (preprocessor) result.
For example, if you use Canny as the preprocessor and model, clicking this button shows the edge lines extracted by the Canny model.
Likewise, if you use OpenPose as the preprocessor and model, clicking this button shows the human skeleton extracted by the OpenPose model.
It doesn't matter if this is unclear for now; these functions are explained in detail later.
Hide annotator result
Hides the preview window generated by the Preview button [recommended not to hide it].
Model function description
The following is an overview of the model-specific functions; a general understanding is enough for now, and each is explained in turn below.

Preparation Phase
Before starting the ControlNet tutorial, this site first uses a LoRA model to generate a character. This character picture serves as the basis for all subsequent steps in this tutorial. The tutorial itself does not depend on any LoRA knowledge, however; you can complete the following steps with the official SD model as well.
Base image generation
The base model used on this site is ChilloutMix-Ni, and the character LoRA is Korean Doll Likeness. The following is the prompt:
lora:koreanDollLikeness_v15:0.66, best quality, ultra high res, (photorealistic:1.4), 1 girl, (aegyo sal:1), Kpop idol, sitting down, spread legs, sports bra, miniskirt, black hair, (braided hair), full body, cute, smile, ((puffy eyes)), facing front, (facing viewer), see through, thin waist, huge breasts, armpits, arms up, ulzzang-6500:1
Negative prompt: paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot
Size: 512x1024, Seed: 2382839894, Model: chilloutmixni, Steps: 28, Sampler: DPM++ SDE Karras, CFG scale: 7, Model hash: 7234b76e42

Tips
If you want to send a generated picture to ControlNet for processing, hold down the left mouse button on the generated picture, drag it onto the ControlNet component area, and release the mouse; the picture is then quickly placed in the ControlNet area.

Explanation of functions of each model
Note: this article introduces the first model, Canny, in great detail. For the models that follow, content already covered in the Canny section will not be elaborated again.
Canny
Algorithm introduction
The main function of the Canny model is to extract line art and then redraw the image based on that line art.
Let's load the Canny model first and take a look at the UI.

Parameter explanation
This site will translate and explain the newly added parameters; parts already covered will not be repeated.
Annotator resolution
Canny line-art resolution: the higher the value, the finer the preprocessed image ControlNet generates.
Canny low threshold
The lower edge-detection threshold: edges weaker than this are discarded.
Canny high threshold
The upper edge-detection threshold: edges stronger than this are always kept, and edges between the two thresholds are kept only if they connect to a strong edge.
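To make the two thresholds concrete, here is a minimal sketch using OpenCV's classic Canny edge detector; the file names are placeholders, and this is meant as an illustration of the algorithm rather than the preprocessor's exact code:

```python
import cv2

# Load the input image (placeholder file name) and convert it to grayscale.
image = cv2.imread("input.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Hysteresis thresholds: gradients below threshold1 are discarded, gradients
# above threshold2 are always kept, and values in between survive only if they
# connect to a strong edge. Raising both thresholds removes weaker, noisier lines.
edges_detailed = cv2.Canny(gray, threshold1=50, threshold2=100)
edges_clean = cv2.Canny(gray, threshold1=100, threshold2=200)

cv2.imwrite("edges_detailed.png", edges_detailed)
cv2.imwrite("edges_clean.png", edges_clean)
```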
First, we place the base picture in the ControlNet area without adjusting any parameters. After checking the Enable box in the ControlNet area, click the Preview annotator result button to try the effect.

After we click the Preview annotator result button, we can see that a line drawing has been generated in the ControlNet area, and it is actually quite good.
Annotator resolution
Next, we keep the other parameters at their defaults, change only the Annotator resolution parameter to 1024, and click the Preview annotator result button again to compare.

The 512x1024 line drawing generated with an Annotator resolution of 512.

The 1024x2048 line drawing generated with an Annotator resolution of 1024.
Comparing the two, we find that after increasing the resolution there is actually less detail, and the 1024-resolution line drawing on the right looks blurrier.
Detail reduction
This site guesses it is a side effect of the higher resolution: although the extracted lines contain less detail, they are more accurate. Therefore, after increasing the line-art resolution, the high and low thresholds should be adjusted accordingly.
Line drawing becomes blurry
The webpage is not wide enough, so the edges of the large picture look blurred once it is scaled to fit. If you open the picture locally and view it at 100%, there is no problem.
About thresholds
Please test the effect of Canny's high and low thresholds on the line art yourself: suitable values depend on the content of your pictures and your own needs, so this site will not go into detail and uses only the default parameters for the line-art explanations.
Show results
Once we have the line art, we can generate pictures based on it, which is also a good opportunity to test how Guess Mode (the blind-box mode) works.

original image

After comparing, we can see that the blind-box result is very nice, including adjusted light and shadow, added bangs, a different shorts color, a new background, and even sweet schoolgirl makeup.
Opening these blind boxes is really addictive. The pictures below come from six random generations; the light and shadow are adjusted automatically each time, which is very good.






Depth
Algorithm introduction
The main function of the Depth model is to capture the depth of the picture and recover its foreground-background relationships.
Tips: if you have any background in 3D animation, you should already know depth maps. A depth map is a grayscale image: the lighter an area, the closer it is to the camera; the darker an area, the farther it is from the camera.
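For readers who want to see depth extraction outside the WebUI, here is a minimal sketch using the MiDaS model from PyTorch Hub (ControlNet's depth preprocessor is MiDaS-based; the model variant and file names below are illustrative choices):

```python
import cv2
import numpy as np
import torch

# Load the small MiDaS depth-estimation model and its matching input transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# Read the input image (placeholder file name) and predict inverse depth.
img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()

# Normalize to 0-255 so lighter pixels are closer to the camera, as described above.
depth_map = ((depth - depth.min()) / (depth.max() - depth.min()) * 255).astype(np.uint8)
cv2.imwrite("depth_map.png", depth_map)
```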
Let's load the Depth model first and take a look at the UI interface.

Parameter explanation
This site will translate and explain the functions of the newly added parameters, and the existing parts will not be repeated.
Midas Resolution
Resolution: increasing this value blows up this site's video memory, and the default 384 works well enough.
Threshold A
Threshold A: cannot be adjusted.
Threshold B
Threshold B: cannot be adjusted.
Effect demonstration
The middle image is the generated depth map, and the images on the left and right were generated from it.
As you can see, the light and shadow are really nice, though at first glance it is hard to tell what this has to do with depth.



This site didn't give up: I switched to a different LoRA character and generated three more pictures before spotting the difference. With Canny, once the line art is extracted, the image is drawn according to the line structure and the background can change freely; with Depth, the image is generated strictly according to the depth map, so whether the scene is indoors or outdoors, objects appear in positions that match the foreground-background layout of the original image.



Depth_Leres
Algorithm introduction
This is a variant of Depth; it is said to give better results than Depth, with smoother deformation.
However, while researching it this site couldn't figure it out: the generated images are very unpredictable. If you know more about it, additions are welcome.
Hed
Basic introduction
The Hed model is also an edge detection algorithm, similar to Canny's edge extraction, but where Canny can be thought of as extracting edges with a pencil, Hed uses a brush, so the extracted edges are very soft.
Parameter explanation
The Hed interface is basically the same as Canny's, so this section will not go into detail.
Effect demonstration
Below are the edges extracted from the original image by the two algorithms; comparing them, the difference is quite large.

original image

Canny

Hed
Below is a picture generated via the Hed algorithm; you can see that the edges are indeed softer in the result.
Therefore, Canny is recommended for subjects with sharp edges and corners, such as machinery, while Hed may work better for furry animals.



MLSD
Algorithm introduction
This algorithm detects angular structures such as buildings very well, but its edge extraction for people and other curved objects is extremely poor, which makes it very friendly to architectural designers.
Parameter explanation

Hough Resolution
Resolution: It is the same as the usage of other algorithms and will not be repeated here.
Hough value threshold (MLSD)
The higher the value, the less detail the lines have, and vice versa.
Hough distance threshold (MLSD)
The higher the value, the fewer curved or angled segments remain.
In other words, the smaller the value, the more curves are preserved; the larger the value, the fewer curves are preserved.
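MLSD itself is a learned line-segment detector, but OpenCV's probabilistic Hough transform is a handy way to build intuition for these threshold-style parameters; a rough sketch with placeholder file names, not the actual MLSD implementation:

```python
import cv2
import numpy as np

image = cv2.imread("building.png")
edges = cv2.Canny(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY), 100, 200)

# `threshold` is the minimum number of votes a line needs (higher -> fewer,
# more prominent lines), `minLineLength` drops short segments, and
# `maxLineGap` merges nearby collinear segments into one line.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=80, minLineLength=50, maxLineGap=10)

canvas = np.zeros_like(image)
for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
    cv2.line(canvas, (int(x1), int(y1)), (int(x2), int(y2)), (255, 255, 255), 2)
cv2.imwrite("lines.png", canvas)
```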
Usage analysis
We extract lines from the original image using MLSD's default parameters. After extraction we can see that only straight horizontal, vertical, and diagonal lines remain, and the character area is completely black. We then clear the prompt, enter only "Room", and generate a picture from the MLSD line art; the final result is the picture on the right.

original image

MLSD

Regenerated
This is not the intended use of this algorithm, so this time we generate a building and try the MLSD algorithm again on the building image.
room
Negative prompt: nsfw
Steps: 28, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 939293653, Size: 512x1024, Model hash: 7234b76e42, Model: chilloutmix_Ni
Our prompt is simply "room". The generated result is the first picture below; we then extract the line art through MLSD. This time the extraction is very good, and regenerating it through ControlNet also gives a very good result.

original image

MLSD

Regenerated
Normal_map
Algorithm Introduction
In the 3D film and television industry, the normal map is one of a model's most important textures; a model needs several textures composited and rendered together to look good. In material production, a normal map is generally used to store surface-orientation detail: the small bumpy texture you see on a model is often not sculpted geometry but comes from a normal map. The normal map controls how light and shadow fall across the surface, creating the impression of unevenness.
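For intuition, a normal map can be approximated from a depth (or height) map by looking at its gradients; the sketch below is a simplified version of that idea, with placeholder file names, and is not the preprocessor's exact implementation:

```python
import cv2
import numpy as np

# Load a depth or height map as a single-channel float image (placeholder file name).
depth = cv2.imread("depth_map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Surface slope along x and y: larger gradients mean steeper surfaces.
dz_dx = cv2.Sobel(depth, cv2.CV_32F, 1, 0, ksize=3)
dz_dy = cv2.Sobel(depth, cv2.CV_32F, 0, 1, ksize=3)

# Build per-pixel normal vectors (-dz/dx, -dz/dy, 1) and normalize them.
normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
normals /= np.linalg.norm(normals, axis=2, keepdims=True)

# Map the [-1, 1] components into the usual 0-255 normal-map color encoding.
normal_map = ((normals * 0.5 + 0.5) * 255).astype(np.uint8)
cv2.imwrite("normal_map_vis.png", normal_map)
```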
Parameter explanation

Normal Resolution
Normal resolution: the higher the resolution, the better the normal calculation, but it can easily exhaust video memory.
Normal background threshold
Background Normal Threshold: The smaller the parameter, the more the background is preserved; the larger the parameter, the more the background is removed.
Threshold B
Threshold B: Cannot be adjusted
Usage analysis
Select Normal as the preprocessor and model -> generate the normal map -> regenerate.

original image

Normal_map

Regenerated
It can be seen that the angles of light and shadow are preserved.
Show results
We tried changing the character, and the light and shadow angles were still preserved [the webmaster knows gentlemen don't like to see this, so the images were manually mosaicked].



OpenPose
Algorithm introduction
Extracting a character's skeletal pose is probably what everyone cares about most: finally, we can control the character's pose without relying on prompt words.
Parameter explanation

Annotator Resolution
Resolution: The resolution of the generated skeleton image, generally keep the default.
Threshold A
Threshold A: Cannot be adjusted.
Threshold B
Threshold B: cannot be adjusted.
Show results

original image

skeleton
Obviously, this is not the result we want. The main problem is that the character's body is not fully visible in this picture, so the recognition is extremely poor and the result is unusable. At this point we should use another plug-in to solve the problem: OpenPose Editor.
OpenPose Editor
If you plan to use the OpenPose algorithm frequently, it is worth installing the OpenPose Editor plug-in for SD: go to Extensions -> Available -> OpenPose Editor in the SD WebUI, install it, and reload the UI. You will then see an OpenPose Editor tab in the menu bar, which makes it very convenient to customize skeletal poses.

The UI interface of the plugin is as follows (translated):

Instructions
Since the bones extracted by the openpose preprocessor are poor here, we will skip the openpose preprocessing.
Copy the character picture already generated on the txt2img tab, switch to the OpenPose Editor tab, and press Ctrl+V to paste it; the bone positions will be matched automatically. If only the picture appears after pasting and no bones are generated, click the Add button to add a set of bones.
Default match
The default match below was calculated automatically after pasting the picture, but the result is poor because the character's body is not fully visible. If the whole body were in frame, the match would be very good.
Manual adjustment
Since the default match is extremely poor, we adjust the bone positions manually: hold down the left mouse button on a bone joint, drag it to the desired position, and release. The second picture below shows this site's adjustment; starting from the original pose, we moved the raised arms onto the chest.
Tips
Bones in a tangle? Can't tell which bone is which? Look at the third picture: the joints of each bone have different colors, so you can work out which are the left and right arms and which are the legs. After a few uses you will instinctively know how to adjust them.

Default match

Manual adjustment

Default skeleton
Regeneration
After adjusting the bones in the OpenPose Editor tab, click the >> txt2img button in that tab; it jumps to the txt2img tab automatically and loads the adjusted skeleton pose.
Using this algorithm differs from the previous models, because we do not need to select an openpose preprocessor.
In the ControlNet panel, set Preprocessor to none, but set the model option to control_sd15_openpose.

The settings are as shown in the figure.
After the settings are complete, remember to enable ControlNet, then click the Generate button. Let's take a look at the result.
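For reference, the same idea can be scripted outside the WebUI with the diffusers library; this is a rough sketch that assumes the skeleton image exported from OpenPose Editor has been saved as pose.png, and it uses commonly published model IDs rather than the WebUI's local model files:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Load an OpenPose ControlNet and attach it to a Stable Diffusion 1.5 pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The skeleton image is fed in directly, mirroring "Preprocessor: none" in the WebUI.
pose = load_image("pose.png")
result = pipe(
    prompt="1 girl, full body, best quality",
    negative_prompt="lowres, worst quality",
    image=pose,
    num_inference_steps=28,
).images[0]
result.save("openpose_result.png")
```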
Show results
Very nice, the poses are completely consistent with the bones we set.



Pidinet
Algorithm introduction
This is also edge detection, but the results differ greatly from Canny and Hed: this method extracts edges in places where the color differences are large.
Parameter explanation
This function parameter is consistent with other algorithms and will not be repeated.
Instructions
Which model the Pidinet preprocessor corresponds to is somewhat unclear. This site has not found a dedicated model; you could say there is none, or that any of them works, because whichever model you use, an image is produced and the results are similar. You can try pairing the Pidinet preprocessor with the Canny or Hed model.
Below is the edge map extracted by this algorithm; the usage is the same as for the other algorithms. You can see that only the areas with large color differences are kept as edges, without much detail.

original image

Pidinet
Show results
The following pictures were generated using this algorithm.

Regenerated

Regenerated
Scribble
Algorithm Introduction
Artistic scribble: the Scribble algorithm appears to extract the areas with obvious exposure contrast and uses them to guide regeneration.
Parameter explanation
This function parameter is consistent with other algorithms and will not be repeated.
Instructions
Below is the scribble extracted by this algorithm; the usage is the same as for the other algorithms. You can see that only the parts with high exposure contrast are kept, yet the details are well preserved, and the more detail is preserved, the less can change during regeneration.

original image

Scribble

Regenerated
Show results
I tested it by swapping in other LoRA character models. There are some glitches, but the results are acceptable, barely passing.

Looks like three hands

Braids mistaken for shoulder straps

Passable
Manual scribble
Next, let's draw something casually to try manual scribbling: delete all positive and negative prompts and keep only the prompt "flower" to see how it works.

Hand Painted

Scribble guide

Scribble guide
Fake_Scribble
Algorithm Introduction
Hand-drawn scribble: only the main outlines are kept, with no internal details.
Parameter explanation
This function parameter is consistent with other algorithms and will not be repeated.
Instructions
Note: for this scribble algorithm, Fake_Scribble is selected as the Preprocessor, but the model is the same as for the Scribble algorithm: control_sd15_scribble is selected here too, so the two algorithms share one model.
Below is the scribble extracted by the Fake_Scribble algorithm; you can see that only the main outlines are kept, without much detail. The regenerated result follows.
Show results

original image

Fake_Scribble

Regenerated
Manual scribble
We again use the flower drawn in the Scribble section as an example: delete all positive and negative prompts and keep only the prompt word "flower" to see the result and compare the two scribble algorithms.

Hand Painted

Fake_Scribble

Fake_Scribble guide
As you can see, with the same hand-drawn sketch, the Fake_Scribble algorithm turns the solid scribble lines into hollow outlines. This is the biggest difference between the Fake_Scribble and Scribble algorithms.
Segmentation
Algorithm Introduction
Semantic segmentation: splits the image into regions for the different objects in it, such as buildings, sky, flowers, and trees.
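To show what a semantic segmentation map is, here is a minimal sketch using a generic pretrained model from torchvision; it is not the segmentation model the ControlNet preprocessor uses, only an illustration of per-pixel class labels, with placeholder file names (assumes a recent torchvision):

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# A generic pretrained segmentation model (21 Pascal VOC classes), used here
# only to illustrate a per-pixel class map; ControlNet's own preprocessor differs.
model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = Image.open("scene.png").convert("RGB")  # placeholder file name
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))["out"][0]  # (classes, H, W)
labels = logits.argmax(0).numpy()  # per-pixel class index

# Paint each class a distinct color to get a color-block map like the one shown above.
palette = np.random.RandomState(0).randint(0, 256, size=(21, 3), dtype=np.uint8)
Image.fromarray(palette[labels]).save("segmentation.png")
```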
Parameter explanation
This function parameter is consistent with other algorithms and will not be repeated.
Instructions
Although this model supposedly does not handle characters very well, in testing it actually works extremely well. To give semantic segmentation more to work with, this site uses pictures with outdoor scenes as the originals. After segmenting with the Segmentation algorithm we can see that the segmentation is very good, and regenerating from it also gives good results.

original image

Segmentation

Regenerated
Show results

original image

Segmentation

Regenerated
Summary
This concludes the introduction to all of ControlNet's models and algorithms. As you can see, ControlNet's capabilities are very powerful, especially the scribble and pose functions, which can save a great many prompt words.