When "splitting everything" meets image patching: no need for fine marking, click on the object to achieve object removal, content filling, and scene replacement

Hayo News
April 18th, 2023
This time, the powerful "segment everything" model, the Segment Anything Model (SAM), meets the practical task of image inpainting.

In early April, Meta released SAM (Segment Anything Model) [1], the first foundation model for image segmentation. As a segmentation model, SAM is both powerful and easy to use: the user simply clicks on an object, and SAM immediately segments it with high accuracy. As of April 15, SAM's GitHub repository had accumulated 26k stars.

How to put such a powerful "segment everything" model to good use and extend it to more practical application scenarios is a crucial question. For example, what sparks fly when SAM meets the practical task of image inpainting?

A research team from the University of Science and Technology of China and the Eastern Institute for Advanced Study has given an impressive answer. Building on SAM, they propose the "Inpaint Anything" (IA) model. Unlike traditional image inpainting models, IA requires no fine-grained operations to generate masks; a selected object can be marked with a single click. IA can Remove Anything, Fill Anything, and Replace Anything, covering typical image inpainting scenarios including object removal, content filling, and background replacement.

Method overview

Despite significant progress, current image inpainting systems still struggle with mask selection and hole filling. Building on SAM, the researchers made the first attempt at mask-free image inpainting and constructed a new "clicking and filling" paradigm, which they call Inpaint Anything (IA). The core idea behind IA is to combine the strengths of different models to build a powerful, user-friendly image inpainting system.

IA has three main functions: (i) Remove Anything: the user only needs to click on the object to be removed, and IA removes it without a trace, achieving an efficient "magic eraser"; (ii) Fill Anything: the user can further tell IA, via a text prompt, what to fill into the hole, and IA then drives an embedded AIGC (AI-Generated Content) model such as Stable Diffusion [2] to generate the corresponding content, enabling "content creation" at will; (iii) Replace Anything: the user can also click to select an object to keep and use a text prompt to tell IA what to replace the object's background with, enabling a vivid "environment swap". The overall framework of IA is shown in the figure below:

Inpaint Anything (IA) schematic. The user can select any object in the image by clicking on it. With the help of powerful vision models such as SAM [1], LaMa [3], and Stable Diffusion (SD) [2], IA can smoothly remove the selected object (Remove Anything). Further, by feeding a text prompt to IA, the user can fill the object's region with any desired content (Fill Anything) or arbitrarily replace the object's background (Replace Anything).

Remove Anything

Remove Anything schematic

The "Remove everything" steps are as follows:

  • Step 1: The user clicks on the object to be removed;
  • Step 2: SAM segments the object;
  • Step 3: The image inpainting model (LaMa) fills the hole left by the object.
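
As a rough illustration of this pipeline (not the authors' released code), here is a minimal sketch built on the official `segment_anything` API. Since LaMa has no simple pip interface, OpenCV's classical Telea inpainting stands in for LaMa [3] in the last step; the checkpoint filename, the mask dilation, and that stand-in are assumptions of this sketch.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load SAM with the ViT-H weights Meta distributes, and wrap it in a
# point-prompt predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def remove_anything(image: np.ndarray, click_xy: tuple) -> np.ndarray:
    """Click -> SAM mask -> inpainting. image is HxWx3 uint8 RGB."""
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([click_xy]),
        point_labels=np.array([1]),      # 1 marks the click as foreground
        multimask_output=True,
    )
    mask = masks[np.argmax(scores)]      # keep the highest-scoring mask
    # Dilate slightly so the hole fully covers the object's boundary pixels.
    mask_u8 = cv2.dilate(mask.astype(np.uint8) * 255,
                         np.ones((15, 15), np.uint8))
    # Stand-in for LaMa: classical Telea inpainting shipped with OpenCV.
    return cv2.inpaint(image, mask_u8, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
```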

Fill Anything

Fill Anything schematic; the text prompt used in the figure: a teddy bear on a bench

The "fill everything" steps are as follows:

  • Step 1: The user clicks on the object to be removed;
  • Step 2: SAM segments the object;
  • Step 3: The user specifies the content to fill via a text prompt;
  • Step 4: A text-prompt-driven image inpainting model (Stable Diffusion) fills the region according to the user's prompt.
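
The text-guided filling step can likewise be sketched with Hugging Face diffusers, reusing a SAM mask like the one produced above. The paper only states that Stable Diffusion [2] is used; the specific model id and the 512x512 working resolution below are assumptions of this sketch.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# A Stable Diffusion inpainting pipeline repaints only the masked (white)
# region, guided by the text prompt.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def fill_anything(image: np.ndarray, mask: np.ndarray, prompt: str) -> Image.Image:
    """image: HxWx3 uint8 RGB; mask: HxW bool mask from SAM."""
    size = (512, 512)                    # SD 1.x inpainting works at 512x512
    init = Image.fromarray(image).resize(size)
    mask_img = Image.fromarray(mask.astype(np.uint8) * 255).resize(size)
    return pipe(prompt=prompt, image=init, mask_image=mask_img).images[0]

# Example matching the figure above:
# result = fill_anything(image, mask, "a teddy bear on a bench")
```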

Replace Anything

Replace Anything schematic; the text prompt used in the figure: a man in office

The "fill everything" steps are as follows:

  • Step 1: The user clicks on the object to be kept;
  • Step 2: SAM segments the object;
  • Step 3: The user specifies the new background via a text prompt;
  • Step 4: A text-prompt-driven image inpainting model (Stable Diffusion) replaces the object's background according to the user's prompt.
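
Under the same assumptions, Replace Anything differs from Fill Anything only in which side of the mask is regenerated: the SAM mask is inverted so the clicked object is kept and the background is repainted. The sketch below reuses the hypothetical `fill_anything` helper from the previous snippet.

```python
import numpy as np
from PIL import Image

def replace_anything(image: np.ndarray, mask: np.ndarray, prompt: str) -> Image.Image:
    """Keep the clicked object, regenerate everything else from the prompt."""
    background_mask = ~mask              # invert: white = region to repaint
    return fill_anything(image, background_mask, prompt)

# Example matching the figure above:
# result = replace_anything(image, mask, "a man in office")
```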

Model results

The researchers then tested Inpaint Anything on the COCO dataset [4], the LaMa test set [3], and their own 2K high-resolution photos taken with mobile phones. Notably, the model also supports 2K high-definition images of arbitrary aspect ratio, which makes it easy to integrate the IA system into various environments and existing frameworks.
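
The article does not spell out how a fixed-resolution diffusion model is reconciled with 2K inputs of arbitrary aspect ratio. One common strategy, sketched here purely as an assumption (again reusing the hypothetical `fill_anything` helper), is to inpaint a fixed-size crop centered on the mask and paste the result back into the full-resolution image.

```python
import numpy as np

def inpaint_high_res(image: np.ndarray, mask: np.ndarray, prompt: str,
                     crop: int = 512) -> np.ndarray:
    """Hypothetical high-resolution strategy: edit only a crop around the mask."""
    ys, xs = np.where(mask)
    cy, cx = int(ys.mean()), int(xs.mean())          # centroid of the mask
    h, w = mask.shape
    top = min(max(cy - crop // 2, 0), max(h - crop, 0))
    left = min(max(cx - crop // 2, 0), max(w - crop, 0))
    img_crop = image[top:top + crop, left:left + crop]
    mask_crop = mask[top:top + crop, left:left + crop]
    # Fill at the model's native resolution, then resize back to the crop.
    filled = fill_anything(img_crop, mask_crop, prompt)
    filled = np.array(filled.resize((img_crop.shape[1], img_crop.shape[0])))
    out = image.copy()
    out[top:top + crop, left:left + crop] = filled
    return out
```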

Remove Anything results

Fill Anything results

Text prompt: a camera lens in the hand

Text prompt: an aircraft carrier on the sea

Text prompt: a sports car on a road

Text prompt: a Picasso painting on the wall

Replace Anything results

Text prompt: sit on the swing

Text prompt: breakfast

Text prompt: a bus, on the center of a country road, summer

Text prompt: crossroad in the city

Summary

The researchers built this project to demonstrate the power that can be unlocked by fully exploiting existing large-scale AI models, and to showcase the unlimited potential of "Composable AI". The proposed Inpaint Anything (IA) is a multifunctional image inpainting system that combines object removal, content filling, and scene replacement (with more functions on the way, so stay tuned).

IA combines vision foundation models such as SAM, image inpainting models (such as LaMa), and AIGC models (such as Stable Diffusion) to realize user-friendly, mask-free image inpainting through simple operations such as "click to remove" and "prompt to fill". In addition, IA can process images of any aspect ratio at up to 2K HD resolution, regardless of the image's original content.

The project is now fully open source. Finally, feel free to share and promote Inpaint Anything (IA); we look forward to seeing more new projects built on it. In the future, the researchers will further tap IA's potential to support more practical functions, such as fine-grained image matting and editing, and apply it to more real-world scenarios.

References

[1] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.

[2] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

[3] Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. Resolution-robust large mask inpainting with Fourier convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2149–2159, 2022.

[4] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.

Reprinted from 机器之心 (Machine Heart).
