Making the elephant turn around in Photoshop comes true! The University of Hong Kong, Nanjing University, Tsinghua University and others are the first to open-source a "replica" of DragGAN

Hayo News
May 26, 2023
An unofficial implementation of DragGAN is here! It faithfully reproduces the one-second drag-to-edit retouching, and you can try it right away.

Remember the DragGAN released a few days ago?

That's right, the tool that retouches a photo in one second with a couple of drags.

Caught a bad expression in a photo? Drag it! Face not slim enough? Drag it! Head turned the wrong way toward the camera? Drag it!

The ancient Photoshop joke, "make the elephant turn around," may actually come true.

As soon as the demo video of this AI retouching tool was released, it became a huge hit both in China and abroad.

Many netizens called out, "PS doesn't exist anymore."

Within just a few days, an unofficial implementation of DragGAN became available to try. The feature has been integrated into InternGPT, and the interface looks like this↓

Experience address: https://igpt.opengvlab.com/

As soon as the demo page opened, it was immediately overwhelmed by traffic.

Official demo

Judging from the official demo video, the reproduced DragGAN effect is absolutely amazing.

Grin

First, making an unsmiling person smile: just select the two corners of the mouth and drag them.

The final result looks completely natural, because the facial muscles change along with the mouth rather than a grin simply being pasted on.
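For readers curious how the drag itself works: DragGAN alternates two steps, motion supervision (optimize the latent code so features near the handle point move one step toward the target) and point tracking (re-locate the handle in the updated features). Below is a minimal, heavily simplified sketch of the motion-supervision loop; the tiny linear "generator" is only a stand-in so the code runs, whereas the real method optimizes a StyleGAN2 latent and reads StyleGAN2's intermediate features.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "generator": latent (1, 64) -> feature map (1, 16, 32, 32).
# The real method reads intermediate StyleGAN2 features instead.
G = torch.nn.Sequential(
    torch.nn.Linear(64, 16 * 32 * 32),
    torch.nn.Unflatten(1, (16, 32, 32)),
)

w = torch.randn(1, 64, requires_grad=True)   # latent code being optimized
opt = torch.optim.Adam([w], lr=1e-2)

handle = torch.tensor([12, 12])              # user's handle point (row, col)
target = torch.tensor([20, 20])              # where the user dragged it

def patch(feat, center, r=3):
    """Square feature patch of radius r around an integer pixel position."""
    y, x = int(center[0]), int(center[1])
    return feat[:, :, y - r:y + r + 1, x - r:x + r + 1]

for step in range(60):
    feat = G(w)
    d = torch.sign(target - handle)          # one-pixel step toward the target
    # Motion supervision: pull the features one step toward the target to
    # match the (detached) features at the handle, so updating the latent
    # shifts the image content along d.
    loss = F.l1_loss(patch(feat, handle + d), patch(feat, handle).detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Point tracking, heavily simplified: the real method re-locates the
    # handle each iteration by nearest-neighbour feature search; here we
    # just advance it once the local features have converged.
    if loss.item() < 1e-3 and not torch.equal(handle, target):
        handle = handle + d
```

The actual paper additionally uses a binary mask to restrict which region is allowed to move, and repeats these drag steps until every handle point reaches its target.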

Close the mouth

Face editing

Everyone is familiar with face slimming. Select the two cheeks and push them inward; the output still looks very natural.

Face slimming also works on men, but this one went a bit too far: the result looks fake at first glance, and the chin is too sharp.

And this one deserves a shout-out: hair! Good news for the many people going bald.

However, judging from the output, even when only the forehead is selected, the hair grows proportionally everywhere, and the final result looks a bit like the Monkey King.

Turn around

Face rotation is also a very practical feature, and the newly completed regions look very natural.

Other features

Beyond small-scale retouching, InternGPT itself offers many other eye-catching capabilities.

Remove occluding objects

Click the part of the picture you want to act on, then type "remove" as the prompt.

Image generation

This feature is even more interesting: first upload a picture and prompt the system to segment it, then enter a prompt to generate the picture you want.
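The article doesn't publish InternGPT's internal pipeline, but the "click, then remove or replace" flow can be plausibly reconstructed from the tools it names: SAM turns the click into a mask, and a Stable Diffusion inpainting model repaints the masked region. A hedged sketch (the input file and checkpoint paths are placeholders):

```python
import numpy as np
import torch
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry
from diffusers import StableDiffusionInpaintPipeline

# Placeholder input; the demo works on user uploads.
image = Image.open("photo.png").convert("RGB").resize((512, 512))

# 1) The user's click becomes a point prompt for SAM.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),  # (x, y) of the click
    point_labels=np.array([1]),           # 1 = foreground point
)
mask = Image.fromarray((masks[scores.argmax()] * 255).astype(np.uint8))

# 2) "remove the masked region" and "replace the masked region with {prompt}"
#    both reduce to inpainting under the mask.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(prompt="empty background", image=image, mask_image=mask).images[0]
result.save("edited.png")
```

Swapping the prompt from "empty background" to a description of a new object turns removal into replacement, which is exactly the pair of messages listed in the usage tips below.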


Video highlight commentary

With a prompt, you can also cut video highlights with one click.

Interactive visual question answering

After recognizing the information in a picture, it can even look it up online directly.

Interactive image generation

A casual scribble can be turned into a beautiful picture with one click.

After going through these features, this editor was genuinely stunned. They all share the same two qualities: foolproof operation and extreme ease of use.

How could anyone not love this?

Technical implementation

After seeing so many cool features, what exactly is this InternGPT?

InternGPT (iGPT for short), also known as InternChat (iChat), is a pointing-language-driven visual interaction system: users can interact with ChatGPT by clicking, dragging, and drawing.

Unlike existing interactive systems that rely on language alone, iGPT integrates pointing instructions, significantly improving both the efficiency of communication between users and chatbots and the chatbot's accuracy on vision-centric tasks, especially in complex visual scenes.

Paper address: https://arxiv.org/pdf/2305.05662.pdf

The figure below is the overall architecture of InternGPT.

As the figure shows, the system can process not only images and videos but also speech and text.

For image or video input, InternGPT first processes it with models such as SAM (image segmentation) and OCR (text recognition).

After locations, objects, or text are identified, a whole toolbox of familiar models handles further processing.

For example BLIP (image captioning), Stable Diffusion (image generation), Pix2Pix (image-to-image translation), and so on.

Similarly, for text or speech input, InternGPT calls models such as GPT-4 or LLaMA, again followed by a whole toolbox of downstream tools.

Overall Architecture of InternGPT
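A minimal sketch of that routing idea, with hypothetical stand-in functions in place of the real models; only the control flow is meant to be illustrative.

```python
from typing import Any, Callable, Dict

# Hypothetical tool registry; each entry would wrap a real model.
TOOLS: Dict[str, Callable[[Any], Any]] = {
    "segment": lambda image: f"<mask of {image}>",         # e.g. SAM
    "ocr": lambda image: f"<text read from {image}>",      # e.g. an OCR model
    "caption": lambda image: f"<caption of {image}>",      # e.g. BLIP
    "generate": lambda prompt: f"<image for '{prompt}'>",  # e.g. Stable Diffusion
    "chat": lambda text: f"<LLM reply to '{text}'>",       # e.g. GPT-4 / LLaMA
}

def route(modality: str, payload: Any, instruction: str = "") -> Any:
    """Dispatch an input to perception tools or the language model."""
    if modality in ("image", "video"):
        # Perception first (segmentation + text reading), as in the figure.
        context = {
            "mask": TOOLS["segment"](payload),
            "text": TOOLS["ocr"](payload),
            "caption": TOOLS["caption"](payload),
        }
        if "generate" in instruction:
            return TOOLS["generate"](instruction), context
        return context
    # Text and (transcribed) speech go to the language model.
    return TOOLS["chat"](payload)

print(route("image", "photo.png", instruction="generate a sunny beach"))
print(route("text", "what is in the image?"))
```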

Usage tips

Using it is also very straightforward.

After an image is uploaded successfully, the user can send messages like the following to hold a multimodal dialogue with iGPT:

"what is it in the image?" or "what is the background color of the image?"

Similarly, users can interactively manipulate, edit, or generate pictures as follows (a programmatic sketch follows the list):

· Click anywhere on the image and press the Pick button to preview the segmented area. You can also press the OCR button to recognize all words present at a specific location;

· To remove masked regions in an image, you can send a message like this:

“remove the masked region”

· To replace the masked object in the image with another object, you can send the following message:

“replace the masked region with {your prompt}”

· To generate a new image, send the following message:

“generate a new image based on its segmentation describing {your prompt}”

· To create a new image by scribbling, press Whiteboard and draw on the whiteboard. After the drawing is complete, you need to press the save button and send the following message:

“generate a new image based on this scribble describing {your prompt}”
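If the demo is served as a Gradio app that exposes a client API (an assumption, not something the article confirms), the messages above could also be sent programmatically. Since no endpoint names are documented here, this sketch only connects and lists them:

```python
from gradio_client import Client

# Hypothetical programmatic access; the public page may not expose this API.
client = Client("https://igpt.opengvlab.com/")
client.view_api()  # prints the app's named endpoints and their parameters

# Once a real endpoint name is known, a message like those above could be
# sent, e.g. (the api_name here is hypothetical):
# client.predict("remove the masked region", api_name="/chat")
```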

Reactions

That jaw-dropping DragGAN now has an unofficial version. The official code is due in June; this is just a preview of what's coming.

DragGAN has already been integrated into InternGPT. Who knew it would arrive this soon: a godsend of a retouching tool.

References:

https://igpt.opengvlab.com/

Reprinted from 新智元
