AI highlights of the week: Google launches a new large model and challenges GPT-4
This week Google went on the offensive and dropped a blockbuster: the PaLM 2 large model. At the Google I/O 2023 conference it demonstrated a series of capability comparisons, once again throwing down the gauntlet to GPT-4.
Google launches PaLM 2 large model
Live demos showed some capabilities surpassing GPT-4
On Thursday, May 11th, the Google I/O 2023 conference arrived as scheduled. Beyond the release of the Pixel Fold, Google's first folding-screen phone powered by an in-house processor, and the new Android 14 system, the real highlight of the conference was AI.
Front and center, Google introduced a new generation of its large language model, PaLM 2. Compared with its predecessor, it greatly improves language processing, reasoning, and coding. According to Google's own tests, PaLM 2 even outperforms GPT-4 in some areas, such as mathematics.
In addition, multimodal support for understanding and generating audio and video content, plus a family of model sizes scaling from the cloud all the way down to versions that run natively on Android devices, are also killer features. Google clearly knows where its advantage lies: giving Android the "wings" of PaLM 2 will greatly advance its strategy of bringing AI to everyone.
Alongside the PaLM 2 release, Bard, now migrated to the PaLM 2 model, has also been fully upgraded. It adds an image question-and-answer feature and drops the waiting list, making it fully open for use.
SD developers launch Stable Animation SDK
The era of "AI animation" is coming
Stable Diffusion ushered people into the era of "AI painting". On Friday, May 12th, another "AI creation" revolution arrived: Stability AI, the developer of Stable Diffusion, released the Stable Animation SDK, a tool that lets artists and developers use its most advanced Stable Diffusion models to produce stunning animations.
According to the official site, the SDK currently offers three ways to create animations: from text alone, from text plus an initial image, and from text plus an input video.
However, it cannot yet be deployed and run purely locally; you must use Stability AI's compute and pay for it. Fortunately, some media outlets worked out from the official pricing formula that roughly 1 RMB buys a 100-frame clip, a price that is still affordable for early adopters.
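The three modes differ only in which inputs accompany the text prompt. A minimal sketch in plain Python (this is not the real stability_sdk API; the function and parameter names here are hypothetical, purely to illustrate how the inputs determine the mode):

```python
from typing import Optional

# Sketch of the three documented generation modes (names hypothetical,
# not the real stability_sdk API). Every mode starts from a text prompt;
# the optional inputs decide which mode applies.
def animation_mode(prompt: str,
                   init_image: Optional[str] = None,
                   input_video: Optional[str] = None) -> str:
    """Return which of the three Stable Animation modes the inputs select."""
    if not prompt:
        raise ValueError("all three modes require a text prompt")
    if input_video is not None:
        return "video + text"       # restyle an existing clip
    if init_image is not None:
        return "text + init image"  # animate outward from a start frame
    return "text only"              # generate every frame from the prompt
```

For the real parameter names and pricing formula, consult Stability AI's SDK documentation.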
Hugging Face launches an official "HuggingGPT"
Transformers Agents API is here
We previously introduced HuggingGPT, an "AI scheduling platform" from Zhejiang University and Microsoft. Users only need to state their requirements in natural language, and HuggingGPT uses ChatGPT to connect the various AI models in the Hugging Face community to complete complex multimodal tasks.
Now Hugging Face has launched its own official "HuggingGPT": the Transformers Agents API. Through this API, you can direct more than 100,000 Hugging Face models to complete a variety of multimodal tasks.
For a simple example, if you want Transformers Agents to describe a picture to you out loud, all you have to do is state the instruction; it automatically turns that instruction into a prompt and calls the appropriate models and tools, with the same goal as HuggingGPT: to serve as a competent "dispatch desk".
Although the concept of an "everything app" is still far off, this is a step in that direction.
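A toy sketch of that "dispatch desk" idea in plain Python (not the real Transformers Agents API; in the real API an LLM plans the tool chain and the tools are Hugging Face models, whereas here both are stubbed out for illustration):

```python
# Toy "dispatch desk": route one natural-language instruction to a
# chain of tools. A real agent asks an LLM to plan this chain; here the
# planning is a hard-coded keyword check, and the tools are stubs.
def image_captioner(image: str) -> str:
    """Stub for an image-captioning model."""
    return f"a photo showing {image}"

def text_to_speech(text: str) -> bytes:
    """Stub for a text-to-speech model: pretend these bytes are audio."""
    return text.encode("utf-8")

TOOLS = {"caption": image_captioner, "speak": text_to_speech}

def run(instruction: str, image: str) -> bytes:
    """Turn an instruction into a tool chain and execute it step by step."""
    steps = ["caption"]
    if "out loud" in instruction or "voice" in instruction:
        steps.append("speak")
    result = image
    for name in steps:
        result = TOOLS[name](result)
    return result if isinstance(result, bytes) else result.encode("utf-8")
```

The point is the shape of the pipeline: one instruction fans out into several model calls whose outputs feed each other, which is exactly the role both HuggingGPT and Transformers Agents play.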
"AI Stefanie Sun" is popular all over the Internet
Let's understand the AI model behind it
Recently, "AI Stefanie Sun" has gone viral on Bilibili, Weibo, and other social networks, as people discovered they can make "Stefanie Sun" sing all kinds of songs she has never actually sung.
Behind these singing voices is an AI model called So-VITS-SVC 4.0. Unlike large language models with their demanding requirements, So-VITS-SVC 4.0 is relatively friendly to consumer hardware: with a graphics card with more than 6 GB of memory, you can train and run a vocal model yourself.
Put simply, you only need to prepare a sufficiently long, clean, and clear vocal recording for training (analogous to the base image in img2img AI drawing), do the format processing, and hand the rest over to the graphics card, time, and a one-click training package.
Once you have a well-trained vocal model, just feed in the dry vocal of the song you want it to sing, and within a minute or two a one-click package produces the "Stefanie Sun cover" or "Jay Chou cover" version.
Of course, the developers of So-VITS-SVC 4.0 have also clearly emphasized that applications of the model must comply with laws and regulations and must not infringe on others' rights. Only within certain rules can this fascinating technology develop for the better.
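The "format processing" step above mostly means getting every clip into one consistent PCM WAV format before training. A hedged sketch using only Python's standard-library wave module (the target numbers below, mono 44.1 kHz 16-bit, are an assumption for illustration; the actual target depends on the project's config):

```python
import wave

# Sanity-check that a clip matches an assumed training target format
# (mono 44.1 kHz 16-bit PCM WAV). Treat these defaults as placeholders;
# check them against your So-VITS-SVC config before relying on them.
def check_training_wav(path: str,
                       rate: int = 44100,
                       channels: int = 1,
                       sample_width: int = 2) -> list:
    """Return a list of problems found; an empty list means the clip fits."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"sample rate {w.getframerate()} != {rate}")
        if w.getnchannels() != channels:
            problems.append(f"{w.getnchannels()} channels, expected {channels}")
        if w.getsampwidth() != sample_width:
            problems.append(f"{8 * w.getsampwidth()}-bit, "
                            f"expected {8 * sample_width}-bit")
    return problems
```

Running this over a folder of clips before training catches the stereo or wrong-sample-rate files that would otherwise waste a long training run.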
This week's AI application recommendation
Akuma: a website for quickly creating "talking" and "moving" paper characters
Copilot Hub: turn long videos into easily shareable short clips in one second
Skybox Lab: generate 360° VR scenes from prompts
News Minimalist: uses GPT-4 to read 1,000 news articles and pick out the most important ones
ChatGPT Adventure: a text adventure game built on ChatGPT
Call Annie: make video calls with ChatGPT anytime, anywhere
LLaVA: a visual dialogue model that reaches 85% of GPT-4's level
The popularity of "AI Stefanie Sun" has shown technology practitioners that AI can be this "down-to-earth" and win people over instantly. As AI keeps innovating and developing, it should also integrate better into daily life and serve the public. After all, it is humans who develop AI, and we should always remember to put people first.
Because it's there.