GPT-4 killer Google Gemini strikes! The list of 26 R&D leaders has been revealed, offering Midjourney-like image-generating capabilities
According to The Information, Google has assembled a team of hundreds of engineers. The new killer Gemini combines the capabilities of the three major models of GPT-4, Midjourney, and Stable Diffusion, and it will be launched this fall.
Google's new big killer Gemini is about to meet the world!
It is rumored that Gemini can not only conduct text conversations like GPT-4, but also integrates the capabilities of Midjourney and Stable Diffusion to generate images.
In order to fight against OpenAI, Google CEO Pichai took an extraordinary step in April this year, merging teams with completely different cultures and codes-Google Brain and DeepMind.
Now, the Google Avengers, which has assembled hundreds of engineers, is on standby, working day and night, just to snipe OpenAI's GPT-4 and regain the top spot in the AI field in one fell swoop.
Google founder Sergey Brin has also returned to the trenches to personally train Gemini.
Gemini is said to be out this fall, and Google's test is coming.
Avengers roster has been revealed
Bet on Gemini to create the strongest GPT-4 killer
According to people familiar with the matter, Gemini combines the textual capabilities of LLM with the capabilities of Vincent diagrams.
In other words, it is equivalent to a combined version of GPT-4 and Midjourney/Stable Diffusion.
This is also the first time the outside world has heard that Gemini has such a powerful drawing ability.
In addition, it provides the ability to analyze charts, create graphs with text descriptions, and control software using text or voice commands.
At the end of June, Google DeepMind CEO Hassabis also broke the news that Gemini will combine AlphaGo and large language models, and Google DeepMind is ready to invest tens of millions of dollars, or even hundreds of millions.
Gemini will integrate AlphaGO, which uses reinforcement learning and tree search, as well as technologies in robotics, neuroscience and other fields.
It can be said that Google is betting heavily on Gemini, which will power the Bard chatbot and promote enterprise applications such as Google Docs and Slides.
In addition, Google also hopes to charge developers to access Gemini through the cloud server rental service.
Currently, Google Cloud sells access to Google AI models through the Vertex AI product
If these new features come to fruition, there's a good chance Google will catch up to Microsoft.
After all, Microsoft is already far ahead in AI products, with Office 365 apps including AI capabilities, and its apps selling users access to ChatGPT.
James Cham, an investor in AI startups at Bloomberg Beta, the venture capital arm of Bloomberg, told Bloomberg, "For the past nine months, everyone has been asking this question: When will there be a company that looks like it has the potential to catch up with OpenAI? "
"Now, finally, there seems to be a model that is comparable to GPT-4."
Google, forced out of comfort zone
With the rise of OpenAI, Google has to try to introduce new technologies while ensuring its core search business.
According to insiders, before launching Gemini, Google is likely to use it in some products.
In the past, Google has used simpler models to improve search, but products like Bard and Gemini need to analyze large amounts of images and text to generate more human-like answers.
The potentially huge server costs brought about by such a large amount of data are also something Google must control.
The updated Bard is stronger
Take Advantage of YouTube
According to The Information, Google trained Gemini on a large number of YouTube videos.
Moreover, Gemini can also integrate audio and video into the model itself to form multimodal capabilities, and the latter has been considered by many researchers to be the next frontier of AI.
For example, a model trained on YouTube videos can help mechanics diagnose car repair problems based on videos.
Or software code can be generated from a sketch of a website or application a user wants to create. OpenAI has previously demonstrated this capability for GPT-4, but it is not yet available.
OpenAI boss Greg Brockman has demonstrated the ability of GPT-4 to read pictures and write web page code, but it seems to be a pigeon
Using YouTube content can also help Google develop more advanced text-to-video software that automatically generates detailed videos based on the content descriptions users want to watch.
It's similar to technology being developed by Google-backed startup RunwayML, which is now being closely watched by content creators in Hollywood.
Google DeepMind launches a full-fledged counterattack
In 2011, Google created Google Brain (Google Brain), which aims to build Google's own AI to optimize search results, precise advertising, and autofill in Gmail.
London-based DeepMind, on the other hand, is more devoted to academic research — AlphaGo's 4-1 victory over Lee Sedol in 2016 is seen as an important milestone on the road to artificial general intelligence (AGI).
Aside from the fact that Google will use the software developed by DeepMind to improve the operating efficiency of its data centers, DeepMind's work has not had much impact on its core products.
But at the end of last year, everything changed.
In November 2022, OpenAI released ChatGPT, and the number of users soared to tens of millions in just a few weeks, and then achieved the achievement of breaking 100 million users in the shortest time.
Within a few months, OpenAI's revenue reached hundreds of millions of dollars, and during this period, Microsoft invested 10 billion dollars in new investment, and countless capital hot money flowed to OpenA. The market value and popularity of OpenAI reached an unprecedented height.
At this time, Google realized that its leadership in the field of AI was already in jeopardy.
Google Brain + DeepMind = ?
In April of this year, the passive Google released the ultimate move: Google Brain and DeepMind officially merged!
The two major divisions of "The King Does Not See the King" actually fit together, and this move also shocked the people's jaws.
The combined Google DeepMind will be led by DeepMind CEO Demis Hassabis, with former Google AI chief Jeff Dean taking over as chief scientist.
Now, at least 26 bigwigs are working on Gemini's development, including researchers who have worked at Google Brain and DeepMind.
Two DeepMind executives, Oriol Vinyals and Koray Kavukcuoglu, will lead the development of Gemini, along with former Google Brain chief Jeff Dean, people familiar with the matter said. They will oversee the hundreds of employees involved in Gemini's development.
In addition, Google's co-founder Sergey Brin is also a veteran, returning after a long absence.
Sergey Brin and Larry Page
He has been evaluating Gemini models and helping staff train the models.
According to reports, Brin was also involved in the technical decision-making process for retraining the model after the team discovered that Gemini had accidentally been trained on potentially offensive content.
The throes of an "accidental marriage"
With the merger of Google Brain and DeepMind, the new team quickly encountered very serious problems-how to merge the code, and whose software is used for development?
After all, the code bases of these two departments were completely independent before.
Although the two sides reached a compromise after each concession:
- In the pre-training phase of the model, use Google Brain's software Pax for training machine learning models
- At a later stage, use DeepMind's software Core Model Strike for developing the model
But according to insiders, there are still many employees who are angry because they have to use software they are not familiar with.
In addition, both Google and DeepMind have developed their own models for ChatGPT.
DeepMind embarked on a project code-named Goodall to develop a system to compete with ChatGPT using different variants of the unpublished model Chipmunk. Google Brain developed Gemini.
In the end, DeepMind decided to abandon its original efforts and chose to cooperate with the Google Brain-based project to develop Gemini.
Interestingly, Google Brain is said to be much more relaxed than DeepMind in terms of remote work policies.
Internal friction, embarrassment, counterattack
Compared with the situation on the OpenAI side, Google is caught in an exhausting internal friction.
First, a series of senior technical talents left, such as Liam Fedus, Barret Zoph and Luke Metz, researchers, etc., chose to join OpenAI.
Although Google has recovered some talents: such as re-recruiting Jacob Devlin and Jack Rae.
Jacob Devlin went to OpenAI in January of this year after criticizing Bard's development. And Jack Rae is a former DeepMind researcher who joined OpenAI in 2022.
Previously, Devlin expressed his concerns about the use of ChatGPT data training by the Bard team to Pichai, Dean and other executives, and then resigned
Then, in order to fight against the thriving ChatGPT and to return to the leader of the artificial intelligence track, Google hastily released the chat robot Bard in February this year.
However, the press conference was overturned due to a low-level factual error, causing the company's market value to evaporate hundreds of billions of dollars overnight.
Google's first counterattack ended in embarrassment.
By May, the new PaLM 2 model was released at Google I/O, greatly improving Bard's ability to answer questions and generate code.
Also released at the same time is the Search Generative Experience (SGE), which combines generative AI with its own traditional search services.
To put it simply, SGE is an AI search service similar to Bing Chat, but instead of using the new chat window directly, it displays the AI-generated content collection to users in the search results.
That is to say, while searching, Google will use AI to provide explanations for the searched content, answer questions raised by users, help users make travel planning, and so on.
And users no longer need to jump back and forth between multiple links like shopping around, and don't have to spend effort to judge which information behind the link is true, because all available content is concentrated in the replies collected by AI. middle.
In a recent update, Google has added the ability for SGE to attach pictures and videos to the reply content generated by AI, helping users to understand the knowledge and information they search for more intuitively.
Just like Bing Chat, SGE's AI responses will include time-published links to support the content of the AI-generated responses. If users are interested in relevant information, they can click on the link to understand the specific content more comprehensively.
In the replies generated by AI, for a lot of knowledge information and concepts, users can get accurate definitions of the concepts directly by hovering the mouse.
This function is now available for AI responses to knowledge-based questions such as science, history, and economics.
For users who need to browse very long webpage information to learn or understand information, SGE has also updated an AI summary function in a webpage - SGE while browsing.
This function is equivalent to providing users with an "outline generator" that is ready to be dispatched at any time. For any long web content, users can use it to generate an outline and quickly grasp the main points.
In the Explore the Page section below, users can also see questions related to the content of the page. If the user is interested in the question, click directly to see how the content of the article answers these questions.
However, due to Google's conservative market strategy, SEG currently only allows users in the United States to use the Waiting List to apply for testing.
So probably most users don't even know that Google has launched such a service.
In short, it is reported that after the combination of the two departments, at least 21 generative AI tools have been tested, and even tools that provide users with life advice and psychological counseling.
Last year, Google, which urgently fired engineers who claimed to have a conscious chat AI, has now begun to explore such "sensitive" areas. It can be seen that it really decided to give it a go.
The Gemini project, the current situation is very good
However, the merger of the two teams is indeed a big surprise for some engineers who are working on the Gemini project.
James Molloy and Tom Hennigan, who previously worked at DeepMind, were in charge of the infrastructure, along with Google senior researcher Paul Barham.
Timothy Lillicrap worked at DeepMind on developing systems for chess and Go, while Emily Pitler, a researcher at Google Brain, leads a team focused on making LLMs capable of specialized tasks like math or web searches.
But in addition to staffing issues in the merged organization, the Gemini team also faced huge challenges during the development process, such as identifying data that could be used for model training.
So Google's lawyers have been closely evaluating the training effort.
In one case, lawyers ordered researchers to remove training data from textbooks, fearing objections from copyright holders.
And that data could have helped train models to answer questions about fields like astronomy or biology.
However, Aydin Senkut, a former Google executive and founder of VC firm Felicis Ventures, commented on the release of Gemini that he saw that "Google is determined to be at the forefront again, rather than being extremely conservative."
Aydin Senkut also agrees with Google's decision:
"It's the right direction. Eventually, they'll catch fire."