Apple burns millions of dollars a day and bets on a 200-billion-parameter Apple GPT! A poaching spree from Google to build a blockbuster AI iPhone
Apple executives who once doubted what LLMs could do are now racing to catch up: the company is burning millions of dollars a day to get Apple GPT into next year's iPhone.
Apple in a hurry?
According to The Information, to accelerate its LLM development, Apple is not only pouring money into research — millions of dollars a day — but also poaching many engineers from Google.
Apple employees say the capabilities of their Apple GPT model already exceed those of GPT-3.5.
Siri is also set for an upgrade: just tell it to "create a GIF with the last 5 photos taken and send it to my friend," and it will perform the whole series of operations automatically, without a single tap.
As for the core of the large-model team, reporters have traced it: almost all the key people came from Google.
Apple will not sit out the battle of the generative AI giants!
Apple's AI chief won't give up, even a step behind
Apple had a chance to become OpenAI.
Four years ago, Apple's AI chief John Giannandrea formed a team to develop conversational AI — that is, large language models.
The move was prescient, yet still a step too slow: by last fall, OpenAI's newly released ChatGPT had already captured the world's attention.
Several Apple insiders said the company was not unprepared for the boom in large language models, but Giannandrea had repeatedly doubted that chatbots driven by AI models could be useful.
Now, Apple clearly regrets that hesitation: whatever the cost, it must build a large model!
A core team of 16, with multiple teams sprinting toward LLMs
What is the price?
Sam Altman has said that OpenAI spent several months and burned more than $100 million training GPT-4, its strongest model to date.
In contrast, Apple's Foundational Models team has only about 16 people, but the budget for training models has grown to millions of dollars per day.
The team is made up of several former Google engineers hired by Apple (some of whom worked under Giannandrea when he was still at Google), and is led by Ruoming Pang, who joined Apple in 2021 after 15 years at Google.
According to people familiar with the matter, the team plays a role similar to the AI labs at Google and Meta - researchers are responsible for developing AI models, and other departments are responsible for applying models to products.
In addition, according to a recent research paper and employee profiles on LinkedIn, Apple has at least two other teams that are also developing language or image models.
One vision team is developing applications that generate "images, videos or 3D scenes".
Another group is working on long-term research in multimodal AI — having models recognize and generate images, videos, and text at the same time.
Now, Apple has developed several models and is intensively conducting internal testing.
Siri is about to get a big upgrade
In the view of the Apple team, the current most advanced model, Ajax GPT (or Apple GPT), has surpassed GPT-3.5.
We have previously reported that Apple is secretly developing "Apple GPT" to compete with OpenAI and Google.
With such a powerful language model behind it, Apple's product line will of course see a wave of major upgrades.
For example, with a single command, Siri will automatically create an animation and send it to a contact on the phone.
Apple also plans a deeper integration with its Shortcuts app, which lets users string together functions from different apps.
These features are expected to land in next year's new version of iOS!
However, Apple has not yet reached a conclusion on how to apply LLM in products.
Apple has long advertised its protection of user privacy, so for many features it prefers to run models offline on the device rather than on cloud servers.
According to people familiar with the matter, "Apple GPT" now has more than 200 billion parameters. Running a model that large requires not only serious computing power but also plenty of memory.
Obviously, that is asking a bit much of a small iPhone.
Here, Google's PaLM 2 has set a useful precedent: the model comes in four different sizes, the smallest of which can run offline on a device.
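A back-of-envelope calculation makes the problem concrete. This is only an illustrative sketch — Apple has published no specs for the model — but holding 200 billion parameters in memory, at common numeric precisions, works out as follows:

```python
# Illustrative estimate only: Apple has not disclosed details of "Apple GPT".
def model_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

params = 200e9  # ~200 billion parameters, per The Information's sources

print(model_memory_gb(params, 2))  # fp16 weights: 400.0 GB
print(model_memory_gb(params, 1))  # int8-quantized weights: 200.0 GB
```

Even aggressively quantized to one byte per parameter, the weights alone would need around 200 GB — orders of magnitude more than an iPhone's RAM, which is why a smaller on-device variant (as with PaLM 2's size tiers) is the plausible route.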
Is Apple becoming "another Google"?
Speaking of the team, Giannandrea originally joined Apple to integrate more AI into Apple's software, such as Siri.
After ChatGPT's success proved him wrong, he finally set aside his reservations about AI chatbots.
Thankfully, Giannandrea made at least one smart decision: he wanted to make Apple more like Google.
As a result, Apple employees were given much more freedom and flexibility to conduct research and publish papers — which is how the Foundational Models team came to be.
Apple previously imposed many restrictions in this area, and lost a great deal of talent because of it.
Another reason why Apple has become more "Google" is that after Giannandrea joined Apple in 2018, he hired many key Google engineers and researchers.
He also pushed for the use of Google's cloud services (including Google's TPU chips) within Apple to train models for Siri and other products.
Top talent poached from Google
Apple’s team is filled with talented people.
The predecessor of Foundational Models was a team led by Dutch computer scientist Arthur Van Hoff.
Van Hoff was an early member of Sun Microsystems, part of the famous team that created Java in the 1990s.
In 2019, Van Hoff joined Apple, where he was responsible for developing a new version of Siri (internal codename Blackbird), but Apple abandoned this version. Later, he led the team to focus on LLM.
Initially, the team had only a handful of employees. Most notable were two British researchers from Oxford University, Tom Gunter and Thomas Nickson, who worked on NLP.
In 2021, Ruoming Pang joined Apple to help train LLM.
Unlike other researchers, he was given special permission to stay in New York, where Apple hopes to establish an outpost of the machine learning team.
Ruoming Pang has earned wide recognition in the industry for his research on neural networks — for example, how they can run on phone processors, and how parallel computing can be used to train them.
A few months later, Apple poached Daphne Luong, a former Google AI executive, to oversee both Van Hoff's team and Samy Bengio's; Bengio was likewise poached from Google, in 2021.
Later, there appear to have been changes within the team: Pang took over the Foundational Models team, while Van Hoff went on indefinite leave this year.
In fact, according to his latest LinkedIn profile, Van Hoff left Apple in August.
Arthur van Hoff
Jon Shlens, another former leader of Apple's multimodal research team, has bounced back and forth between Apple and Google.
Shlens first joined Google in 2012 as a senior research scientist; his time there totals 11 years and 6 months.
At the end of 2021, he moved to Apple to be responsible for long-term machine learning research focusing on multimodal learning.
Less than two years later, Shlens returned to Google.
According to The Information's analysis, his new team at Google DeepMind is closely tied to Gemini, Google's upcoming multimodal model.
Even for servers, Google is preferred
The reason Apple recruited Pang is that the company had become increasingly aware of how important LLMs are to machine learning.
Insiders revealed that after OpenAI released GPT-3 in June 2020, employees in Apple's machine learning group began lobbying the company to allocate more funds so they could train such models.
Reportedly, to save costs, Apple executives have long encouraged engineers to use Google's cheaper cloud computing services rather than Amazon's.
Because Google is the default search engine partner for the Safari browser, its cloud services are also priced lower for Apple.
Of course, cooperation belongs to cooperation, and Apple has never stopped poaching people from Google and Meta's AI team.
By one count, since AXLearn was uploaded in July, at least a dozen members of Apple's machine learning team have contributed to the project on GitHub; seven of them previously worked at Google or Meta.
Will Apple also be "open source"?
Interestingly, under the influence of Ruoming Pang, the Foundational Models team quietly uploaded AXLearn, the machine learning framework used for training Ajax GPT, to GitHub in July this year.
AXLearn is built on JAX, Google's open-source machine learning framework, and the XLA (Accelerated Linear Algebra) compiler; it is designed for fast model training and is optimized for Google's TPUs.
Project address: https://github.com/apple/axlearn
Specifically, AXLearn takes an object-oriented approach to the software engineering challenges that arise when building, iterating on, and maintaining models. Users can compose models from reusable building blocks and integrate with other libraries such as Flax and Hugging Face Transformers.
Besides supporting the training of models with tens of billions of parameters on thousands of accelerators, AXLearn covers a wide range of applications — including natural language processing, computer vision, and speech recognition — and includes the baseline configurations needed to train SOTA models.
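To illustrate the idea of composing models from reusable, overridable building blocks, here is a minimal hypothetical sketch. The `LayerConfig`/`ModelConfig` classes and the rough parameter formula are inventions for illustration — this is not AXLearn's actual API:

```python
# Hypothetical sketch of object-oriented model composition;
# NOT AXLearn's real API.
from dataclasses import dataclass, field

@dataclass
class LayerConfig:
    """A reusable building block: one transformer layer."""
    hidden_dim: int = 1024
    num_heads: int = 8

@dataclass
class ModelConfig:
    """A model composed from smaller configs that can be swapped out."""
    num_layers: int = 12
    layer: LayerConfig = field(default_factory=LayerConfig)

    def num_params_estimate(self) -> int:
        # Rough rule of thumb: ~12 * hidden_dim^2 params per transformer layer.
        return self.num_layers * 12 * self.layer.hidden_dim ** 2

base = ModelConfig()
# Deriving a larger variant only requires overriding the relevant blocks.
large = ModelConfig(num_layers=24,
                    layer=LayerConfig(hidden_dim=2048, num_heads=16))
print(base.num_params_estimate())   # 150994944  (~0.15B params)
print(large.num_params_estimate())  # 1207959552 (~1.2B params)
```

The design payoff is that scaling up a model or swapping a component becomes a small config override rather than a code rewrite — the kind of maintainability concern the object-oriented approach targets.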
If Ajax GPT is a "house", then AXLearn is the "blueprint", and JAX is the "pen and paper" used to draw it. Apple has not, however, disclosed the data used to train the model — the "building materials".
Apple has not said why it released AXLearn publicly, but the usual hope with such releases is that outside engineers will help improve it.