GPT-4 to Launch Mid-March 2023 with Multimodal Capabilities
Andreas Braun, CTO of Microsoft Germany, has confirmed that GPT-4 will launch within a week of March 9, 2023. The new version of OpenAI's language model will feature multimodal capabilities, allowing it to process several types of input, including video, images, and sound.
Multimodal Large Language Models
GPT-4 is expected to be a major step forward in natural language processing. One of the big takeaways from the announcement is that GPT-4 will be multimodal, meaning it can work with input in several forms, such as text, speech, images, and video.
This would be a significant change from GPT-3 and GPT-3.5, which handle only text. According to a German news report, GPT-4 may be able to operate in at least four modalities: images, sound, text, and video.
Dr. Andreas Braun, CTO of Microsoft Germany, was quoted as saying, "We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos..."
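The reporting did not include specifics about GPT-4, so it is unclear whether the remarks about multimodality applied to GPT-4 itself or to multimodal models in general.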
Holger Kenn, Microsoft's Director of Business Strategy, explained that multimodal AI is about translating text not only into images, but also into music and video.
Microsoft is also working on "confidence metrics" intended to ground its AI in facts and make it more reliable.
In early March 2023, Microsoft introduced a multimodal language model called Kosmos-1. The news was apparently underreported in the United States, but the German news site Heise.de covered it in detail.
According to Heise.de, the pre-trained model performed well on various tests related to image classification, image-related question answering, automated image labeling, optical text recognition, and speech generation tasks. The model also demonstrated an ability to reason visually, without relying on language as an intermediate step.
Kosmos-1 integrates the modalities of text and images. Based on the reporting, GPT-4 goes further by adding a third modality, video, and it may also incorporate sound.
Works Across Multiple Languages
According to Microsoft, GPT-4 can work across all languages: it can receive a question in one language, such as German, and answer in another, such as Italian. That may seem like a strange example, but the breakthrough appears to lie in the model's ability to pull knowledge from different languages and so transcend language barriers. If the information needed for an answer exists only in Italian, GPT-4 should be able to find it and respond in the language in which the question was asked.
This is similar to the goal of Google's multimodal AI, MUM, which can provide answers in English even if the underlying data exists only in another language, such as Japanese.
There has been no announcement yet about where GPT-4 will be used, but Microsoft's Azure OpenAI service was specifically mentioned.
Google is trying to catch up with Microsoft by incorporating competing technology into its own search engine. This development reinforces the perception that Google is falling behind and lacks leadership in consumer-facing AI.
Google already uses AI in several products, such as Google Lens and Google Maps, where it serves as an assistive technology that helps people with various tasks.
Microsoft's approach is more visible, however, which strengthens the image of Google struggling to keep up.