AI highlights of the week: Microsoft and Meta make AI ever more versatile, with models that can see pictures and sing
Microsoft, Meta, and OpenAI all made new moves this week. AI in daily life is no longer limited to drawing and writing: models that can see and sing are entering our lives too.
Bing Chat rolls out image recognition
It understands memes and can even walk through homework problems
Recently, some Reddit users noticed that their Bing Chat had gained an option for uploading pictures.
Informal tests quickly followed. One user photographed an arithmetic problem and asked Bing Chat to solve it, which it naturally handled with ease:
Another showed Bing Chat a photo of redness and swelling on their hand, and it gamely played doctor:
The trickiest user fed it a machine-learning meme, and Bing Chat earnestly explained where the joke was:
Microsoft had previously said it would add multimodal support to Bing Chat. Judging from user reports, the image recognition feature has been gradually rolled out for small-scale testing, and more people should get to try this impressive new feature in the near future.
Meta open-sources MusicGen
Turning text and melodies into new songs
Over the weekend, Meta open-sourced MusicGen, a model that generates new pieces of music from text or audio prompts.
The research team trained it on 20,000 hours of licensed music. The resulting model is not only efficient; the music it generates also matches the mood set by a text prompt, or follows the melody of an audio prompt.
MusicGen has now been open-sourced on GitHub, and commercial use is permitted.
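For developers who want to try it locally, generation takes only a few lines. The sketch below follows the usage documented in Meta's audiocraft repository; the checkpoint name and generation parameters are illustrative and may differ across library versions.

```python
# pip install audiocraft  -- Meta's library that ships MusicGen
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained checkpoint; "facebook/musicgen-small" is the smallest variant.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # generate 8 seconds of audio

# Text-conditioned generation: one clip per description in the batch.
wav = model.generate(["lo-fi hip hop with a mellow piano melody"])

# Write each clip to disk as a .wav with loudness normalization.
for idx, one_wav in enumerate(wav):
    audio_write(f"clip_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```

The melody-conditioned variant exposes a similar generate_with_chroma method that additionally takes a reference waveform, which is how a hummed tune can steer the output.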
OpenAI updates its models
Function calling makes developers' lives easier
On Wednesday, June 14, OpenAI announced a series of updates, including the gpt-4-0613 and gpt-3.5-turbo-0613 models with function calling, which developers can use to connect the models to external tools and APIs for a much wider range of uses.
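In practice, a developer describes their functions in JSON Schema, and the model decides whether to respond with a structured function call instead of plain text. Below is a minimal sketch against the openai Python SDK as it worked at the time of this update; the get_weather function and its schema are hypothetical examples, not part of OpenAI's API.

```python
import json
import openai  # pip install openai (the 0.27-era SDK current at the time)

# A hypothetical local function, described to the model in JSON Schema.
functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. Beijing"},
        },
        "required": ["city"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    functions=functions,
    function_call="auto",  # let the model decide whether to call a function
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model returns a function name plus JSON-encoded arguments; the
    # developer runs the real function and can feed the result back to the model.
    args = json.loads(message["function_call"]["arguments"])
    print("Model wants to call:", message["function_call"]["name"], args)
```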
OpenAI also said it plans to gradually extend GPT-4 access to more users on the waitlist, and announced that gpt-3.5-turbo-16k, which supports a 16K context window, is available to all users immediately. Compared with the older model that is slated for deprecation, the new one accepts up to four times as much text in a single request (roughly 20 pages), a big boost in efficiency.
In addition, OpenAI has cut the prices of its current models by 25% to 75%, depending on the item. For developers, this round of updates makes a fine Dragon Boat Festival gift.
"Baichuan Large Model" debut
7 billion parameter evaluation dominates the list
On Thursday, June 15, Baichuan Intelligence, the company founded by Wang Xiaochuan, released its first large pre-trained model: baichuan-7B, a 7-billion-parameter bilingual (Chinese and English) model.
According to the official introduction, the model leads the well-known ChatGLM-6B on the Chinese C-Eval benchmark:
It is also ahead of LLaMA-7B on the English MMLU benchmark:
On both of these authoritative Chinese and English benchmarks, baichuan-7B achieved the best results among models of its size.
baichuan-7B has been released on Hugging Face, GitHub, ModelScope, and other platforms, and its model weights are under a license that permits free commercial use. Tsinghua University and Peking University have already begun using the model in their research.
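Since the weights are on Hugging Face, the model loads through the standard transformers workflow. The sketch below follows the usage shown in the official repository at release; note that baichuan-7B is a base model, so it is prompted for plain text continuation rather than chat.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because the repo ships its own modeling code.
repo = "baichuan-inc/baichuan-7B"  # repository id as published at release
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", trust_remote_code=True)

# A title->author completion prompt using classic Chinese poems, as in the
# official example: given one title/author pair, continue with the next author.
inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```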
360 Zhinao debuts a text-to-video feature in China
A live demo of creating something "out of nothing"
On Tuesday, June 13, 360 released its 360 Zhinao large model and demonstrated its cross-modal generation capabilities at a press conference; the model can process multimodal information including text, images, speech, and video.
For example, given a fanciful text prompt such as "a panda rowing a boat", 360 Zhinao produced four videos that each convincingly conjured the scene out of nothing.

360 Zhinao has opened closed-beta applications on its official website, where users can apply for an invitation code with their mobile phone number.