June 13th, 2023
GPT-4, released in March this year, is technically notable for its multimodal capabilities.

It not only supports image input, but can also work with memes, physics problems, and papers, which makes its answers more accurate and credible. Although GPT-4 drew widespread attention and praise from the industry after its release, OpenAI said only that image input was still in research preview and not open to the public. Against this background, some Reddit users recently discovered that their Bing Chat accounts had gained an entry point for uploading pictures.

Surprisingly, the upload entry point also recognizes images. Once a picture is uploaded, Bing Chat returns what it recognizes in the picture along with a relevant answer.

Note that although Bing Chat is currently open to all Microsoft accounts, users still have to wait to be approved for the test before they can use this feature.

Microsoft had previously revealed that Bing Chat uses the GPT-4 model, and mentioned in last month's major update log that it would add multimodal support to Bing Chat. Taken together, this suggests that Microsoft is gradually opening image recognition in Bing Chat to users for testing.

One Reddit user tested Bing Chat extensively and found that it performed quite well.

First is GPT-4's most eye-catching ability: reading memes. GPT-4 can not only understand what is in a picture, but also explain a meme's joke much as a person would.

For example, given one meme, GPT-4 readily sees that charging an iPhone through an obsolete VGA-style connector is absurd, which is what makes the picture funny.

However, when the user uploaded the same meme to Bing Chat, it did not recognize the connector as VGA and so failed to get the joke.

Even so, Bing Chat identified the photo accurately, giving the cable's brand and further details. The miss may have been an occasional error on Bing Chat's part, or a bias in its answers caused by the additional restrictions and adjustments Microsoft has applied in its deployment of GPT-4.

The user then uploaded another meme, a cartoon about machine learning, and found that Bing Chat accurately described both the content of the picture and its joke.

How accurately can Bing Chat recognize images? As a test, the user uploaded a picture showing a lineup of characters from the Super Smash Bros. games and asked Bing Chat to identify them one by one. Bing Chat managed to identify only 7 of the 12 characters. It seems that Bing Chat still has some difficulty with anime and game characters.

OpenAI also demonstrated a case in which the code for a web page is generated from a drawing. One user drew a sketch by hand to see how Bing Chat would respond.

Skipping the lengthy code and looking directly at the resulting web page, we can see that the basic web form is in place.

Image recognition opens up many new uses for Bing Chat. After all, in real life much content, such as formulas and charts, is hard to express clearly in words. In such cases, an answer can be obtained by passing the image directly to the AI.

For example, you can have it explain chromosomal crossover during meiosis.

Or, let it act as a biology teacher and explain how nephron filtration works.

You can even have it act as an online doctor that makes a simple diagnosis from symptoms. ChatGPT has already been used for learning foreign languages and practicing spoken English, thanks to its strong text comprehension and expression.

With visual recognition added, Bing Chat could act as a middle school, high school, or even university teacher, helping students work through complex math, physics, and chemistry problems.

If image recognition is rolled out widely, it could go some way toward easing the uneven distribution of educational resources.

In addition, users could use it for basic medical diagnosis, saving the money and time that a doctor's visit requires and showing how AI can benefit the public.

Note, however, that although Bing Chat can broadly understand real-world content, its answers should be treated only as a reference, not as professional advice.

Therefore, before fully opening image recognition to the public, Microsoft needs to add many restrictions and do a great deal of tuning so that users do not come to harm by wrongly trusting the AI's answers.

In the short term, Bing Chat's image recognition may remain available only to a small number of beta users. If you are interested, you can log in to your Microsoft account now and check whether a picture icon has appeared in the Bing Chat chat bar.