Ali's version of ChatGPT is coming soon
“Everyone has a large model of their own: is this a flight of imagination, or the real direction of technology development?…”
This week, word spread that Ali is about to make a big move in AI. Since OpenAI released ChatGPT at the end of November last year, large language models have become the focus of the entire AI community. Major overseas companies have either integrated ChatGPT into their own applications (such as Microsoft's new Bing) or launched competing large models (such as Google Bard). China has also seen an upsurge in research and development of ChatGPT-style models, and large text dialogue models and products such as ChatYUAN, MOSS, and Wenxin Yiyan have appeared one after another.
The first thing revealed is that Ali's large-model joint project team (hereinafter referred to as the joint team) has trained a large model with personality and worked with talk show actor Niao Niao to train her digital clone, Niao Niao Fen Niao. During this exchange on large-model technology for consumer devices, Heart of the Machine took part in a hands-on evaluation and noted several highlights:
- After a single wake-up, you can hold an uninterrupted, free-form voice conversation with Niao Niao Fen Niao.
- Niao Niao Fen Niao is very humanlike, with the timbre, tone, and manner of expression of Niao Niao.
- As with other chatbots, you can learn encyclopedic knowledge from Niao Niao Fen Niao and get inspiration for creative work.
Niao Niao is a talk show actress. She was the annual runner-up in "Talk Show Conference Season 5". Her socially anxious style of comedy has won her a large number of fans, and she is known as the god of the script and the Internet's mouthpiece.
Heart of the Machine also learned from the joint team that its research has two defining aspects: one is how to make large models serve personal devices and household scenarios safely and efficiently; the other is multimodal AIGC, spanning text, images, voice, and video.
Since the entire technology was demonstrated on a Tmall Genie device, we can expect Ali's version of ChatGPT to be integrated into a variety of industrial scenarios, one of which is to advance smart assistants and the consumer device industry.
Niao Niao Fen Niao tells jokes, discusses the classics, and tutors your writing
The proof is in the trying. After getting hold of the Tmall Genie speaker, Heart of the Machine immediately tried out Niao Niao Fen Niao's chatting ability.
First, let her introduce herself.
Hello, I am Tmall Genie Niao Niao Fen Niao, a talk show actor...
Since she is a talk show actor, let's have her tell a couple of jokes:
This is the story of a girl losing weight at the gym...
When you are unhappy, Niao Niao Fen Niao can also be considerate and comfort you.
After losing my phone, Niao Niao Fen Niao comforted me like this...
Niao Niao Fen Niao can also answer encyclopedic questions fluently.
Tell me about the Three Visits to the Thatched Cottage (Sangu Maolu) and Zhuge Liang...
Niao Niao Fen Niao can also provide writing guidance for students.
An encounter, as seen through the eyes of Niao Niao Fen Niao...
After trying it for a while, we found that Niao Niao Fen Niao exceeded expectations: it felt like talking with Niao Niao herself, which was great fun. It also helps users across multi-turn dialogues involving knowledge, empathy, and creative assistance. Its abilities are not yet stable, however. For example, when asked to play a Jay Chou song, it could only recite a few lyrics and did not switch to music playback. We expect later versions to do better.
Personalization: An Important Direction of Large Model Research
In recent years, large models have performed better and better on general knowledge tasks, and large models trained on very large corpora have surpassed the average human level on tasks such as knowledge evaluation. The emergence of large dialogue models such as ChatGPT has shown people how intelligent AI can be, and their ability to answer human questions is impressive. However, current general-purpose large models seem to lack personality: when asked about their preferences or their opinions on something, they do not respond as well.
Injecting personalization on top of mainstream general-purpose models is therefore an important direction of exploration. Judging from how related research has evolved, personalized large models focus on four things in dialogue training: persona consistency across multi-turn dialogue, dialogue style, logical consistency and values in dialogue, and personalized dialogue with preferences. This means the models are given a role setting, including identity, gender, name, personality, and preferences, and have the ability to empathize. Academia and industry have published views and papers on each of these four sub-directions.
On persona consistency in multi-turn dialogue, a research team from Harbin Institute of Technology proposed in an AAAI 2019 paper [1] to address the problem with natural language inference (NLI), using NLI signals from response-persona pairs as rewards for the dialogue generation process. On dialogue style, Meta used three controllable generation methods (retrieval and style transfer, plug-and-play, and conditional generator fine-tuning) in paper [2] to control the style of open-domain dialogue. On values in dialogue, the University of Edinburgh and DeepMind proposed in paper [3] that large dialogue models should be given different sets of values.
Finally, on personalized dialogue with preferences, South China University of Technology and Tsinghua University jointly proposed CPED in paper [4], a large-scale Chinese personalized and emotional dialogue dataset based on film and television characters. It contains multi-source knowledge related to empathy and personal characteristics (gender, personality traits, emotions, etc.). The study also highlights the role of speaker personality and emotion in conversational AI.
For Ali, this field can be traced back to paper [5], published by Nanyang Technological University at the top conference EMNLP 2020, which studied persona-based empathetic dialogue models in depth. That paper, however, does not appear to follow the same technical direction as the personalized large models seen today.
To make the large model fit a character's traits more closely, the Ali joint team has proposed, for the first time, a four-in-one personalized large model that combines knowledge, emotion, memory, and personality. Related research papers are presumably on the way.
Niao Niao Fen Niao must not only hear clearly, but also sound like her
Dialogue products that incorporate personalized large-model capabilities give answers that match an identity and personality, which improves user satisfaction. Niao Niao Fen Niao was trained by the joint team on the personalized large model, and the engineering work took only 15 days.
The whole process is divided into four steps: large-scale language pre-training, knowledge and tool enhancement, personalized dialogue enhancement, and human feedback enhancement.
The first step is large-scale language pre-training, which uses a hierarchical training method that mimics human learning, gradually increasing the difficulty from simple knowledge to specialized and complex knowledge. For Niao Niao Fen Niao, the joint team first pre-trains on a large-scale corpus so that the large model learns enough world knowledge, including public information about Niao Niao.
After the first step, however, the team found that new knowledge appears and old knowledge goes stale every day, so having the model memorize all knowledge is not a good choice. In the second step, the joint team therefore uses tools such as search engines to strengthen the large model: search results are fed in as input, and the model answers more accurately and promptly by understanding and summarizing them. In this way, Niao Niao Fen Niao can answer questions about the latest information and news.
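The search-assisted answering described above can be sketched roughly as follows. This is only an illustration under our own assumptions: the article does not describe the joint team's implementation, so the `Snippet` type, the toy word-overlap `search` function, and the prompt wording are all hypothetical stand-ins for a real search engine and a real model call.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """One search result (hypothetical structure, for illustration only)."""
    title: str
    text: str


def search(query: str, index: list[Snippet], top_k: int = 2) -> list[Snippet]:
    """Toy stand-in for a search engine: rank snippets by shared words."""
    query_words = set(query.lower().split())
    scored = []
    for snippet in index:
        overlap = len(query_words & set(snippet.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, snippet))
    # Highest overlap first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:top_k]]


def build_prompt(question: str, results: list[Snippet]) -> str:
    """Place retrieved snippets ahead of the question so a model can
    ground its answer in them instead of relying on memorized knowledge."""
    context = "\n".join(
        f"[{i + 1}] {s.title}: {s.text}" for i, s in enumerate(results)
    )
    return (
        f"Search results:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using the results above."
    )
```

The resulting prompt would then be passed to the dialogue model. The point of the design is that fresh facts live in the search index and can change daily without retraining the model.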
The third step is personalized dialogue enhancement, built on top of knowledge and tool enhancement. Here, Niao Niao Fen Niao must not only learn multi-turn dialogue and heuristic dialogue, but also stay consistent and coherent across turns. It is also given personality tags, and the joint team annotated a small amount of Niao Niao's corpus for personalized enhancement and tuning, achieving a fast replication of the character.
Finally, when it comes to whether the model sounds like Niao Niao, human feedback is the most direct and reliable signal. The joint team uses human feedback enhancement to reinforce the role, checking which of several candidate answers sound more or less like Niao Niao and which are right or wrong. This feedback and annotation corrects the bias of the personalized dialogue model and pushes it in a more Niao Niao-like direction. For now, the joint team relies only on feedback from its own members; in the future, it will open up to Niao Niao's fans to collect more feedback and make Niao Niao Fen Niao more realistic.
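As a rough sketch of how such feedback might be organized, annotator comparisons between candidate answers can be turned into a ranking and then into (chosen, rejected) pairs. The article does not say how the joint team consumes the feedback, so the function names and the simple win-count ranking below are our own assumptions, not the team's method.

```python
from collections import Counter


def rank_candidates(candidates: list[str],
                    judgments: list[tuple[str, str]]) -> list[str]:
    """Rank candidate answers by how often annotators preferred them.

    Each judgment is a (winner, loser) pair meaning "the winner sounds
    more like the target persona than the loser".
    """
    wins = Counter({candidate: 0 for candidate in candidates})
    for winner, _loser in judgments:
        wins[winner] += 1
    # Most wins first; ties keep the original candidate order.
    return sorted(candidates, key=lambda c: wins[c], reverse=True)


def preference_pairs(ranked: list[str]) -> list[tuple[str, str]]:
    """Expand a ranking into (chosen, rejected) pairs for later tuning."""
    return [
        (ranked[i], ranked[j])
        for i in range(len(ranked))
        for j in range(i + 1, len(ranked))
    ]
```

Pairs like these are a common input format for preference-based tuning; whether the joint team trains a reward model or tunes on the pairs directly is not stated in the article.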
Voice interaction is a systems engineering effort. Tmall Genie's Niao Niao Fen Niao must not only hear clearly, but also sound like Niao Niao. After training Niao Niao Fen Niao, the joint team focused on improving its conversational AI experience in three areas: hearing, timbre, and text style.
First, Niao Niao Fen Niao has to hear clearly what people say in a conversation. The joint team adopted the cat ear algorithm, which locates the speaker accurately by listening to the sound. One part is echo cancellation: the echo produced by the device's own playback interferes heavily with the conversation, so the joint team combines deep learning with traditional acoustic echo cancellation (AEC), including multi-channel stereo echo cancellation, to ensure the device hears only what the person is saying. The other part is directional pickup: with the help of the device's microphone array, it accurately identifies the speaker's position at wake-up and captures the human voice. Noise reduction is applied at the same time to remove non-human sounds and the voices of distant speakers.
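To give a feel for the "traditional AEC" half of this, here is a minimal sketch of a textbook normalized least-mean-squares (NLMS) adaptive filter, a classic echo cancellation technique. It is not the cat ear algorithm and says nothing about the deep learning part; the function name, the filter length, the step size, and the simulated echo path are all our own assumptions.

```python
import random


def nlms_echo_cancel(mic, playback, taps=4, step=0.5, eps=1e-8):
    """Remove the playback echo from the microphone signal with NLMS.

    The filter learns how the device's own playback shows up in the
    microphone and subtracts that estimate, leaving the residual
    (ideally only the near-end speaker).
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # latest playback samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, playback):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate
        # Normalizing by the input energy keeps the update stable.
        norm = sum(x * x for x in history) + eps
        weights = [w + step * error * x / norm
                   for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Simulated case: the microphone hears only a delayed, attenuated copy
# of the playback (no near-end speech), so the output should fade to zero.
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.6 * x for x in playback[:-1]]
cleaned = nlms_echo_cancel(mic, playback)
residual = max(abs(e) for e in cleaned[-100:])
```

In a real device the echo path is far longer than four taps and changes over time, which is part of why production systems add multi-channel processing and learned components on top of filters like this one.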
Second, the timbre of Niao Niao Fen Niao is close to that of Niao Niao, thanks to the acoustic model developed by Alibaba DAMO Academy. Traditional voice customization is complicated: it may require collecting 20 hours of valid recordings and algorithm customization measured in years, which is too expensive. Traditional speech synthesis also sounds mechanical, like the voice of a robot. DAMO Academy's KAN-TTS customization solution needs only 1 hour of valid recordings of Niao Niao, and it takes about a week from recording to training completion and model launch. The resulting voice is more natural and close to Niao Niao's timbre.
Last is the text style. Niao Niao Fen Niao should not only sound like Niao Niao in timbre, but also follow her style of expression. This is done by setting the character style for the dialogue model through personality tags; a cheerful tag, for example, gives the character an overall happy and optimistic image. Further constraints are placed on the description of the persona, such as its name, age, occupation, and where it is from. For Niao Niao Fen Niao, the joint team selected tags such as talk show actor, Inner Mongolian, deep, humorous, introverted, and post-90s.
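One simple way to picture personality tags is as a structured persona that is rendered into an instruction for the dialogue model. The sketch below uses the tags named above, but the `Persona` structure and the prompt wording are hypothetical; the article does not say how the joint team actually encodes the tags.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Persona constraints: who the character is, plus style tags."""
    name: str
    occupation: str
    hometown: str
    generation: str
    traits: list[str] = field(default_factory=list)


def persona_prompt(persona: Persona) -> str:
    """Render the persona into a plain-text instruction for a dialogue model."""
    traits = ", ".join(persona.traits)
    return (
        f"You are {persona.name}, a {persona.occupation} from "
        f"{persona.hometown}, born in the {persona.generation}. "
        f"Speak in a style that is {traits}."
    )


niao_niao = Persona(
    name="Niao Niao Fen Niao",
    occupation="talk show actor",
    hometown="Inner Mongolia",
    generation="1990s",
    traits=["deep", "humorous", "introverted"],
)
prompt = persona_prompt(niao_niao)
```

Keeping the tags as data rather than hard-coding them in text makes it easy to swap in a different character without touching the rest of the pipeline.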
In addition, while interacting with Niao Niao Fen Niao we found that, as a person speaks, it utters acknowledgment phrases such as "Well, I am here" and "Let me think about it." When it is halfway through answering the previous question, we can also interrupt it and ask the next question directly. The overall dialogue latency is very low, close to that of conversation between people. This is all thanks to the duplex dialogue system that has long been running in the cloud, which greatly improves the dialogue experience. Good listening, incremental dialogue, and low latency are the notable features that distinguish this duplex dialogue system from traditional dialogue.
On the whole, the joint team is committed to a four-in-one personalized large model. The user's question becomes a query; the cat ear algorithm and ASR convert it accurately into text; the large model generates a personalized dialogue reply from that text; and finally personalized TTS delivers the answer in the personalized voice (Niao Niao's). This large model can combine knowledge, emotion, memory, and personality in one.
The joint team also hopes that Niao Niao Fen Niao will have both short-term and long-term memory. In the short term, it must be able to remember the topics discussed in the past three to five rounds and reply based on them. In the long term, information from the conversation such as the user's preferences, what they do, and what they like to eat is stored, so that it can better understand users and generate empathetic, stylized conversations in the future.
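The two kinds of memory can be sketched with a bounded window for recent rounds and a persistent store for user facts. This is a minimal illustration under our own assumptions: the `DialogueMemory` class and its methods are hypothetical, and the article does not describe how the joint team stores or retrieves memories.

```python
from collections import deque


class DialogueMemory:
    """Short-term window of recent rounds plus a long-term user profile."""

    def __init__(self, short_term_rounds: int = 5):
        # Old rounds drop off automatically once the window is full.
        self.recent = deque(maxlen=short_term_rounds)
        # Long-term facts (preferences, habits) persist across conversations.
        self.profile: dict[str, str] = {}

    def add_round(self, user_text: str, reply: str) -> None:
        self.recent.append((user_text, reply))

    def remember(self, key: str, value: str) -> None:
        self.profile[key] = value

    def context(self) -> str:
        """Text to prepend to the next model input."""
        lines = [f"{key}: {value}" for key, value in self.profile.items()]
        for user_text, reply in self.recent:
            lines.append(f"User: {user_text}")
            lines.append(f"Assistant: {reply}")
        return "\n".join(lines)
```

With `short_term_rounds` set between three and five, the window matches the short-term span mentioned above, while the profile keeps facts such as a favorite food available long after the round that mentioned them has scrolled out of the window.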
Seen this way, Niao Niao Fen Niao currently looks more like a large model built for deployment than a large model built merely for role-playing. This also seems to point to future-oriented exploration: if Niao Niao can have her own large model, could every household independently deploy its own AIGC intelligent service?
References:
[1] Haoyu Song et al. Generating Persona Consistent Dialogues by Exploiting Natural Language Inference.
[2] Eric Michael Smith et al. Controlling Style in Generated Dialogue.
[3] Atoosa Kasirzadeh et al. In conversation with Artificial Intelligence: aligning language models with human values.
[4] Yirong Chen et al. CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI.
[5] Peixiang Zhong et al. Towards Persona-Based Empathetic Conversational Models.