Illustrated transcript | Li Jingmei, partner at Lanzhou Technology: AIGC technology and application practice based on pre-trained models

March 18th, 2023

In the field of AIGC, humans should keep improving their creativity and not stop creating just because of AI. AI, in turn, will help humans create greater value by raising the efficiency of the entire industry.

On January 6, 2023, the first "Nuggets Future Conference" jointly sponsored by the Rare Earth Nuggets Technology Community and Intel was successfully held in Beijing.

At the conference, Li Jingmei, partner and chief product officer of Lanzhou Technology, introduced the company's Mencius lightweight pre-trained models. They have been open sourced in communities such as GitHub, Hugging Face, and ModelScope, with nearly 20 models covering reading comprehension, text generation, multimodality, finance, and other areas that enterprises can download and use.

In addition, Li Jingmei further demonstrated the practical application of the AIGC technology based on the pre-trained model through three scenarios: marketing copywriting, literature-assisted writing, and plot-based illustrations. She said that humans should continue to improve their creativity, and AI will help humans create value better by improving industrial efficiency.

The following is the full speech of Li Jingmei, partner and chief product officer of Lanzhou Technology:

Pre-trained models mark NLP's entry into the stage of industrial implementation

Lanzhou Technology was officially established in June 2021. It is a start-up that pursues NLP innovation based on pre-trained models and brings that technology into production in various vertical fields. Today we will focus on the pan-Internet field, especially the currently popular AIGC, to see what large models can do in specific domains.

Above is a simple timeline of the technology's development. The 1980s were an era of symbolic, statistical, and related models. Around 2010 the field entered the era of deep learning. In natural language processing (NLP), one of the biggest breakthroughs was the Transformer, introduced by Google in 2017, after which everything began to change. In 2019, AI surpassed humans at reading comprehension for the first time, and large models followed, including OpenAI's GPT-3, which is now widely used, and GPT-4, which is expected to appear in 2023.

The earliest stage, computational intelligence (big data, cloud computing, and so on), is already a basic necessity, and perceptual intelligence, such as vision and speech, is very mature. So why talk about cognitive intelligence now? Cognitive intelligence asks: having seen and heard something, can the machine understand it? Can it think? Can it make decisions? AIGC is content created by AI: the machine must not only see and understand, but go further and create content. We are therefore in the stage of evolving from perceptual intelligence to cognitive intelligence, and even to creative intelligence.

The large models discussed here are pre-trained models. What is pre-training? It means running unsupervised training on the massive data publicly available on the Internet, instead of relying on traditional human labeling, so that the model can then learn specific tasks more easily. The advantage is that the model starts from a very high baseline. When it is deployed, it only needs to be fine-tuned on data from a particular vertical field or a particular customer, so implementation is very agile. The customer needs to provide relatively little data, and the process is much faster than building a model from scratch with traditional machine learning. In other words, pre-trained models mark NLP's entry into the stage of industrial implementation: costs can be calculated, a business can be built on it, and customers can see the value.

Lanzhou Technology has a technology brand called Mencius, which focuses on Chinese customers, the Chinese market, and the Chinese language. On top of a base model, it has developed the Mencius lightweight pre-trained model. Simply put, three things were done:

The first is model optimization. Performance optimization, task construction, and similar work were done on the model architecture;

The second is knowledge enhancement. Although this is pre-training, different fields require certain domain knowledge, so the model is enhanced with knowledge graphs, linguistic knowledge, and so on;

The third is data augmentation. Although this is general-purpose pre-training, there are still target tasks, such as reading comprehension, classification, and long- and short-text understanding, so task-related data augmentation is applied at a fairly large scale.

Therefore, the Mencius lightweight pre-trained model is not a single model but a series of models. In 2022, Mencius's lightweight technology reached a new level: it topped the ZeroCLUE and FewCLUE leaderboards. This keeps the Mencius pre-trained models lightweight, so that in actual deployment they need only a small amount of data and can be adapted economically and quickly.

In addition, the Mencius lightweight multi-task models have been open sourced in communities such as GitHub, Hugging Face, and ModelScope. Nearly 20 models are available, covering reading comprehension, generation, multimodality, finance, and more, and enterprises can download and use them.

AIGC applications for three major scenarios

Although Lanzhou Technology has large models, NLP, and related technologies, it puts great emphasis on deploying them in concrete vertical fields or scenarios. This talk focuses on AIGC applications in three scenarios: literature-assisted writing, marketing copywriting, and plot-based illustration.

1. Literature-assisted writing. While working with Chinese Online, 17K Novels, and other online platforms, Lanzhou heard many practical needs. In response, it provides four types of services:

Continuation. The user writes the opening text and the machine continues it, which is a very common form of interactive writing;

Keyword-based generation. For example, a user whose writing is not strong may have come up with many fine phrases that must go into the composition but cannot string them into sentences. Given those keywords, the AI can weave them into polished, grammatically fluent text;

Entity rendering. This is an essential need in web fiction. An entity here is a character, object, or the like, for example a modern man, an ancient beauty, or a magic weapon, and it is described based on keywords;

Custom templates. Given a user-defined template, the model fills in the blanks, supplying words and completing sentences.
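The four services above differ mainly in what the user supplies and what the model is asked to do with it. A minimal sketch of that idea follows, in Python. Everything in it (the `Mode` enum, the `WritingRequest` class, the `build_prompt` function, and the English instruction strings) is a hypothetical illustration, not Lanzhou's actual interface, which the talk does not describe.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Hypothetical names for the four services described in the talk.
    CONTINUE = "continue"   # continue from the user's opening text
    KEYWORDS = "keywords"   # weave user keywords into fluent sentences
    ENTITY = "entity"       # describe a character or object from keywords
    TEMPLATE = "template"   # fill the blanks of a user-defined template


@dataclass
class WritingRequest:
    mode: Mode
    text: str = ""                                  # opening text or template body
    keywords: list[str] = field(default_factory=list)
    entity: str = ""                                # e.g. "ancient beauty"


def build_prompt(req: WritingRequest) -> str:
    """Turn a request into an instruction string for a text generation model."""
    kw = ", ".join(req.keywords)
    if req.mode is Mode.CONTINUE:
        return f"Continue the story:\n{req.text}"
    if req.mode is Mode.KEYWORDS:
        return f"Write a fluent passage that uses all of these words: {kw}"
    if req.mode is Mode.ENTITY:
        return f"Describe the {req.entity} using these traits: {kw}"
    if req.mode is Mode.TEMPLATE:
        return f"Fill each ___ in the template:\n{req.text}"
    raise ValueError(f"unsupported mode: {req.mode}")
```

For example, `build_prompt(WritingRequest(Mode.ENTITY, keywords=["silver hair", "cold gaze"], entity="ancient beauty"))` produces an entity-rendering instruction. The point of the sketch is only that one generation backend can serve all four modes.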

In fact, these capabilities are already live on the Chinese Online and 17K Novels platforms, helping platform authors write. For example, Lanzhou's capability is exposed in the author writing interface of 17K Novels. This is a cross-domain application built on the text generation ability of the Mencius pre-trained model and trained on web fiction corpora. Delivery is also very lightweight: it is just an interface, which is then integrated into the 17K author platform.

Lanzhou has also made a small consumer-facing (To C) mini program called Panda Novelist, which is essentially a story relay game. As the initiator, the user first creates a story, conceives the plot, and gives an outline, then supplies some keywords from which text is randomly generated. Outgoing users can also publish the novel to the public square, where anyone who wants to can join in and continue writing. Lanzhou's AIGC capability for literature-assisted writing has thus been packaged into a mini program that anyone interested can try.

2. Marketing copywriting. Marketing copywriting was one of Lanzhou's earliest deployed applications and currently focuses on two fields, beauty and automobiles, which the user chooses between. The system can write body copy, generate titles, rewrite text, and so on, but all of these require the user to supply some keywords. At this point you will notice that, in terms of models and technology, all of these writing tasks are similar. The key difference is that Lanzhou has different data for each field, so the keywords differ and the system can be quickly adapted to different application fields, while the underlying technology is shared and there is no need to rebuild a completely different technology stack. In the automotive field, for example, specialized knowledge graphs covering brands, models, and so on may be involved. Of course, this is only a trial experience and it still has shortcomings, but with customized cooperation the results will certainly be better than those of the online demo.

A year ago, Lanzhou partnered with Shushuo Story, with Shushuo Story as the front end and Lanzhou as the engine behind it, to create an automated writing product called http://content-note.com Smart Copywriting. It works in three steps: choose a template, enter keywords, generate results. There is also the Lanzhou Thesis Writing Assistant (LPA), which generates complete sentences from keywords provided by the user and generates the next sentence from the preceding text for the author's reference. Lanzhou has now extended it to English paper writing in artificial intelligence, for researchers submitting to top NLP conferences such as ACL. Compared with ChatGPT, its output reads more like a paper.

Both of the scenarios above are text generation applications. Lanzhou Technology is a platform whose bottom layer consists of large models, algorithms, technology, data, and so on. When it is deployed in a specific field, data from that vertical field comes in and separate branches form; the text generation capabilities they share are eventually consolidated into the Lanzhou text generation platform.

The text generation engine has six characteristics:

Multi-attribute controllable text generation: control attributes such as keywords, topics, cloze blanks, and entity rendering make the AI's output controllable;

Diverse forms of text generation: continuation of existing text, and expansion from keywords, titles, or tabular data, for more flexible use;

Knowledge-graph-based text generation: users can supply their own knowledge graph to improve the factual correctness of generated content;

Content and style customization: a dedicated text generation engine can be built from the user's data;

Automatic text evaluation: a system that evaluates textual correctness, logical coherence, and so on;

Multi-industry support: marketing, finance, news, medical care, education, and many other industries.

3. Plot-based illustration. Although Lanzhou Technology does not focus on image generation, text and images have much in common: once the bottom layer of the model has vectorized its input, whether text, speech, or images, comparing items and measuring similarity becomes very easy. Building on the text-to-image model Stable Diffusion, Lanzhou Technology has taken it further into vertical fields. Stable Diffusion contains several models, including models for image encoding, image decoding, and the denoising in between. As shown in the picture below, there is a Pegasus on the left and a flying zebra on the right. The text added in between is "a zebra flying in the sky", and in the end the Pegasus gains black stripes and becomes a zebra. The image is encoded and passed into the denoising model together with the text; that is roughly the principle.
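To make the "encode, add noise, denoise" idea concrete, here is a toy numeric sketch in Python. It uses the standard diffusion forward step, in which a clean latent is blended with Gaussian noise, and shows that the clean latent can be recovered once the noise is known. This is not Stable Diffusion itself and not Lanzhou's code: the three-number `latent`, the `alpha_bar` value, and the function names are assumptions for illustration, and in a real system a trained network, conditioned on the text prompt, predicts the noise.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward step: blend the clean latent with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def estimate_clean(xt, eps_pred, alpha_bar):
    """Invert the forward step given a prediction of the noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
latent = [0.5, -1.0, 2.0]  # stand-in for an encoded image
noisy, true_eps = add_noise(latent, alpha_bar=0.3, rng=rng)

# A trained denoiser would predict the noise from the noisy latent and the
# text prompt. Here the true noise is reused to show the inversion is exact.
recovered = estimate_clean(noisy, true_eps, alpha_bar=0.3)
```

In image editing such as the Pegasus-to-zebra example, the text prompt changes what noise the denoiser predicts, so the recovered latent drifts toward an image that matches the new text.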

So what has Lanzhou Technology built on top of Stable Diffusion? The first is text-to-image generation, for example illustrating a passage from a novel. The second is image-to-image generation. For example, a child wants to draw a snail carrying a house on its back with a rainbow in the sky, but the child is not good at drawing; starting from the child's drawing, the AI can generate pictures in different styles. The last is controlling an image with text: for example, first generate a picture of "a little girl in a skirt", and then, on that basis, use text to make her sing, dance, play the violin, and so on, so that the original picture's generation is steered again through text.

The above are Stable Diffusion's open-source models and some common scenarios. Lanzhou Technology still has work to do on productization, tooling, and standardization, and many scenarios are close to public release.

Lanzhou's AI text-to-image technology has several main features:

Chinese-language optimization, for more controllable results;

Intelligent prompt generation. As is well known, Stable Diffusion is driven by prompts, but ordinary users only speak natural language. Translating that into a prompt the model understands better, possibly adding content the user did not type, is one of the tasks Lanzhou needs to do;

Consistent concept construction. A concept here may be a character, a magic weapon, or another object in a novel, and keeping it consistent throughout a story is a big challenge;

Personalized customization training. The AI cannot generate at random: for characters in a novel, for example, the customized character must stay the same from beginning to end;

Inference acceleration, which improves the experience and reduces cost. After all, what matters to the business is being commercially viable.
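As a rough illustration of the prompt generation feature, the Python sketch below expands a plain description into a longer text-to-image prompt by appending style and quality terms the user did not type. It is a rule-based stand-in under stated assumptions: the `STYLE_HINTS` table, the `QUALITY_TAGS` list, and `expand_prompt` are all hypothetical, and the talk does not say how Lanzhou actually does this (a language model would be a natural choice).

```python
# Hypothetical lookup tables; real prompt vocabularies would be far larger.
STYLE_HINTS = {
    "chinese painting": ["ink wash", "rice paper texture"],
    "cyberpunk": ["neon lights", "rainy night city"],
}
QUALITY_TAGS = ["highly detailed", "sharp focus"]


def expand_prompt(description: str, style: str = "") -> str:
    """Append style and quality terms to a natural-language description."""
    parts = [description.strip().rstrip(".")]
    parts += STYLE_HINTS.get(style.lower(), [])  # unknown styles add nothing
    parts += QUALITY_TAGS
    return ", ".join(parts)
```

Calling `expand_prompt("A little girl in a skirt.", "Cyberpunk")` keeps the user's description first and then adds the cyberpunk and quality terms, which is the "adding content that the user did not input" behavior described above.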

Lanzhou Technology also has some preliminary research results, such as style control for controllable text-to-image generation. For example, to generate pictures in a Chinese painting style, Stable Diffusion is still used, but feedback is added during training through a discriminator: people take part in selecting generated pictures and tell the model which of them conform to the Chinese painting style and which do not. The correct results are fed back to Stable Diffusion, so that when the trained model is used for inference, its output is close to the desired style. As another example, for a picture of a girl, you can specify the style and generate it as anime ("two-dimensional"), Chinese painting, cyberpunk, and so on.

Now consider character image control. For example, a character in a novel must look the way the story describes. A target image is given, but training on it alone risks overfitting, so some generic images also need to be fed to the model. For this, Lanzhou adopts the DreamBooth method: a special identifier (a character or token) is chosen to represent a particular boy's appearance during training, and at inference time you simply ask the model for the boy with that identifier to get the desired result. For example, after the boy walks into a convenience store, he appears consistently in scenes such as asking the clerk a question, picking out products, and checking out. Of course, there are still flaws that need continued improvement.
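The DreamBooth recipe can be sketched at the prompt level in Python. In DreamBooth, the subject's photos are paired with a prompt containing a rare identifier plus a class noun, while generic class images are paired with the class noun alone, and the two losses are summed so the model does not forget what a generic boy looks like. The identifier `sks`, the function names, and the numeric loss values below are illustrative assumptions; the talk does not give Lanzhou's actual settings.

```python
def dreambooth_prompts(identifier: str, class_noun: str, scene: str = ""):
    """Build the three prompts used in a DreamBooth-style workflow."""
    instance = f"a {identifier} {class_noun}"   # paired with the subject's own images
    prior = f"a {class_noun}"                   # paired with generic class images
    inference = f"{instance} {scene}".strip()   # used after fine-tuning
    return instance, prior, inference


def dreambooth_loss(instance_loss: float, prior_loss: float,
                    prior_weight: float = 1.0) -> float:
    """Combine the subject loss with the prior-preservation loss."""
    return instance_loss + prior_weight * prior_loss


# "sks" is a placeholder identifier chosen for this example.
instance, prior, inference = dreambooth_prompts(
    "sks", "boy", "walking into a convenience store")
```

The prior-preservation term corresponds to the "generalized images" mentioned above: it is what guards against overfitting to the few target images.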

Finally, there is still much that Lanzhou Technology can do in the future, and it attaches great importance to deploying in real application scenarios. The first challenge is long-text controllability: short passages are fine, but over thousands of words controllability is not as strong. The second is chapter and context consistency: the story relay in Panda Novelist, for example, can involve thousands of chapters or sections, and how to integrate summaries of earlier parts or the content of the previous section still needs to be explored. The third is common sense and factual plausibility, which may require introducing knowledge graphs so that the machine understands things like astronomy and geography and describes time, place, and character relationships sensibly. The fourth is personalized, agile customization: for example, if a user has a fixed character image or an illustration, how to customize around it requires further exploration.
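One simple way to think about the chapter-consistency problem is as a budgeting question: which earlier summaries and how much of the previous section fit into the model's input. The Python sketch below shows one possible heuristic, keeping the tail of the last section plus as many of the most recent summaries as fit. The function name, the character-based budget, and the "newest first" policy are assumptions for illustration; the talk describes this as an open area and does not give a method.

```python
def build_context(summaries, last_section, budget=200):
    """Keep the newest summaries that fit, then append the last section's tail."""
    # Reserve up to half of the budget for the end of the previous section.
    tail = last_section[max(0, len(last_section) - budget // 2):]
    remaining = budget - len(tail)
    kept = []
    for summary in reversed(summaries):      # walk from newest to oldest
        cost = len(summary) + 1              # +1 for the joining newline
        if cost > remaining:
            break
        kept.append(summary)
        remaining -= cost
    kept.reverse()                           # restore chronological order
    return "\n".join(kept + [tail])


context = build_context(
    ["Ch1: hero leaves home.", "Ch2: hero finds the sword."],
    "He opened the door.",
    budget=60,
)
```

With a budget of 60 characters, only the second chapter's summary fits alongside the last section, so the oldest summary is dropped first.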

All in all, in the field of AIGC, humans should keep improving their creativity and not stop creating because of AI. AI, in turn, will help humans create greater value by raising the efficiency of the entire industry.

Reprinted from 澜舟科技

