VisCPM is a family of open-source large multimodal models that support multimodal conversation (the VisCPM-Chat model) and text-to-image generation (the VisCPM-Paint model) in both Chinese and English, achieving state-of-the-art performance among Chinese open-source multimodal models. VisCPM is built on CPM-Bee, a large language model with 10B parameters, and integrates a visual encoder (Q-Former) and a visual decoder (Diffusion-UNet) to support visual inputs and outputs. Thanks to CPM-Bee's strong bilingual capability, VisCPM can be pre-trained on English multimodal data only and still generalize well to Chinese multimodal tasks.
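The composition described above can be pictured as a simple data flow. The following is a purely illustrative sketch, not VisCPM's actual code: every name in it (`ToyVisualEncoder`, `ToyLanguageModel`, `ToyVisualDecoder`, `chat`, `paint`) is hypothetical, each component is replaced by a trivial stand-in, and the idea that the decoder is conditioned on the language model's hidden states is an assumption made for illustration. Only the wiring between the three parts is meant to be informative.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Image = List[List[float]]


@dataclass
class ToyVisualEncoder:
    """Hypothetical stand-in for the visual encoder (Q-Former in VisCPM)."""
    num_queries: int = 4

    def encode(self, image: Image) -> List[Vector]:
        # Flatten the pixels and average them chunk by chunk, producing
        # a fixed number of 1-D "visual tokens" regardless of image size.
        pixels = [p for row in image for p in row]
        chunk = max(1, len(pixels) // self.num_queries)
        return [
            [sum(pixels[i * chunk:(i + 1) * chunk]) / chunk]
            for i in range(self.num_queries)
        ]


@dataclass
class ToyLanguageModel:
    """Hypothetical stand-in for the language model (CPM-Bee in VisCPM)."""

    def generate(self, visual_tokens: List[Vector], prompt: str) -> str:
        # A real model would attend over visual tokens and text jointly.
        return f"[answer to {prompt!r} conditioned on {len(visual_tokens)} visual tokens]"

    def hidden_states(self, prompt: str) -> List[Vector]:
        # One toy 1-D hidden state per whitespace-separated token.
        return [[float(len(tok))] for tok in prompt.split()]


@dataclass
class ToyVisualDecoder:
    """Hypothetical stand-in for the visual decoder (Diffusion-UNet in VisCPM)."""
    size: int = 2

    def decode(self, condition: List[Vector]) -> Image:
        # Assumption: the decoder is conditioned on language-model states.
        mean = sum(v[0] for v in condition) / max(1, len(condition))
        return [[mean] * self.size for _ in range(self.size)]


def chat(image: Image, prompt: str, encoder: ToyVisualEncoder, llm: ToyLanguageModel) -> str:
    """Chat-style path: image -> visual tokens -> language model -> text."""
    return llm.generate(encoder.encode(image), prompt)


def paint(prompt: str, llm: ToyLanguageModel, decoder: ToyVisualDecoder) -> Image:
    """Paint-style path: text -> language-model states -> decoder -> image."""
    return decoder.decode(llm.hidden_states(prompt))


if __name__ == "__main__":
    llm = ToyLanguageModel()
    print(chat([[0.0, 1.0], [2.0, 3.0]], "What is this?", ToyVisualEncoder(2), llm))
    print(paint("a red cat", llm, ToyVisualDecoder(2)))
```

In this sketch the same language model object sits in the middle of both paths, which mirrors the description of one base model being extended with an encoder for visual input and a decoder for visual output.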
💫 Excellent bilingual performance in Chinese and English: Thanks to the strong bilingual ability of its base language model, CPM-Bee, VisCPM achieves outstanding results in both multimodal dialogue and text-to-image generation in Chinese and English.