About VisCPM

VisCPM is a family of open-source large multimodal models that supports multimodal conversation (the VisCPM-Chat model) and text-to-image generation (the VisCPM-Paint model) in both Chinese and English, achieving state-of-the-art performance among Chinese open-source multimodal models. VisCPM is built on CPM-Bee, a large language model with 10B parameters, and adds a visual encoder (Q-Former) and a visual decoder (Diffusion-UNet) to support visual inputs and outputs. Thanks to CPM-Bee's strong bilingual capability, VisCPM can be pre-trained on English multimodal data only and still generalize well to Chinese multimodal tasks.

  • 👐 Open-source use: VisCPM is free to use for personal and research purposes. By open-sourcing the VisCPM model series, we hope to promote the development of the open-source community for large multimodal models and related research.
  • 🌟 Image and text generation in both directions: The VisCPM model series supports both directions of image-text generation, covering multimodal conversation (image-to-text generation) and text-to-image generation.
  • 💫 Strong bilingual performance in Chinese and English: Thanks to the strong bilingual ability of its base language model, CPM-Bee, VisCPM achieves outstanding results in both multimodal conversation and text-to-image generation in Chinese and English.

