Microsoft, working with a number of hospitals, develops a biomedical model that can read CT images
Microsoft researchers recently demonstrated LLaVA-Med, a model intended mainly for biomedical research. It can be used to infer a patient's pathological condition from CT scans, X-rays, and other medical images.
To train the model, the researchers worked with a number of hospitals to obtain a large-scale dataset of biomedical image-text pairs covering a broad range of image types, including chest X-ray, MRI, histology, pathology, and CT images.

The resulting model reportedly has "excellent multimodal dialogue ability", and "LLaVA-Med is ahead of other advanced models in the industry in some indicators on three standard biomedical data sets used to answer visual questions".
LLaVA-Med is built on a Vision Transformer and the Vicuna language model, and was trained on eight Nvidia A100 GPUs. Microsoft used GPT-4, supplied with "all the pre-analysis information for each image", to generate the questions and answers about each image that served as training data.
Training focused mainly on describing image content and explaining biomedical concepts. The model's question-answering ability is expected to help realize the vision of an assistant capable of "answering questions about biomedical images in natural language", which would greatly improve efficiency and the quality of research in the biomedical field.

However, the Microsoft researchers also said that the model currently has shortcomings, including the hallucinations and limited accuracy common to large models, and that its quality and reliability need further improvement. The research team plans to keep improving LLaVA-Med's performance and reliability with a view to commercial biomedical applications.