Dataset: FLARE-MedFM/FLARE-MLLM-2D
This repository provides a PaliGemma2 model fine-tuned for comprehensive medical image question answering and analysis. The model is based on google/paligemma2-10b-pt-224 and was trained on the FLARE 2025 medical multimodal dataset, which includes 19 medical imaging datasets, 50,996 images, and 58,112 question-answer pairs across 8 imaging modalities.
| Task | Metric (Description) | Value | #Examples |
|---|---|---|---|
| classification | balanced accuracy | 0.4723 | 3513 |
| multi-label classification | F1 score (micro) | 0.5040 | 1446 |
| detection | F1 score (IoU>0.5) | 0.3446 | 255 |
| instance_detection | F1 score (IoU>0.5) | 0.0028 | 176 |
| counting | mean absolute error | 295.6500 | 100 |
| regression | mean absolute error | 16.5035 | 100 |
| report_generation | GREEN score | 0.7072 | 1945 |
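The detection rows report F1 at an IoU threshold of 0.5. The evaluation script itself is not shown here, so the following is only a minimal sketch of how such a score is commonly computed. The box format `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching, and the function names `iou` and `detection_f1` are assumptions made for illustration, not the official FLARE protocol.

```python
from typing import List, Sequence

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(pred: List[Box], gt: List[Box], thr: float = 0.5) -> float:
    """F1 score where a prediction counts as a true positive if it matches
    a not-yet-matched ground-truth box with IoU strictly greater than thr."""
    unmatched = list(range(len(gt)))
    tp = 0
    for p in pred:
        # Greedy matching (an assumption): take the best remaining ground-truth box.
        best_j, best_iou = None, thr
        for j in unmatched:
            value = iou(p, gt[j])
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j is not None:
            unmatched.remove(best_j)
            tp += 1
    fp = len(pred) - tp
    fn = len(gt) - tp
    denom = 2 * tp + fp + fn
    # Convention chosen here (an assumption): no boxes expected and none predicted scores 1.0.
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    pred = [(0, 0, 10, 10), (20, 20, 30, 30)]
    gt = [(0, 0, 10, 8), (50, 50, 60, 60)]
    # One true positive, one false positive, one false negative.
    print(detection_f1(pred, gt))  # 0.5
```

A different matching strategy (for example Hungarian assignment) or a `>=` comparison at the threshold could change the score slightly, so the numbers in the table should be compared only against results from the same evaluation code.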
from transformers import (
    BitsAndBytesConfig,
    PaliGemmaForConditionalGeneration,
    PaliGemmaProcessor,
)
from peft import PeftModel
from PIL import Image
import torch

base_model_id = "google/paligemma2-10b-pt-224"
model_id = "yws0322/flare25-paligemma2"

# Load the processor and the 4-bit quantized base model, then attach the fine-tuned adapter.
processor = PaliGemmaProcessor.from_pretrained(base_model_id)
base_model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = PeftModel.from_pretrained(base_model, model_id)
model.eval()

# Convert to RGB so grayscale or palette images are handled consistently.
image = Image.open("chest_xray.jpg").convert("RGB")
question = "What are the key findings in this chest X-ray?"
image_token = "<image>"
prompt = f"{image_token * processor.image_seq_length}{processor.tokenizer.bos_token}Analyze the given medical image and answer the following question:\nQuestion: {question}\nPlease provide a clear and concise answer."

# Move the inputs to the model's device before generating.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = processor.tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
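To ask several questions about the same image, the prompt template from the usage code above can be factored into a small helper. This is only a sketch: `build_prompt`, `build_prompts`, `IMAGE_TOKEN`, and `TEMPLATE` are hypothetical names introduced here, and the values `4` and `"<bos>"` in the example are placeholders standing in for `processor.image_seq_length` and `processor.tokenizer.bos_token`.

```python
from typing import List

IMAGE_TOKEN = "<image>"

# Same wording as the prompt in the usage code above.
TEMPLATE = (
    "Analyze the given medical image and answer the following question:\n"
    "Question: {question}\n"
    "Please provide a clear and concise answer."
)


def build_prompt(question: str, image_seq_length: int, bos_token: str) -> str:
    """Repeat the image token, append the BOS token, then the instruction text."""
    prefix = IMAGE_TOKEN * image_seq_length + bos_token
    return prefix + TEMPLATE.format(question=question)


def build_prompts(questions: List[str], image_seq_length: int, bos_token: str) -> List[str]:
    """Build one prompt per question with a shared image prefix."""
    return [build_prompt(q, image_seq_length, bos_token) for q in questions]


if __name__ == "__main__":
    # Placeholder values; in practice pass processor.image_seq_length
    # and processor.tokenizer.bos_token from the loaded processor.
    for prompt in build_prompts(
        ["Is there a pleural effusion?", "Is the heart enlarged?"],
        image_seq_length=4,
        bos_token="<bos>",
    ):
        print(prompt)
        print("---")
```

Each resulting string can then be passed as `text=` to the processor together with the image, exactly as in the single-question example.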
If you use this model in your research, please cite:
@misc{flare25paligemma2025,
  title={FLARE25-PaliGemma2},
  author={Yeonwoo Seo},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yws0322/flare25-paligemma2}
}

@misc{paligemma2-base,
  title={PaliGemma2: Multimodal Vision-Language Model by Google Research},
  author={Google Research},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/google/paligemma2-10b-pt-224}
}
Model uploaded on 2025-06-03