PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders
Paper: 2603.25398
How to use tue-mps/coco_panoptic_pmt_base_640_dinov3 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-segmentation", model="tue-mps/coco_panoptic_pmt_base_640_dinov3")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("tue-mps/coco_panoptic_pmt_base_640_dinov3", dtype="auto")

This is the base variant of the PMT-DINOv3 model trained for panoptic segmentation on COCO at 640x640 resolution.
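The pipeline call above returns, for each input image, a list of dictionaries with `label`, `score`, and `mask` entries, as the Transformers `image-segmentation` pipeline generally does. The following is a minimal sketch of how that output might be inspected. The helper names `segment_image` and `summarize_segments` are hypothetical and not part of this model card, and whether the checkpoint loads may depend on the installed Transformers version.

```python
from __future__ import annotations

from typing import Any

MODEL_ID = "tue-mps/coco_panoptic_pmt_base_640_dinov3"


def segment_image(image: str) -> list[dict[str, Any]]:
    """Run the image-segmentation pipeline on an image path or URL (hypothetical helper)."""
    # Imported lazily so summarize_segments can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("image-segmentation", model=MODEL_ID)
    return pipe(image)


def summarize_segments(results: list[dict[str, Any]]) -> list[tuple[str, float | None]]:
    """Return (label, score) pairs, highest score first; entries without a score go last."""
    pairs = [(r["label"], r.get("score")) for r in results]
    return sorted(pairs, key=lambda p: (p[1] is None, -(p[1] or 0.0)))
```

With an image on disk, `summarize_segments(segment_image("photo.jpg"))` would list the predicted segments by confidence; each entry's `mask` in the raw results is a PIL image that can be saved or overlaid separately.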
| Property | Value |
|---|---|
| Backbone | DINOv3 ViT-B/16 |
| Input Resolution | 640x640 |
| Task | Panoptic Segmentation |
| Dataset | COCO |
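As a rough illustration of what the table implies, a ViT-B/16 backbone splits its input into 16x16 pixel patches, so a 640x640 image yields a 40x40 grid of patch tokens. The sketch below only does this arithmetic. The function name `patch_grid` is hypothetical, and any extra tokens the backbone may prepend (such as class or register tokens) are deliberately not counted.

```python
from __future__ import annotations


def patch_grid(height: int, width: int, patch_size: int = 16) -> tuple[int, int, int]:
    """Return (rows, cols, total) of non-overlapping patches for an image size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols


# 640 / 16 = 40 patches per side, so 40 * 40 = 1600 patch tokens.
print(patch_grid(640, 640))  # (40, 40, 1600)
```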
@inproceedings{cavagnero2026pmt,
author = {Cavagnero, Niccolò and Norouzi, Narges and Dubbelman, Gijs and de Geus, Daan},
title = {PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
year = {2026},
}