PMT-DINOv3 (Base, 640px) for COCO Panoptic Segmentation

Overview

This is the base variant of the PMT-DINOv3 model trained for panoptic segmentation on COCO at 640x640 resolution.

Model Details

Property	Value
Backbone	DINOv3 ViT-B/16
Input Resolution	640x640
Task	Panoptic Segmentation
Dataset	COCO
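
As a rough illustration of what the table implies, a ViT-B/16 backbone splits its input into 16x16 patches, so a 640x640 image yields a 40x40 patch grid. The sketch below only does this arithmetic. The helper name patch_grid is hypothetical and is not part of the PMT code base, and any class or register tokens the encoder adds are not counted.

```python
def patch_grid(image_size: int = 640, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square input.

    Hypothetical helper for illustration only; it is not part of tue-mps/pmt.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, total = patch_grid()
print(per_side, total)  # 40 1600
```

At this resolution the encoder therefore processes 1600 patch tokens per image, before any extra tokens.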

Citation

@inproceedings{cavagnero2026pmt,
    author    = {Cavagnero, Niccolò and Norouzi, Narges and Dubbelman, Gijs and de Geus, Daan},
    title     = {PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
    year      = {2026},
}

Acknowledgements

Original implementation: tue-mps/pmt
Paper: arXiv:2503.19108

Safetensors

Model size: 30.4M params
Tensor type: F32
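
For a back-of-the-envelope sense of the checkpoint size, the reported parameter count and tensor type can be multiplied out. This is a minimal sketch: the helper weight_size_mb is hypothetical, the 30.4M figure is rounded, and the result ignores file headers and any weights not stored in the safetensors file.

```python
# Bytes per element for a few common safetensors dtypes.
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_size_mb(num_params: float, dtype: str = "F32") -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes).

    Hypothetical helper for illustration only.
    """
    return num_params * BYTES_PER_DTYPE[dtype] / 1e6


size = weight_size_mb(30.4e6, "F32")
print(f"{size:.1f} MB")  # 121.6 MB
```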

Papers for tue-mps/coco_panoptic_pmt_base_640_dinov3

PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders (arXiv:2603.25398, published Mar 26)
Your ViT is Secretly an Image Segmentation Model (arXiv:2503.19108, published Mar 24, 2025)