March 10, 2026

Towards Instance Segmentation with Polygon Detection Transformers

Authors

Jiacheng Sun, Jiaqi Lin, Wenlong Hu, Haoyang Li, Xinghong Zhou, Chenghai Mao, Yan Peng, Xiaomao Li

Abstract

One of the bottlenecks for instance segmentation today lies in the conflicting requirements of high-resolution inputs and lightweight, real-time inference. To address this bottleneck, we present a Polygon Detection Transformer (Poly-DETR) that reformulates instance segmentation as sparse vertex regression via Polar Representation, thereby eliminating the reliance on dense pixel-wise mask prediction. To account for the box-to-polygon reference shift in Detection Transformers, we propose Polar Deformable Attention and a Position-Aware Training Scheme to dynamically update supervision and focus attention on boundary cues. Compared with state-of-the-art polar-based methods, Poly-DETR achieves a 4.7 mAP improvement on MS COCO test-dev. Moreover, we construct a parallel mask-based counterpart to support a systematic comparison between polar and mask representations. Experimental results show that Poly-DETR is more lightweight in high-resolution scenarios, reducing memory consumption by almost half on the Cityscapes dataset. Notably, on the PanNuke (cell segmentation) and SpaceNet (building footprints) datasets, Poly-DETR surpasses its mask-based counterpart on all metrics, which validates its advantage on regular-shaped instances in domain-specific settings.
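
To make the idea of sparse vertex regression via a polar representation concrete, the following is a minimal sketch of how a polar-encoded instance could be decoded into polygon vertices. It assumes the common polar formulation in which an instance is described by a center point and K ray lengths at evenly spaced angles; the abstract does not specify Poly-DETR's exact parameterization, so the function names `polar_to_polygon` and `polygon_area`, the angle convention, and the choice of K are illustrative assumptions rather than the authors' implementation.

```python
import math


def polar_to_polygon(cx, cy, radii):
    """Decode a center (cx, cy) and K ray lengths into K (x, y) vertices.

    Assumption: ray i points at angle 2*pi*i/K, measured from the +x axis.
    """
    k = len(radii)
    vertices = []
    for i, r in enumerate(radii):
        theta = 2.0 * math.pi * i / k
        vertices.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return vertices


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


if __name__ == "__main__":
    # Hypothetical instance: center at (10, 20) with four unit-length rays.
    verts = polar_to_polygon(10.0, 20.0, [1.0, 1.0, 1.0, 1.0])
    print(verts)
    print(polygon_area(verts))
```

Under this assumed encoding, each instance is described by K + 2 numbers regardless of image size, whereas a dense full-image mask grows with the number of pixels. This illustrates why a vertex-based output can be attractive at high resolution, though the memory savings reported in the abstract depend on the full model and not only on the output encoding.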

Metadata

arXiv ID: 2603.09245
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-10
Fetched: 2026-03-11 06:02
