Eye image segmentation using visual and concept prompts with Segment Anything Model 3 (SAM3)

Authors

Diederick C. Niehorster, Marcus Nyström

Abstract

Previous work has reported that vision foundation models show promising zero-shot performance in eye image segmentation. Here we examine whether the latest iteration of the Segment Anything Model, SAM3, offers better eye image segmentation performance than SAM2, and explore the performance of its new concept (text) prompting mode. Eye image segmentation performance was evaluated using diverse datasets encompassing both high-resolution high-quality videos from a lab environment and the TEyeD dataset consisting of challenging eye videos acquired in the wild. Results show that in most cases SAM3 with either visual or concept prompts did not perform better than SAM2, for both lab and in-the-wild datasets. Since SAM2 not only performed better but was also faster, we conclude that SAM2 remains the best option for eye image segmentation. We provide our adaptation of SAM3's codebase that allows processing videos of arbitrary duration.
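For readers unfamiliar with the prompting modes compared here, the sketch below illustrates what a "visual prompt" looks like in practice, using the publicly released SAM2 image-predictor interface. It is an illustrative example only, not the authors' evaluation code: the checkpoint and config file names, the dummy image, and the assumed pupil-centre coordinates are placeholders.

import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder checkpoint/config names; substitute the files shipped with the SAM2 release.
checkpoint = "checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Stand-in eye image; in practice this would be a frame from an eye video.
image = np.zeros((480, 640, 3), dtype=np.uint8)
predictor.set_image(image)

# Visual prompt: a single positive point assumed to lie on the pupil.
point_coords = np.array([[320, 240]])
point_labels = np.array([1])  # 1 = foreground point

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=False,
)
# `masks` holds the binary segmentation mask for the prompted region.

SAM3's concept (text) prompting mode, by contrast, is driven by a short textual description of the target (e.g. "pupil") rather than points or boxes on the image; its exact API is not shown here since it is not documented in this listing.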

Metadata

arXiv ID: 2603.17715
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-18
Fetched: 2026-03-19 06:01
