Paper
Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language
Authors
Michelle Stegeman, Lena Philipp, Fennie van der Graaf, Marina D'Amato, Clément Grisi, Luc Builtjes, Joeran S. Bosma, Judith Lefkes, Rianne A. Weber, James A. Meakin, Thomas Koopman, Anne Mickan, Mathias Prokop, Ewoud J. Smit, Geert Litjens, Jeroen van der Laak, Bram van Ginneken, Maarten de Rooij, Henkjan Huisman, Colin Jacobs, Francesco Ciompi, Alessa Hering
Abstract
Medical foundation models show promise in learning broadly generalizable features from large, diverse datasets. Such features could form the basis for reliable cross-modality generalization and rapid adaptation to new tasks from only a few task-specific examples. Yet evidence for this is limited by the lack of public, standardized, and reproducible evaluation frameworks: existing public benchmarks are often fragmented across task-, organ-, or modality-specific settings, limiting assessment of cross-task generalization. We introduce UNICORN, a public benchmark designed to systematically evaluate medical foundation models under a unified protocol. To isolate representation quality, the benchmark is built on a novel two-step framework that decouples model inference from task-specific evaluation based on standardized few-shot adaptation. As a central design choice, we constructed sequestered test sets, accessible only indirectly, derived from clinically relevant cohorts, along with standardized evaluation code and a submission interface on an open benchmarking platform. Performance is aggregated into a single UNICORN Score, a new metric introduced to support direct comparison of foundation models across diverse medical domains, modalities, and task types. The UNICORN test dataset comprises data from more than 2,400 patients, with over 3,700 vision cases and over 2,400 clinical reports collected from 17 institutions across eight countries; the benchmark spans eight anatomical regions and four imaging modalities. Both task-specific and aggregated leaderboards enable accessible, standardized, and reproducible evaluation. By standardizing multi-task, multi-modality assessment, UNICORN establishes a foundation for reproducible benchmarking of medical foundation models. Data, baseline methods, and the evaluation platform are publicly available via unicorn.grand-challenge.org.
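The two-step protocol described in the abstract lends itself to a compact illustration. The Python sketch below shows one plausible instantiation under stated assumptions: precomputed embeddings stand in for step one (frozen-model inference), a logistic-regression linear probe fitted on a handful of labeled shots stands in for step two (standardized few-shot adaptation), and per-task metrics are averaged into a single score. The choice of probe, the balanced-accuracy metric, the task names, the placeholder metric values, and the mean aggregation are all invented for illustration; the abstract does not specify how the UNICORN Score is actually computed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Step 1 (inference): a frozen foundation model maps each case to a
# fixed-length feature vector. Random embeddings stand in for the
# encoder output here, purely to keep the sketch self-contained.
rng = np.random.default_rng(0)
n_shots, n_test, dim = 16, 200, 512
shot_feats = rng.normal(size=(n_shots, dim))     # the few labeled examples
shot_labels = np.tile([0, 1], n_shots // 2)      # balanced few-shot labels
test_feats = rng.normal(size=(n_test, dim))      # sequestered test cases
test_labels = rng.integers(0, 2, size=n_test)

# Step 2 (few-shot adaptation + evaluation): fit a lightweight task head
# on the shots only. The foundation model itself is never updated, so
# the resulting metric reflects the quality of its representations.
probe = LogisticRegression(max_iter=1000).fit(shot_feats, shot_labels)
task_metric = balanced_accuracy_score(test_labels, probe.predict(test_feats))

# Aggregation (assumption): the paper defines a single UNICORN Score
# across all tasks; a plain mean over hypothetical per-task metrics is
# shown here only as a placeholder for that definition.
per_task = {"classification": task_metric, "retrieval": 0.71, "report_gen": 0.64}
unicorn_score = float(np.mean(list(per_task.values())))
print(f"UNICORN Score (assumed mean aggregation): {unicorn_score:.3f}")
```

Decoupling adaptation from inference in this way means every submitted model can be adapted and scored by the same code path, which is presumably what makes cross-model comparison on the platform standardized and reproducible, as the abstract claims.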
Metadata
arXiv ID: 2603.02790v1
Primary category: cs.CV
Published: 2026-03-03
Comment: This paper describes the dataset and design of the UNICORN challenge and provides the link to Grand Challenge
Links: https://arxiv.org/abs/2603.02790v1 (abstract), https://arxiv.org/pdf/2603.02790v1 (PDF)