Paper
A Comprehensive Benchmark of Histopathology Foundation Models for Kidney Histopathology
Authors
Harishwar Reddy Kasireddy, Patricio S. La Rosa, Akshita Gupta, Anindya S. Paul, Jamie L. Fermin, William L. Clapp, Meryl A. Waldman, Tarek M. El-Ashkar, Sanjay Jain, Luis Rodrigues, Kuang Yu Jen, Avi Z. Rosenberg, Michael T. Eadon, Jeffrey B. Hodgin, Pinaki Sarder
Abstract
Histopathology foundation models (HFMs), pretrained on large-scale cancer datasets, have advanced computational pathology. However, their applicability to non-cancerous chronic kidney disease remains underexplored, despite the coexistence of renal pathology with malignancies such as renal cell and urothelial carcinoma. We systematically evaluate 11 publicly available HFMs across 11 kidney-specific downstream tasks spanning multiple stains (PAS, H&E, PASM, and IHC), spatial scales (tile- and slide-level), task types (classification, regression, and copy detection), and clinical objectives, including detection, diagnosis, and prognosis. Tile-level performance is assessed using repeated stratified group cross-validation, while slide-level tasks are evaluated using repeated nested stratified cross-validation. Statistical significance is examined using the Friedman test followed by pairwise Wilcoxon signed-rank tests with Holm-Bonferroni correction, visualized with compact letter displays. To promote reproducibility, we release an open-source Python package, kidney-hfm-eval, available at https://pypi.org/project/kidney-hfm-eval/, that reproduces the evaluation pipelines. Results show moderate to strong performance on tasks driven by coarse meso-scale renal morphology, including diagnostic classification and detection of prominent structural alterations. In contrast, performance consistently declines for tasks requiring fine-grained microstructural discrimination, complex biological phenotypes, or slide-level prognostic inference, largely independent of stain type. Overall, current HFMs appear to encode predominantly static meso-scale representations and may have limited capacity to capture subtle renal pathology or prognosis-related signals. Our results highlight the need for kidney-specific, multi-stain, and multimodal foundation models to support clinically reliable decision-making in nephrology.
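The abstract names a Holm-Bonferroni correction applied to pairwise Wilcoxon signed-rank p-values. As a rough illustration of that correction step only, the sketch below applies the standard step-down procedure to a dictionary of raw p-values. It is not taken from the kidney-hfm-eval package: the function name `holm_bonferroni`, the model labels, and the p-values are all hypothetical, and the preceding Friedman and Wilcoxon tests are assumed to have been run elsewhere.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def holm_bonferroni(
    p_values: Dict[Pair, float], alpha: float = 0.05
) -> Dict[Pair, Tuple[float, bool]]:
    """Map each comparison to (Holm-adjusted p-value, reject at alpha?)."""
    m = len(p_values)
    # Step-down: visit raw p-values from smallest to largest.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    results: Dict[Pair, Tuple[float, bool]] = {}
    running_max = 0.0
    for rank, (pair, p) in enumerate(ordered):
        # The k-th smallest p-value (rank k-1) is scaled by (m - k + 1).
        scaled = min(1.0, (m - rank) * p)
        # Adjusted p-values must be non-decreasing in the sorted order.
        running_max = max(running_max, scaled)
        results[pair] = (running_max, running_max <= alpha)
    return results


# Hypothetical raw p-values for three pairwise model comparisons
# (illustrative numbers only, not results from the paper).
raw = {
    ("model_a", "model_b"): 0.004,
    ("model_a", "model_c"): 0.030,
    ("model_b", "model_c"): 0.200,
}
corrected = holm_bonferroni(raw)
for pair, (p_adj, reject) in corrected.items():
    print(pair, round(p_adj, 3), reject)
```

With these illustrative inputs, only the first comparison stays below 0.05 after correction; the second has a raw p-value of 0.030 but an adjusted value of 0.06.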
Metadata
arXiv ID: 2603.15967v1
Abstract page: https://arxiv.org/abs/2603.15967v1
PDF: https://arxiv.org/pdf/2603.15967v1
Published: 2026-03-16T22:37:43Z
Updated: 2026-03-16T22:37:43Z
Primary category: cs.CV
Comment: 31 pages, 14 tables, 12 figures. Co-correspondence to jhodgin@med.umich.edu and pinaki.sarder@ufl.edu
Affiliations
Harishwar Reddy Kasireddy: University of Florida
Patricio S. La Rosa: University of Florida; Bayer Company
Akshita Gupta: University of Florida
Anindya S. Paul: University of Florida
Jamie L. Fermin: University of Florida
William L. Clapp: University of Florida
Meryl A. Waldman: National Institutes of Health
Tarek M. El-Ashkar: Indiana University School of Medicine
Sanjay Jain: Washington University School of Medicine
Luis Rodrigues: Universidade de Coimbra
Kuang Yu Jen: University of California Davis
Avi Z. Rosenberg: Johns Hopkins University
Michael T. Eadon: Indiana University School of Medicine
Jeffrey B. Hodgin: University of Michigan
Pinaki Sarder: University of Florida