Paper
Overview of TREC 2025 Biomedical Generative Retrieval (BioGen) Track
Authors
Deepak Gupta, Dina Demner-Fushman, William Hersh, Steven Bedrick, Kirk Roberts
Abstract
Large language models (LLMs) have recently made significant progress across multiple biomedical tasks, including biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. These models have demonstrated strong capabilities in processing and synthesizing complex biomedical information and in generating fluent, human-like responses. Despite these advances, hallucinations or confabulations remain key challenges when using LLMs in biomedical and other high-stakes domains. Inaccuracies may be particularly harmful in high-risk situations, such as medical question answering, making clinical decisions, or appraising biomedical research. Studies on the evaluation of LLMs' ability to ground generated statements in verifiable sources have shown that models perform significantly
Metadata
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25