AI LLM March 24, 2026

From Questions to Trust Reports: A LLM-IR Framework for the TREC 2025 DRAGUN Track

Authors

Ignacy Alwasiak, Kene Nnolim, Jaclyn Thi, Samy Ateia, Markus Bink, Gregor Donabauer, David Elsweiler, Udo Kruschwitz

Abstract

The DRAGUN Track at TREC 2025 targets the growing need for effective support tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted for both Task 1 (critical question generation) and Task 2 (retrieval-augmented trustworthiness reporting). Our approach combines LLM-based question generation with semantic filtering, diversity enforcement using clustering, and several query expansion strategies (including reasoning-based Chain-of-Thought expansion) to retrieve relevant evidence from the MS MARCO V2.1 segmented corpus. Retrieved documents are re-ranked using a monoT5 model and filtered using an LLM relevance judge together with a domain-level trustworthiness dataset. For Task 2, selected evidence is synthesized by an LLM into concise trustworthiness reports with citations. Results from the official evaluation indicate that Chain-of-Thought query expansion and re-ranking substantially improve both relevance and domain trust compared to baseline retrieval, while question-generation performance shows moderate quality with room for improvement. We conclude by outlining key challenges encountered and suggesting directions for enhancing robustness and trustworthiness assessment in future iterations of the system.
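The evidence-selection stage described in the abstract (re-ranking, then filtering by a relevance judge and domain-level trust) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: the `Passage` type, the `DOMAIN_TRUST` table and its scores, the `min_trust` and `top_k` parameters, and the choice to drop passages from unknown domains are all hypothetical. The re-ranker score is taken as given, and the LLM relevance judge is represented by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlparse


@dataclass
class Passage:
    url: str
    text: str
    rerank_score: float  # assumed to come from a re-ranker such as monoT5


# Hypothetical domain-level trust scores in [0, 1]; the dataset used in the
# paper is not reproduced here.
DOMAIN_TRUST = {
    "example-news.org": 0.9,
    "rumor-mill.example": 0.2,
}


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def select_evidence(
    passages: List[Passage],
    is_relevant: Callable[[Passage], bool],  # stand-in for an LLM relevance judge
    min_trust: float = 0.5,                  # hypothetical threshold
    top_k: int = 3,                          # hypothetical cut-off
) -> List[Passage]:
    """Keep the best-scored passages that pass both the trust and relevance checks."""
    ranked = sorted(passages, key=lambda p: p.rerank_score, reverse=True)
    kept: List[Passage] = []
    for passage in ranked:
        trust = DOMAIN_TRUST.get(domain_of(passage.url))
        # Assumption: passages from unknown or low-trust domains are dropped.
        if trust is None or trust < min_trust:
            continue
        if not is_relevant(passage):
            continue
        kept.append(passage)
        if len(kept) == top_k:
            break
    return kept
```

In this sketch the trust check runs before the relevance judge so that the more expensive judge call is skipped for passages that would be discarded anyway; a real system could order the checks differently.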

Metadata

arXiv ID: 2603.23125

Provider: ARXIV

Primary Category: cs.IR

Published: 2026-03-24

Fetched: 2026-03-25 06:02

Comment: TREC 2025 Proceedings

Updated: 2026-03-24T12:22:27Z

Abstract page: https://arxiv.org/abs/2603.23125v1

PDF: https://arxiv.org/pdf/2603.23125v1