Research

Paper

AI LLM March 17, 2026

IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time

Authors

Zhenghua Bao, Yi Shi

Abstract

Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterative multi-step reasoning. We present IndexRAG, a novel approach that shifts cross-document reasoning from online inference to offline indexing. IndexRAG identifies bridge entities shared across documents and generates bridging facts as independently retrievable units, requiring no additional training or fine-tuning. Experiments on three widely-used multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue) show that IndexRAG improves F1 over Naive RAG by 4.6 points on average, while requiring only single-pass retrieval and a single LLM call at inference time. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG, while relying solely on flat retrieval. Our code will be released upon acceptance.

Metadata

arXiv ID: 2603.16415
Provider: ARXIV
Primary Category: cs.CL
Published: 2026-03-17
Fetched: 2026-03-18 06:02

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.16415v1</id>\n    <title>IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time</title>\n    <updated>2026-03-17T11:51:31Z</updated>\n    <link href='https://arxiv.org/abs/2603.16415v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.16415v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterative multi-step reasoning. We present IndexRAG, a novel approach that shifts cross-document reasoning from online inference to offline indexing. IndexRAG identifies bridge entities shared across documents and generates bridging facts as independently retrievable units, requiring no additional training or fine-tuning. Experiments on three widely-used multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue) show that IndexRAG improves F1 over Naive RAG by 4.6 points on average, while requiring only single-pass retrieval and a single LLM call at inference time. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG, while relying solely on flat retrieval. Our code will be released upon acceptance.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CL'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.IR'/>\n    <published>2026-03-17T11:51:31Z</published>\n    <arxiv:primary_category term='cs.CL'/>\n    <author>\n      <name>Zhenghua Bao</name>\n    </author>\n    <author>\n      <name>Yi Shi</name>\n    </author>\n  </entry>"
}