Research

Paper

AI LLM March 10, 2026

MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations

Authors

Abhishikth Mallampalli, Sridhara Dasu

Abstract

Large-scale scientific collaborations, such as the Compact Muon Solenoid (CMS) at CERN, produce a vast and ever-growing corpus of internal documentation. Navigating this complex information landscape presents a significant challenge for both new and experienced researchers, hindering knowledge sharing and slowing down the pace of scientific discovery. To address this, we present a prototype of MITRA, a Retrieval-Augmented Generation (RAG) based system, designed to answer specific, context-aware questions about physics analyses. MITRA employs a novel, automated pipeline using Selenium for document retrieval from internal databases and Optical Character Recognition (OCR) with layout parsing for high-fidelity text extraction. Crucially, MITRA's entire framework, from the embedding model to the Large Language Model (LLM), is hosted on-premise, ensuring that sensitive collaboration data remains private. We introduce a two-tiered vector database architecture that first identifies the relevant analysis from abstracts before focusing on the full documentation, resolving potential ambiguities between different analyses. We demonstrate the prototype's superior retrieval performance against a standard keyword-based baseline on realistic queries and discuss future work towards developing a comprehensive research agent for large experimental collaborations.

Metadata

arXiv ID: 2603.09800
Provider: ARXIV
Primary Category: cs.IR
Published: 2026-03-10
Fetched: 2026-03-11 06:02

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.09800v1</id>\n    <title>MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations</title>\n    <updated>2026-03-10T15:28:35Z</updated>\n    <link href='https://arxiv.org/abs/2603.09800v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.09800v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Large-scale scientific collaborations, such as the Compact Muon Solenoid (CMS) at CERN, produce a vast and ever-growing corpus of internal documentation. Navigating this complex information landscape presents a significant challenge for both new and experienced researchers, hindering knowledge sharing and slowing down the pace of scientific discovery. To address this, we present a prototype of MITRA, a Retrieval-Augmented Generation (RAG) based system, designed to answer specific, context-aware questions about physics analyses. MITRA employs a novel, automated pipeline using Selenium for document retrieval from internal databases and Optical Character Recognition (OCR) with layout parsing for high-fidelity text extraction. Crucially, MITRA's entire framework, from the embedding model to the Large Language Model (LLM), is hosted on-premise, ensuring that sensitive collaboration data remains private. We introduce a two-tiered vector database architecture that first identifies the relevant analysis from abstracts before focusing on the full documentation, resolving potential ambiguities between different analyses. We demonstrate the prototype's superior retrieval performance against a standard keyword-based baseline on realistic queries and discuss future work towards developing a comprehensive research agent for large experimental collaborations.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.IR'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CL'/>\n    <published>2026-03-10T15:28:35Z</published>\n    <arxiv:comment>Accepted at NeurIPS 2025 Machine Learning for the Physical Sciences workshop and Lepton Photon conference 2025 (Computing AI/ML track)</arxiv:comment>\n    <arxiv:primary_category term='cs.IR'/>\n    <author>\n      <name>Abhishikth Mallampalli</name>\n    </author>\n    <author>\n      <name>Sridhara Dasu</name>\n    </author>\n  </entry>"
}