Paper
IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs
Authors
Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai
Abstract
As the complexity of the HPC storage stack rapidly grows, domain scientists face increasing challenges in effectively utilizing HPC storage systems to achieve their desired I/O performance. To identify and address I/O issues, scientists largely rely on I/O experts to analyze their I/O traces and provide insights into potential problems. However, with a limited number of I/O experts and the growing demand for data-intensive applications, inaccessibility has become a major bottleneck, hindering scientists from maximizing their productivity. Rapid advances in LLMs make it possible to build an automated tool that brings trustworthy I/O performance diagnosis to domain scientists. However, key challenges remain, such as the inability to handle long context windows, a lack of accurate domain knowledge about HPC I/O, and the generation of hallucinations during complex interactions. In this work, we propose IOAgent as a systematic effort to address these challenges. IOAgent integrates a module-based pre-processor, a RAG-based domain knowledge integrator, and a tree-based merger to accurately diagnose I/O issues from a given Darshan trace file. Similar to an I/O expert, IOAgent provides detailed justifications and references for its diagnoses and offers an interactive interface for scientists to ask targeted follow-up questions. To evaluate IOAgent, we collected a diverse set of labeled job traces and released the first open diagnosis test suite, TraceBench. Using this test suite, we conducted extensive evaluations, demonstrating that IOAgent matches or outperforms state-of-the-art I/O diagnosis tools with accurate and useful diagnosis results. We also show that IOAgent is not tied to specific LLMs, performing similarly well with both proprietary and open-source LLMs. We believe IOAgent has the potential to become a powerful tool for scientists navigating complex HPC I/O subsystems in the future.
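The abstract names a tree-based merger but does not describe how it works. As a rough illustration only, and not the authors' implementation, the sketch below shows one plausible reading: per-module partial diagnoses are merged pairwise, level by level, so that no single merge step has to take in every summary at once. The names `tree_merge` and `concat_merge`, and the sample findings, are hypothetical; in a real system the merge step would presumably be an LLM call rather than string concatenation.

```python
from typing import Callable, List


def tree_merge(items: List[str], merge: Callable[[str, str], str]) -> str:
    """Reduce partial diagnoses to one by merging adjacent pairs, level by level."""
    if not items:
        raise ValueError("need at least one partial diagnosis")
    level = list(items)
    while len(level) > 1:
        next_level = []
        # Merge neighbours (0,1), (2,3), ... on the current level.
        for i in range(0, len(level) - 1, 2):
            next_level.append(merge(level[i], level[i + 1]))
        # With an odd count, the last item moves up unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]


def concat_merge(a: str, b: str) -> str:
    # Hypothetical stand-in for a step that reconciles two summaries.
    return f"({a} + {b})"


# Hypothetical per-module findings, invented for illustration.
partials = [
    "POSIX: many small writes",
    "MPI-IO: no collective I/O",
    "Lustre: stripe count 1",
]
print(tree_merge(partials, concat_merge))
```

Each merge call sees only two inputs, which is one way such a design could keep every step within a limited context window.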
Metadata
arXiv ID: 2602.22017v1
Abstract page: https://arxiv.org/abs/2602.22017v1
PDF: https://arxiv.org/pdf/2602.22017v1
Primary category: cs.DC
Published: 2026-02-25T15:30:55Z
Updated: 2026-02-25T15:30:55Z
Comment: Published in the Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2025)
DOI: 10.1109/IPDPS64566.2025.00036 (https://doi.org/10.1109/IPDPS64566.2025.00036)