Paper
LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain
Authors
Antonio De Santis, Marco Balduini, Matteo Belcao, Andrea Proia, Marco Brambilla, Emanuele Della Valle
Abstract
Large manufacturing companies face challenges in information retrieval due to data silos maintained by different departments, leading to inconsistencies and misalignment across databases. This paper presents an experience in integrating and retrieving qualification data for electronic components used in satellite board design. Due to data silos, designers cannot immediately determine the qualification status of individual components. However, this process is critical during the planning phase, when assembly drawings are issued before production, to optimize new qualifications and avoid redundant efforts. To address this, we propose a pipeline that uses Virtual Knowledge Graphs for a unified view over heterogeneous data sources and LLMs to enhance retrieval and reduce manual effort in data cleansing. The retrieval of qualifications is then performed through an Ontology-based Data Access approach for structured queries and a vector search mechanism for retrieving qualifications based on similar textual properties. We perform a comparative cost-benefit analysis, demonstrating that the proposed pipeline also outperforms approaches relying solely on LLMs, such as Retrieval-Augmented Generation (RAG), in terms of long-term efficiency.
Metadata
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25
Raw Data (Debug)
{
"raw_xml": "<entry>\n <id>http://arxiv.org/abs/2603.20094v1</id>\n <title>LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain</title>\n <updated>2026-03-20T16:22:06Z</updated>\n <link href='https://arxiv.org/abs/2603.20094v1' rel='alternate' type='text/html'/>\n <link href='https://arxiv.org/pdf/2603.20094v1' rel='related' title='pdf' type='application/pdf'/>\n <summary>Large manufacturing companies face challenges in information retrieval due to data silos maintained by different departments, leading to inconsistencies and misalignment across databases. This paper presents an experience in integrating and retrieving qualification data for electronic components used in satellite board design. Due to data silos, designers cannot immediately determine the qualification status of individual components. However, this process is critical during the planning phase, when assembly drawings are issued before production, to optimize new qualifications and avoid redundant efforts. To address this, we propose a pipeline that uses Virtual Knowledge Graphs for a unified view over heterogeneous data sources and LLMs to enhance retrieval and reduce manual effort in data cleansing. The retrieval of qualifications is then performed through an Ontology-based Data Access approach for structured queries and a vector search mechanism for retrieving qualifications based on similar textual properties. We perform a comparative cost-benefit analysis, demonstrating that the proposed pipeline also outperforms approaches relying solely on LLMs, such as Retrieval-Augmented Generation (RAG), in terms of long-term efficiency.</summary>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.IR'/>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.DB'/>\n <published>2026-03-20T16:22:06Z</published>\n <arxiv:comment>ESWC 2026</arxiv:comment>\n <arxiv:primary_category term='cs.IR'/>\n <author>\n <name>Antonio De Santis</name>\n </author>\n <author>\n <name>Marco Balduini</name>\n </author>\n <author>\n <name>Matteo Belcao</name>\n </author>\n <author>\n <name>Andrea Proia</name>\n </author>\n <author>\n <name>Marco Brambilla</name>\n </author>\n <author>\n <name>Emanuele Della Valle</name>\n </author>\n </entry>"
}