Paper
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
Authors
Alex Robertson, Huizhi Liang, Mahbub Gani, Rohit Kumar, Srijith Rajamohan
Abstract
Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and narrow questions, leading to limited coverage and misleading evaluations. We present KGHaluBench, a Knowledge Graph-based hallucination benchmark that assesses LLMs across the breadth and depth of their knowledge, providing a fairer and more comprehensive insight into LLM truthfulness. Our framework utilises the KG to dynamically construct challenging, multifaceted questions, whose difficulty is then statistically estimated to address popularity bias. Our automated verification pipeline detects abstentions and verifies the LLM's response at both conceptual and correctness levels to identify different types of hallucinations. We evaluate 25 frontier models, using novel accuracy and hallucination metrics. The results provide a more interpretable insight into the knowledge factors that cause hallucinations across different model sizes. KGHaluBench is publicly available to support future developments in hallucination mitigation.
Metadata
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25
Raw Data (Debug)
{
"raw_xml": "<entry>\n <id>http://arxiv.org/abs/2602.19643v1</id>\n <title>KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge</title>\n <updated>2026-02-23T09:41:46Z</updated>\n <link href='https://arxiv.org/abs/2602.19643v1' rel='alternate' type='text/html'/>\n <link href='https://arxiv.org/pdf/2602.19643v1' rel='related' title='pdf' type='application/pdf'/>\n <summary>Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and narrow questions, leading to limited coverage and misleading evaluations. We present KGHaluBench, a Knowledge Graph-based hallucination benchmark that assesses LLMs across the breadth and depth of their knowledge, providing a fairer and more comprehensive insight into LLM truthfulness. Our framework utilises the KG to dynamically construct challenging, multifaceted questions, whose difficulty is then statistically estimated to address popularity bias. Our automated verification pipeline detects abstentions and verifies the LLM's response at both conceptual and correctness levels to identify different types of hallucinations. We evaluate 25 frontier models, using novel accuracy and hallucination metrics. The results provide a more interpretable insight into the knowledge factors that cause hallucinations across different model sizes. KGHaluBench is publicly available to support future developments in hallucination mitigation.</summary>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.CL'/>\n <published>2026-02-23T09:41:46Z</published>\n <arxiv:comment>EACL 2026 Findings</arxiv:comment>\n <arxiv:primary_category term='cs.CL'/>\n <author>\n <name>Alex Robertson</name>\n </author>\n <author>\n <name>Huizhi Liang</name>\n </author>\n <author>\n <name>Mahbub Gani</name>\n </author>\n <author>\n <name>Rohit Kumar</name>\n </author>\n <author>\n <name>Srijith Rajamohan</name>\n </author>\n </entry>"
}