Paper
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
Authors
Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee
Abstract
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training, and how this affects downstream performance, remains unclear. We study this gap by comparing different LLMs under two text-only settings and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over text descriptions from an audio captioner; and (3) audio-grounded evaluation, where each LLM is fine-tuned into an LALM with an audio encoder. Our findings reveal that auditory knowledge varies substantially across model families, and that text-only results are strongly correlated with audio performance. Our work provides empirical grounding for a comprehensive understanding of LLMs in audio research.
Metadata
arXiv: 2603.19195v1 • Published: 2026-03-19 • Categories: eess.AS (primary), cs.CL, cs.SD
Project website: https://kehanlu.github.io/AKB
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25