How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Authors

Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee

Abstract

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only settings and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark testing the breadth and depth of auditory knowledge; (2) cascade evaluation, where LLMs reason over text descriptions from an audio captioner; and (3) audio-grounded evaluation, where each LLM is fine-tuned into an LALM with an audio encoder. Our findings reveal that auditory knowledge varies substantially across families, and text-only results are strongly correlated with audio performance. Our work provides empirical grounding for a comprehensive understanding of LLMs in audio research.
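
To make the cascade setting concrete, below is a minimal Python sketch of how such an evaluation might be wired up. It is an illustration under assumed interfaces, not the authors' code: caption_audio and ask_llm are hypothetical stubs standing in for an off-the-shelf audio captioner and the text-only LLM backbone under test, the substring-match scoring is a crude stand-in for whatever metric the paper actually uses, and the final helper mirrors the reported correlation between text-only and audio-grounded scores.

# A minimal sketch of the cascade setting: an audio captioner turns the
# clip into text, and the text-only LLM answers from that caption alone.
from dataclasses import dataclass

from scipy.stats import spearmanr  # used by the correlation helper below


@dataclass
class Example:
    audio_path: str  # path to the audio clip
    question: str    # question about the clip
    answer: str      # gold answer used for scoring


def caption_audio(audio_path: str) -> str:
    """Hypothetical stub: plug in any off-the-shelf audio captioner that
    returns a short text description of the clip."""
    raise NotImplementedError


def ask_llm(prompt: str) -> str:
    """Hypothetical stub: plug in the text-only LLM backbone under test."""
    raise NotImplementedError


def cascade_accuracy(examples: list[Example]) -> float:
    """Accuracy when the LLM sees only the captioner's description."""
    correct = 0
    for ex in examples:
        caption = caption_audio(ex.audio_path)
        prompt = (
            f"Audio description: {caption}\n"
            f"Question: {ex.question}\n"
            "Answer:"
        )
        pred = ask_llm(prompt)
        # Substring match is a simple stand-in for the paper's scoring.
        correct += int(ex.answer.lower() in pred.lower())
    return correct / len(examples)


def text_audio_correlation(text_scores: list[float],
                           audio_scores: list[float]) -> float:
    """Rank correlation across LLM backbones, mirroring the paper's finding
    that text-only results track audio-grounded performance."""
    rho, _ = spearmanr(text_scores, audio_scores)
    return rho

In this framing, cascade_accuracy is run once per LLM backbone, and text_audio_correlation is computed over the resulting per-backbone scores against the audio-grounded (fine-tuned LALM) scores.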

Metadata

arXiv ID: 2603.19195
Provider: ARXIV
Primary Category: eess.AS
Categories: eess.AS, cs.CL, cs.SD
Published: 2026-03-19
Links: https://arxiv.org/abs/2603.19195v1 (abstract), https://arxiv.org/pdf/2603.19195v1 (PDF)
Project website: https://kehanlu.github.io/AKB
Fetched: 2026-03-20 06:02
