AI LLM March 02, 2026

VoiceAgentRAG: Solving the RAG Latency Bottleneck in Real-Time Voice Agents Using Dual-Agent Architectures

Authors

Jielin Qiu, Jianguo Zhang, Zixiang Chen, Liangwei Yang, Ming Zhu, Juntao Tan, Haolin Chen, Wenting Zhao, Rithesh Murthy, Roshan Ram, Akshara Prabhakar, Shelby Heinecke, Caiming Xiong, Silvio Savarese, Huan Wang

Abstract

We present VoiceAgentRAG, an open-source dual-agent memory router that decouples retrieval from response generation. A background Slow Thinker agent continuously monitors the conversation stream, predicts likely follow-up topics using an LLM, and pre-fetches relevant document chunks into a FAISS-backed semantic cache. A foreground Fast Talker agent reads only from this sub-millisecond cache, bypassing the vector database entirely on cache hits.
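
The split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SemanticCache`, `VectorDB`, `slow_thinker`, `fast_talker`, and `predict_topics` are hypothetical, a toy bag-of-words similarity stands in for a real embedding model and FAISS index, the LLM topic predictor is a stub, and the Slow Thinker runs synchronously rather than in the background. Falling back to the vector database on a cache miss is also an assumption, since the abstract only describes the cache-hit path.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorDB:
    """Stand-in for the slow vector database; counts how often it is queried."""

    def __init__(self, chunks):
        self.chunks = [(embed(c), c) for c in chunks]
        self.calls = 0

    def search(self, query):
        self.calls += 1
        q = embed(query)
        return max(self.chunks, key=lambda e: cosine(q, e[0]))[1]


class SemanticCache:
    """In-memory cache keyed by similarity (assumption: replaces the FAISS index)."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.entries = []  # list of (embedding, chunk)

    def put(self, chunk):
        self.entries.append((embed(chunk), chunk))

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None


def slow_thinker(history, db, cache, predict_topics):
    """Predict follow-up topics and pre-fetch matching chunks into the cache."""
    for topic in predict_topics(history):
        cache.put(db.search(topic))


def fast_talker(query, db, cache):
    """Serve from the cache; on a miss, fall back to the database (assumption)."""
    chunk = cache.get(query)
    if chunk is not None:
        return chunk, True
    return db.search(query), False


# Hypothetical usage: the lambda stands in for the LLM topic predictor.
db = VectorDB([
    "refund policy allows returns within 30 days",
    "shipping takes five business days",
])
cache = SemanticCache()
slow_thinker(["I have a question about my order"], db, cache,
             lambda history: ["refund policy returns"])
print(fast_talker("what is the refund policy", db, cache))
```

In this sketch the first user question about refunds is answered from the pre-fetched cache without a second database query, which is the latency saving the abstract attributes to the Fast Talker.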

Metadata

arXiv ID: 2603.02206
Provider: ARXIV
Primary Category: cs.SD
Published: 2026-03-02
Fetched: 2026-03-03 04:34
