AI · LLM · February 25, 2026

RAC: Relation-Aware Cache Replacement for Large Language Models

Authors

Yuchong Wu, Zihuan Xu, Wangze Ni, Peng Cheng, Lei Chen, Xuemin Lin, Heng Tao Shen, Kui Ren

Abstract

The scaling of Large Language Model (LLM) services faces significant cost and latency challenges, making effective caching under tight capacity crucial. Existing cache replacement policies, from heuristics to learning-based methods, predominantly rely on limited-window statistics such as recency and frequency. We show these signals are not robust for real-world LLM workloads, which exhibit long reuse distances and sparse local recurrence. To address these limitations, we propose Relation-Aware Cache (RAC), an online eviction strategy that leverages semantic relations among requests to guide eviction decisions. RAC synthesizes two relation-aware signals: (1) Topical Prevalence, which aggregates access evidence at the topic level to capture long-horizon reuse; and (2) Structural Importance, which leverages local intra-topic dependency structure to discriminate entries by their future reuse value. Extensive evaluations show that RAC maintains high effectiveness across diverse workloads, consistently surpassing state-of-the-art baselines by 20%–30% in cache hit ratio.
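
The abstract describes eviction driven by two relation-aware signals but does not give their exact formulation. The Python sketch below illustrates how such a policy might combine them: the RelationAwareCache class, the per-topic hit counter standing in for Topical Prevalence, the intra-topic degree-centrality proxy for Structural Importance, and the linear mixing weight alpha are all illustrative assumptions, not the paper's actual method.

    from collections import defaultdict
    from dataclasses import dataclass, field


    @dataclass
    class Entry:
        key: str
        topic: int                                   # topic id, e.g. from clustering (assumed)
        neighbors: set = field(default_factory=set)  # intra-topic relation edges (assumed)


    class RelationAwareCache:
        def __init__(self, capacity: int, alpha: float = 0.5):
            self.capacity = capacity
            self.alpha = alpha                       # signal-mixing weight (assumed)
            self.entries: dict[str, Entry] = {}
            self.topic_hits = defaultdict(int)       # per-topic access counts

        def _score(self, e: Entry) -> float:
            # Topical Prevalence: reuse evidence aggregated at the topic level,
            # so an entry can survive long gaps if its topic stays active.
            prevalence = self.topic_hits[e.topic]
            # Structural Importance: approximated here by how many of the
            # entry's intra-topic relations are still resident in the cache.
            importance = sum(1 for n in e.neighbors if n in self.entries)
            return self.alpha * prevalence + (1 - self.alpha) * importance

        def access(self, key: str, topic: int, related: frozenset = frozenset()) -> bool:
            self.topic_hits[topic] += 1
            if key in self.entries:
                self.entries[key].neighbors |= related
                return True                          # cache hit
            if len(self.entries) >= self.capacity:
                victim = min(self.entries.values(), key=self._score)
                del self.entries[victim.key]         # evict lowest combined score
            self.entries[key] = Entry(key, topic, set(related))
            return False                             # cache miss

Under this toy scoring, an entry from a rarely seen topic with few resident intra-topic relations is evicted first, while entries backed by a prevalent topic or dense local structure persist across the long reuse distances that defeat recency- and frequency-based policies.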

Metadata

arXiv ID: 2602.21547
Provider: ARXIV
Primary Category: cs.DB
Published: 2026-02-25
Fetched: 2026-02-26 05:00
