Research

Paper

AI LLM March 05, 2026

InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context

Authors

Xin Teng, Canyu Zhang, Shaoyi Zheng, Danyang Zhuo, Tianyi Zhou, Shengjie Wang

Abstract

Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV) caches for individual documents and selectively recompute a small subset of tokens to restore global causal dependencies, but existing methods rely on heuristics or representation discrepancies without modeling whether selected tokens can effectively influence generation. We cast selective KV recomputation as an information flow problem and show that a simple attention-norm signal from the query reliably identifies tokens that are both semantically relevant and structurally positioned to propagate information, when computed under an inference-consistent RoPE geometry. We therefore reconstruct global positional assignments for retrieved chunks and introduce an information-flow-guided chunk reordering strategy. Experiments on LLM and VLM benchmarks demonstrate consistent gains over prior methods under comparable efficiency budgets.

Metadata

arXiv ID: 2603.05353

Provider: ARXIV

Primary Category: cs.LG

Published: 2026-03-05

Fetched: 2026-03-06 14:20

Related papers

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jian... • 2026-03-30

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or • 2026-03-30

Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books

Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khan... • 2026-03-30

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

Anuj Diwan, Eunsol Choi, David Harwath • 2026-03-30

RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems

Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.05353v1</id>\n    <title>InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context</title>\n    <updated>2026-03-05T16:33:20Z</updated>\n    <link href='https://arxiv.org/abs/2603.05353v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.05353v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV) caches for individual documents and selectively recompute a small subset of tokens to restore global causal dependencies, but existing methods rely on heuristics or representation discrepancies without modeling whether selected tokens can effectively influence generation. We cast selective KV recomputation as an information flow problem and show that a simple attention-norm signal from the query reliably identifies tokens that are both semantically relevant and structurally positioned to propagate information, when computed under an inference-consistent RoPE geometry. We therefore reconstruct global positional assignments for retrieved chunks and introduce an information-flow-guided chunk reordering strategy. Experiments on LLM and VLM benchmarks demonstrate consistent gains over prior methods under comparable efficiency budgets.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.LG'/>\n    <published>2026-03-05T16:33:20Z</published>\n    <arxiv:primary_category term='cs.LG'/>\n    <author>\n      <name>Xin Teng</name>\n    </author>\n    <author>\n      <name>Canyu Zhang</name>\n    </author>\n    <author>\n      <name>Shaoyi Zheng</name>\n    </author>\n    <author>\n      <name>Danyang Zhuo</name>\n    </author>\n    <author>\n      <name>Tianyi Zhou</name>\n    </author>\n    <author>\n      <name>Shengjie Wang</name>\n    </author>\n  </entry>"
}