Paper
Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion
Authors
Pengcheng Jiang, Judith Yue Li, Moonkyung Ryu, R. Lily Hu, Kun Su, Zhong Yi Wan, Liam Hebert, Hao Peng, Jiawei Han, Dima Kuzmin, Craig Boutilier
Abstract
Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while remaining grounded with respect to a fixed database. Set-valued objectives are typically non-decomposable and are not captured by existing supervised (query, content) datasets, which prioritize only top-1 retrieval. Consequently, fan-out retrieval is often employed to generate diverse subqueries to retrieve item sets. While reinforcement learning (RL) can optimize set-level objectives via interaction, deploying an RL-tuned LLM for fan-out retrieval is prohibitively expensive at inference time. Conversely, diffusion-based generative retrieval enables efficient single-pass fan-out in embedding space, but requires objective-aligned training targets. To address these issues, we propose R4T (Retrieve-for-Train), which uses RL once as an objective transducer in a three-step process: (i) train a fan-out LLM with composite set-level rewards, (ii) synthesize objective-consistent training pairs, and (iii) train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. Across large-scale fashion and music benchmarks consisting of curated item sets, we show that R4T improves retrieval quality relative to strong baselines while reducing query-time fan-out latency by an order of magnitude.
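The abstract describes single-pass fan-out in embedding space scored by a composite set-level objective. The following is a minimal toy sketch of that idea, not the authors' actual method: it replaces the diffusion retriever with a simple Gaussian perturbation of the query embedding, and the composite reward with a hypothetical relevance-minus-redundancy score. All function names, the item database, and the `alpha` trade-off weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy item database: 100 items as unit vectors in a 16-d embedding space.
items = rng.normal(size=(100, 16))
items /= np.linalg.norm(items, axis=1, keepdims=True)

def fan_out(query_emb, k=4, noise=0.3):
    """Stand-in for single-pass fan-out: sample k subquery embeddings
    around the query (a diffusion retriever would instead sample from a
    learned conditional distribution)."""
    subs = query_emb + noise * rng.normal(size=(k, query_emb.shape[0]))
    return subs / np.linalg.norm(subs, axis=1, keepdims=True)

def retrieve(sub_embs, items, per_sub=1):
    """Nearest-neighbor lookup for each subquery, deduplicated to a set."""
    sims = sub_embs @ items.T                      # (k, n_items) cosine sims
    top = np.argsort(-sims, axis=1)[:, :per_sub]
    return sorted(set(top.ravel().tolist()))

def set_reward(idx, query_emb, items, alpha=0.5):
    """Hypothetical composite set-level score: mean relevance to the
    query minus alpha times redundancy (mean pairwise similarity),
    mirroring the relevance/diversity trade-off in the abstract."""
    sel = items[idx]
    relevance = float((sel @ query_emb).mean())
    if len(idx) > 1:
        pair = sel @ sel.T
        redundancy = float((pair.sum() - len(idx)) / (len(idx) * (len(idx) - 1)))
    else:
        redundancy = 0.0
    return relevance - alpha * redundancy

q = items[0]                                       # use an item as a stand-in query
result_set = retrieve(fan_out(q, k=4), items)
score = set_reward(result_set, q, items)
```

In R4T the expensive RL-tuned LLM would only be used offline to produce objective-consistent training pairs; at query time a single forward pass of the lightweight retriever replaces the LLM fan-out, which is where the claimed order-of-magnitude latency reduction comes from.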
Metadata
arXiv ID: 2603.06397v1 (https://arxiv.org/abs/2603.06397v1)
Published: 2026-03-06
Categories: cs.IR (primary), cs.LG
PDF: https://arxiv.org/pdf/2603.06397v1
Related papers
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jian... • 2026-03-30
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or • 2026-03-30
Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books
Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khan... • 2026-03-30
ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining
Anuj Diwan, Eunsol Choi, David Harwath • 2026-03-30
RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems
Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30