Research

Paper

AI LLM February 24, 2026

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

Authors

Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Hao Zhen, Ninghao Liu, Linas Baltrunas

Abstract

Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs into effective natural language inputs. Existing methods rely on rigid templates that simply concatenate fields, yielding suboptimal representations for recommendation. We propose a data-centric framework that learns verbalization for LLM-based recommendation. Using reinforcement learning, a verbalization agent transforms raw interaction histories into optimized textual contexts, with recommendation accuracy as the training signal. This agent learns to filter noise, incorporate relevant metadata, and reorganize information to improve downstream predictions. Experiments on a large-scale industrial streaming dataset show that learned verbalization delivers up to 93% relative improvement in discovery item recommendation accuracy over template-based baselines. Further analysis reveals emergent strategies such as user interest summarization, noise removal, and syntax normalization, offering insights into effective context construction for LLM-based recommender systems.

Metadata

arXiv ID: 2602.20558

Provider: ARXIV

Primary Category: cs.AI

Published: 2026-02-24

Fetched: 2026-02-25 06:05

Related papers

Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25

Comparing Developer and LLM Biases in Code Evaluation

Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Biplab Pal, Santanu Bhattacharya • 2026-03-25

Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA

Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2602.20558v1</id>\n    <title>From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production</title>\n    <updated>2026-02-24T05:15:24Z</updated>\n    <link href='https://arxiv.org/abs/2602.20558v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2602.20558v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs into effective natural language inputs. Existing methods rely on rigid templates that simply concatenate fields, yielding suboptimal representations for recommendation. We propose a data-centric framework that learns verbalization for LLM-based recommendation. Using reinforcement learning, a verbalization agent transforms raw interaction histories into optimized textual contexts, with recommendation accuracy as the training signal. This agent learns to filter noise, incorporate relevant metadata, and reorganize information to improve downstream predictions. Experiments on a large-scale industrial streaming dataset show that learned verbalization delivers up to 93% relative improvement in discovery item recommendation accuracy over template-based baselines. Further analysis reveals emergent strategies such as user interest summarization, noise removal, and syntax normalization, offering insights into effective context construction for LLM-based recommender systems.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.IR'/>\n    <published>2026-02-24T05:15:24Z</published>\n    <arxiv:comment>Work in progress</arxiv:comment>\n    <arxiv:primary_category term='cs.AI'/>\n    <author>\n      <name>Yucheng Shi</name>\n    </author>\n    <author>\n      <name>Ying Li</name>\n    </author>\n    <author>\n      <name>Yu Wang</name>\n    </author>\n    <author>\n      <name>Yesu Feng</name>\n    </author>\n    <author>\n      <name>Arjun Rao</name>\n    </author>\n    <author>\n      <name>Rein Houthooft</name>\n    </author>\n    <author>\n      <name>Shradha Sehgal</name>\n    </author>\n    <author>\n      <name>Jin Wang</name>\n    </author>\n    <author>\n      <name>Hao Zhen</name>\n    </author>\n    <author>\n      <name>Ninghao Liu</name>\n    </author>\n    <author>\n      <name>Linas Baltrunas</name>\n    </author>\n  </entry>"
}