Paper
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Authors
Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue
Abstract
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, as the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models from multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
Metadata
Related papers
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or • 2026-03-30
Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books
Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khan... • 2026-03-30
ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining
Anuj Diwan, Eunsol Choi, David Harwath • 2026-03-30
RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems
Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30
SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability
Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30
Raw Data (Debug)
{
"raw_xml": "<entry>\n <id>http://arxiv.org/abs/2603.28767v1</id>\n <title>Gen-Searcher: Reinforcing Agentic Search for Image Generation</title>\n <updated>2026-03-30T17:59:56Z</updated>\n <link href='https://arxiv.org/abs/2603.28767v1' rel='alternate' type='text/html'/>\n <link href='https://arxiv.org/pdf/2603.28767v1' rel='related' title='pdf' type='application/pdf'/>\n <summary>Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, as the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models from multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.</summary>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.CV'/>\n <published>2026-03-30T17:59:56Z</published>\n <arxiv:comment>Project page: https://gen-searcher.vercel.app Code: https://github.com/tulerfeng/Gen-Searcher</arxiv:comment>\n <arxiv:primary_category term='cs.CV'/>\n <author>\n <name>Kaituo Feng</name>\n </author>\n <author>\n <name>Manyuan Zhang</name>\n </author>\n <author>\n <name>Shuang Chen</name>\n </author>\n <author>\n <name>Yunlong Lin</name>\n </author>\n <author>\n <name>Kaixuan Fan</name>\n </author>\n <author>\n <name>Yilei Jiang</name>\n </author>\n <author>\n <name>Hongyu Li</name>\n </author>\n <author>\n <name>Dian Zheng</name>\n </author>\n <author>\n <name>Chenyang Wang</name>\n </author>\n <author>\n <name>Xiangyu Yue</name>\n </author>\n </entry>"
}