Research

Paper

TESTING March 19, 2026

Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization

Authors

Shiyan Liu, Qifeng Xia, Qiyun Xia, Yisheng Liu, Xinyu Yu, Rui Qu

Abstract

Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed, GEPA degrades accuracy from 23.81% to 13.50%. We propose VISTA, a multi-agent APO framework that decouples hypothesis generation from prompt rewriting, enabling semantically labeled hypotheses, parallel minibatch verification, and interpretable optimization trace. A two-layer explore-exploit mechanism combining random restart and epsilon-greedy sampling further escapes local optima. VISTA recovers accuracy to 87.57% on the same defective seed and consistently outperforms baselines across all conditions on GSM8K and AIME2025.

Metadata

arXiv ID: 2603.18388

Provider: ARXIV

Primary Category: cs.AI

Published: 2026-03-19

Fetched: 2026-03-20 06:02

Related papers

Fractal universe and quantum gravity made simple

Fabio Briscese, Gianluca Calcagni • 2026-03-25

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25

Orientation Reconstruction of Proteins using Coulomb Explosions

Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.18388v1</id>\n    <title>Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization</title>\n    <updated>2026-03-19T01:14:36Z</updated>\n    <link href='https://arxiv.org/abs/2603.18388v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.18388v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed, GEPA degrades accuracy from 23.81% to 13.50%. We propose VISTA, a multi-agent APO framework that decouples hypothesis generation from prompt rewriting, enabling semantically labeled hypotheses, parallel minibatch verification, and interpretable optimization trace. A two-layer explore-exploit mechanism combining random restart and epsilon-greedy sampling further escapes local optima. VISTA recovers accuracy to 87.57% on the same defective seed and consistently outperforms baselines across all conditions on GSM8K and AIME2025.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.MA'/>\n    <published>2026-03-19T01:14:36Z</published>\n    <arxiv:primary_category term='cs.AI'/>\n    <author>\n      <name>Shiyan Liu</name>\n    </author>\n    <author>\n      <name>Qifeng Xia</name>\n    </author>\n    <author>\n      <name>Qiyun Xia</name>\n    </author>\n    <author>\n      <name>Yisheng Liu</name>\n    </author>\n    <author>\n      <name>Xinyu Yu</name>\n    </author>\n    <author>\n      <name>Rui Qu</name>\n    </author>\n  </entry>"
}