February 25, 2026

Calibrated Test-Time Guidance for Bayesian Inference

Authors

Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, Stephan Mandt

Abstract

Test-time guidance is a widely used mechanism for steering pretrained diffusion models toward outcomes specified by a reward function. Existing approaches, however, focus on maximizing reward rather than sampling from the true Bayesian posterior, leading to miscalibrated inference. In this work, we show that common test-time guidance methods do not recover the correct posterior distribution and identify the structural approximations responsible for this failure. We then propose consistent alternative estimators that enable calibrated sampling from the Bayesian posterior. We significantly outperform previous methods on a set of Bayesian inference tasks and match the state of the art in black hole image reconstruction.
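
For readers unfamiliar with the setting, the following is a background sketch of the standard formulation of guided diffusion sampling, not a statement of the authors' method. The symbols are assumptions introduced here for illustration: x_0 is a clean sample, x_t its noised version at diffusion time t, y the observation, and r(x_0) = log p(y | x_0) the reward. The abstract does not say which approximations the paper analyzes; the plug-in form in the last line is one widely used example.

```latex
% Target: the Bayesian posterior, with the reward acting as a log-likelihood.
p(x_0 \mid y) \;\propto\; p(x_0)\, p(y \mid x_0),
\qquad r(x_0) = \log p(y \mid x_0).

% Exact guidance: by Bayes' rule, the posterior score at noise level t
% is the prior score plus a guidance term.
\nabla_{x_t} \log p_t(x_t \mid y)
  \;=\; \nabla_{x_t} \log p_t(x_t) \;+\; \nabla_{x_t} \log p_t(y \mid x_t),

% where the noisy likelihood averages over all clean samples consistent with x_t.
p_t(y \mid x_t) \;=\; \int p(y \mid x_0)\, p(x_0 \mid x_t)\, \mathrm{d}x_0 .

% A common plug-in approximation replaces the integral by a point estimate,
% which discards the uncertainty in p(x_0 | x_t).
p_t(y \mid x_t) \;\approx\; p\bigl(y \mid \hat{x}_0(x_t)\bigr),
\qquad \hat{x}_0(x_t) = \mathbb{E}[x_0 \mid x_t].
```

Sampling with the exact guidance term targets the posterior p(x_0 | y). The plug-in form is generally not equal to the integral it replaces, which is one way a guided sampler can fail to be calibrated.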

Metadata

arXiv ID: 2602.22428
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-02-25
Fetched: 2026-02-27 04:35

Secondary Category: cs.AI
Comment: Preprint. Under review