Research

Paper

TESTING March 13, 2026

When Your Model Stops Working: Anytime-Valid Calibration Monitoring

Authors

Tristan Farran

Abstract

Practitioners monitoring deployed probabilistic models face a fundamental trap: any fixed-sample test applied repeatedly over an unbounded stream will eventually raise a false alarm, even when the model remains perfectly stable. Existing methods typically lack formal error guarantees, conflate alarm time with changepoint location, and monitor indirect signals that do not fully characterize calibration. We present PITMonitor, an anytime-valid calibration-specific monitor that detects distributional shifts in probability integral transforms via a mixture e-process, providing Type I error control over an unbounded monitoring horizon as well as Bayesian changepoint estimation. On river's FriedmanDrift benchmark, PITMonitor achieves detection rates competitive with the strongest baselines across all three scenarios, although detection delay is substantially longer under local drift.

Metadata

arXiv ID: 2603.13156

Provider: ARXIV

Primary Category: stat.ME

Published: 2026-03-13

Fetched: 2026-03-16 06:01

Related papers

Fractal universe and quantum gravity made simple

Fabio Briscese, Gianluca Calcagni • 2026-03-25

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25

Orientation Reconstruction of Proteins using Coulomb Explosions

Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.13156v1</id>\n    <title>When Your Model Stops Working: Anytime-Valid Calibration Monitoring</title>\n    <updated>2026-03-13T16:50:14Z</updated>\n    <link href='https://arxiv.org/abs/2603.13156v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.13156v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Practitioners monitoring deployed probabilistic models face a fundamental trap: any fixed-sample test applied repeatedly over an unbounded stream will eventually raise a false alarm, even when the model remains perfectly stable. Existing methods typically lack formal error guarantees, conflate alarm time with changepoint location, and monitor indirect signals that do not fully characterize calibration. We present PITMonitor, an anytime-valid calibration-specific monitor that detects distributional shifts in probability integral transforms via a mixture e-process, providing Type I error control over an unbounded monitoring horizon as well as Bayesian changepoint estimation. On river's FriedmanDrift benchmark, PITMonitor achieves detection rates competitive with the strongest baselines across all three scenarios, although detection delay is substantially longer under local drift.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='stat.ME'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='stat.ML'/>\n    <published>2026-03-13T16:50:14Z</published>\n    <arxiv:primary_category term='stat.ME'/>\n    <author>\n      <name>Tristan Farran</name>\n    </author>\n  </entry>"
}