Research

Paper

TESTING March 10, 2026

Anomaly detection using surprisals

Authors

Rob J Hyndman, David T. Frazier

Abstract

Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they often focus on tail events, missing ``inlier'' anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution. We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky--Kiefer--Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.

Metadata

arXiv ID: 2603.09318
Provider: ARXIV
Primary Category: stat.ME
Published: 2026-03-10
Fetched: 2026-03-11 06:02

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.09318v1</id>\n    <title>Anomaly detection using surprisals</title>\n    <updated>2026-03-10T07:50:22Z</updated>\n    <link href='https://arxiv.org/abs/2603.09318v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.09318v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they often focus on tail events, missing ``inlier'' anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution. We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky--Kiefer--Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='stat.ME'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='stat.AP'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='stat.OT'/>\n    <published>2026-03-10T07:50:22Z</published>\n    <arxiv:primary_category term='stat.ME'/>\n    <author>\n      <name>Rob J Hyndman</name>\n    </author>\n    <author>\n      <name>David T. Frazier</name>\n    </author>\n  </entry>"
}