March 16, 2026

Investigating How Neighbourhood Scores Reflect Forecast Error

Authors

Bobby Antonio

Abstract

Meaningful scores for forecast verification are essential for developing reliable forecasts, and much effort has gone into developing scores that align well with human perceptions of forecast quality. Whilst many of these scores have intuitive interpretations, relatively little is known about how they rank different forecasts and how they reflect forecast error. We theoretically explore the behaviour of two scores that fall within the 'neighbourhood' paradigm of spatial verification: the Fractions Skill Score (FSS) and the Brier Divergence Skill Score (BDnSS). We investigate how each score ranks forecasts with two types of error: errors in the mean frequency (corresponding to intensity or shape errors) and errors in the standard deviation (corresponding to errors in spatial structure, such as blurring or excess noise). We find that in many situations the FSS assigns higher scores to forecasts that over-predict mean frequency, thus theoretically confirming the need to use the FSS with percentile thresholds. Both scores assign higher scores to smoother forecasts in many situations, a reflection of the 'double penalty' problem; however, we observe that the size of this effect is larger for the BDnSS than for the FSS, showing that in some situations the FSS is less susceptible to the double penalty problem than the BDnSS.
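
For readers unfamiliar with the FSS, the following is a minimal sketch of how it is commonly computed; it is not taken from the paper. It assumes the widely used definition FSS = 1 - MSE(P_f, P_o) / (mean(P_f^2) + mean(P_o^2)), where P_f and P_o are the fractions of threshold-exceeding grid points in a square neighbourhood of the forecast and the observations. The function names, the zero-padding at the domain edges, and the odd window width `n` are illustrative assumptions. The percentile variant thresholds each field at its own quantile, which is one way to apply the percentile thresholds mentioned in the abstract.

```python
import numpy as np


def neighbourhood_fractions(binary, n):
    """Fraction of 'event' points in each n x n window (edges zero-padded).

    Assumption: n is odd so that the window is centred on each grid point.
    """
    assert n % 2 == 1, "window width n is assumed to be odd"
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    sat = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    h, w = binary.shape
    window_sum = (sat[n:n + h, n:n + w] - sat[:h, n:n + w]
                  - sat[n:n + h, :w] + sat[:h, :w])
    return window_sum / (n * n)


def fss_from_binary(forecast_events, observed_events, n):
    """FSS from two boolean event fields, using the commonly used definition."""
    pf = neighbourhood_fractions(forecast_events, n)
    po = neighbourhood_fractions(observed_events, n)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    # The score is undefined when neither field contains any events.
    return 1.0 - mse / reference if reference > 0 else float("nan")


def fss(forecast, observed, threshold, n):
    """FSS with one fixed physical threshold applied to both fields."""
    return fss_from_binary(forecast >= threshold, observed >= threshold, n)


def fss_percentile(forecast, observed, q, n):
    """FSS where each field is thresholded at its own q-th percentile,
    so both fields have (approximately) the same event frequency."""
    return fss_from_binary(forecast >= np.percentile(forecast, q),
                           observed >= np.percentile(observed, q), n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    observed = rng.gamma(shape=2.0, scale=1.0, size=(64, 64))
    forecast = 1.5 * np.roll(observed, 3, axis=1)  # displaced and too intense
    print("fixed threshold:", fss(forecast, observed, threshold=3.0, n=9))
    print("90th percentile:", fss_percentile(forecast, observed, q=90, n=9))
```

Because the percentile variant equalises event frequency in the two fields, it removes frequency bias before scoring. The synthetic fields in the usage example are made up purely to exercise the functions and say nothing about the paper's results.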

Metadata

arXiv ID: 2603.15247

Provider: ARXIV

Primary Category: physics.ao-ph

Published: 2026-03-16

Fetched: 2026-03-17 06:02

