March 16, 2026

Investigating How Neighbourhood Scores Reflect Forecast Error

Authors

Bobby Antonio

Abstract

Meaningful scores for forecast verification are essential for developing reliable forecasts, and there has been much effort to develop scores that align well with human perceptions of forecast quality. Whilst many of these scores have intuitive interpretations, relatively little is known about how they rank different forecasts and how they reflect forecast error. We theoretically explore the behaviour of two scores that fall within the 'neighbourhood' paradigm of spatial verification: the Fractions Skill Score (FSS) and the Brier Divergence Skill Score (BDnSS). We investigate how each score ranks forecasts with two types of error: errors in the mean frequency (corresponding to intensity or shape errors) and errors in the standard deviation (corresponding to errors in spatial structure, such as blurring or excess noise). We find that in many situations the FSS assigns higher scores to forecasts that over-predict the mean frequency, theoretically confirming the need to use the FSS with percentile thresholds. Both scores assign higher scores to smoother forecasts in many situations, a reflection of the 'double penalty' problem; however, we observe that the size of this effect is larger for the BDnSS than for the FSS, showing that in some situations the FSS is less susceptible to the double penalty problem than the BDnSS.
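For readers unfamiliar with the FSS, the sketch below is a minimal pure-Python illustration of its commonly used definition, FSS = 1 − MSE(P_f, P_o) / (mean(P_f²) + mean(P_o²)). Here P_f and P_o are the fractions of grid points exceeding a threshold within an n × n window around each point. This is not the author's code. The helper names (`percentile_threshold`, `binarise`, `neighbourhood_fractions`, `fss`), the zero padding at the domain edge and the nearest-rank percentile are illustrative assumptions. Only the FSS is sketched, since the BDnSS is not defined in this abstract. `percentile_threshold` illustrates the percentile-threshold usage the abstract argues for: thresholding each field at its own percentile equalises the exceedance frequencies, so a uniform frequency bias cannot change the score.

```python
import math
from typing import List

Grid = List[List[float]]


def percentile_threshold(field: Grid, q: float) -> float:
    """Nearest-rank q-th percentile (0 < q <= 100) of all values in a 2-D field."""
    values = sorted(v for row in field for v in row)
    rank = max(1, math.ceil(q * len(values) / 100))
    return values[rank - 1]


def binarise(field: Grid, threshold: float) -> List[List[int]]:
    """Exceedance indicator: 1 where a value is >= threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in field]


def neighbourhood_fractions(binary: List[List[int]], n: int) -> Grid:
    """Fraction of exceedances in the n x n window (n odd) centred on each point.

    Assumption: points outside the domain count as non-exceedances (zero padding).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    rows, cols, h = len(binary), len(binary[0]), n // 2
    fractions = []
    for i in range(rows):
        row_out = []
        for j in range(cols):
            # Sum only the in-domain part of the window; divide by the full size.
            total = sum(
                binary[ii][jj]
                for ii in range(max(0, i - h), min(rows, i + h + 1))
                for jj in range(max(0, j - h), min(cols, j + h + 1))
            )
            row_out.append(total / (n * n))
        fractions.append(row_out)
    return fractions


def fss(forecast: List[List[int]], observed: List[List[int]], n: int) -> float:
    """FSS = 1 - MSE(Pf, Po) / (mean(Pf^2) + mean(Po^2)) for binary fields."""
    pf = [x for row in neighbourhood_fractions(forecast, n) for x in row]
    po = [x for row in neighbourhood_fractions(observed, n) for x in row]
    mse = sum((f - o) ** 2 for f, o in zip(pf, po)) / len(pf)
    mse_ref = (sum(f * f for f in pf) + sum(o * o for o in po)) / len(pf)
    if mse_ref == 0:
        return math.nan  # neither field has any exceedances: FSS is undefined
    return 1.0 - mse / mse_ref


if __name__ == "__main__":
    # Toy case: one observed rain cell, forecast displaced two points east.
    observed = [[0] * 5 for _ in range(5)]
    forecast = [[0] * 5 for _ in range(5)]
    observed[2][1] = 1
    forecast[2][3] = 1
    for n in (1, 3, 5):
        print(n, round(fss(forecast, observed, n), 3))  # 0.0, 0.333, 0.75

    # Frequency bias: forecast values are the observations plus a constant.
    obs_field = [[float(5 * i + j + 1) for j in range(5)] for i in range(5)]
    fc_field = [[v + 5.0 for v in row] for row in obs_field]
    fixed = fss(binarise(fc_field, 20.0), binarise(obs_field, 20.0), 3)
    pct = fss(
        binarise(fc_field, percentile_threshold(fc_field, 90)),
        binarise(obs_field, percentile_threshold(obs_field, 90)),
        3,
    )
    print(round(fixed, 3), round(pct, 3))  # percentile version gives 1.0
```

Under these assumptions, the displaced-cell case scores 0 at grid scale (n = 1), where the miss and the false alarm are both penalised, and rises to about 0.33 and 0.75 as the neighbourhood grows. In the biased case, the fixed threshold produces more forecast exceedances than observed ones. Percentile thresholds select the same grid points in both fields, so the frequency bias has no effect on the score.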

Metadata

arXiv ID: 2603.15247
Provider: ARXIV
Primary Category: physics.ao-ph
Published: 2026-03-16
Fetched: 2026-03-17 06:02
