Research

Paper

TESTING March 13, 2026

Bounds on Agreement between Subjective and Objective Measurements

Authors

Jaden Pieper, Stephen D. Voran

Abstract

Objective estimators of multimedia quality are often judged by comparing estimates with subjective "truth data," most often via Pearson correlation coefficient (PCC) or mean-squared error (MSE). But subjective test results contain noise, so striving for a PCC of 1.0 or an MSE of 0.0 is neither realistic nor repeatable. Numerous efforts have been made to acknowledge and appropriately accommodate subjective test noise in objective-subjective comparisons, typically resulting in new analysis frameworks and figures-of-merit. We take a different approach. By making only basic assumptions, we derive bounds on PCC and MSE that can be expected for a subjective test. Consistent with intuition, these bounds are functions of subjective vote variance. When a subjective test includes vote variance information, the calculation of the bounds is easy, and in this case we say the resulting bounds are "fully data-driven." We provide two options for calculating bounds in cases where vote variance information is not available. One option is to use vote variance information from other subjective tests that do provide such information, and the second option is to use a model for subjective votes. Thus we introduce a binomial-based model for subjective votes (BinoVotes) that naturally leads to a mean opinion score (MOS) model, named BinoMOS, with multiple unique desirable properties. BinoMOS reproduces the discrete nature of MOS values and its dependence on the number of votes per file. This modeling provides vote variance information required by the PCC and MSE bounds and we compare this modeling with data from 18 subjective tests. The modeling yields PCC and MSE bounds that agree very well with those found from the data directly. These results allow one to set expectations for the PCC and MSE that might be achieved for any subjective test, even those where vote variance information is not available.

Metadata

arXiv ID: 2603.13204
Provider: ARXIV
Primary Category: eess.AS
Published: 2026-03-13
Fetched: 2026-03-16 06:01

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.13204v1</id>\n    <title>Bounds on Agreement between Subjective and Objective Measurements</title>\n    <updated>2026-03-13T17:41:09Z</updated>\n    <link href='https://arxiv.org/abs/2603.13204v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.13204v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Objective estimators of multimedia quality are often judged by comparing estimates with subjective \"truth data,\" most often via Pearson correlation coefficient (PCC) or mean-squared error (MSE). But subjective test results contain noise, so striving for a PCC of 1.0 or an MSE of 0.0 is neither realistic nor repeatable. Numerous efforts have been made to acknowledge and appropriately accommodate subjective test noise in objective-subjective comparisons, typically resulting in new analysis frameworks and figures-of-merit. We take a different approach. By making only basic assumptions, we derive bounds on PCC and MSE that can be expected for a subjective test.\n  Consistent with intuition, these bounds are functions of subjective vote variance. When a subjective test includes vote variance information, the calculation of the bounds is easy, and in this case we say the resulting bounds are \"fully data-driven.\" We provide two options for calculating bounds in cases where vote variance information is not available. One option is to use vote variance information from other subjective tests that do provide such information, and the second option is to use a model for subjective votes.\n  Thus we introduce a binomial-based model for subjective votes (BinoVotes) that naturally leads to a mean opinion score (MOS) model, named BinoMOS, with multiple unique desirable properties. BinoMOS reproduces the discrete nature of MOS values and its dependence on the number of votes per file. This modeling provides vote variance information required by the PCC and MSE bounds and we compare this modeling with data from 18 subjective tests. The modeling yields PCC and MSE bounds that agree very well with those found from the data directly. These results allow one to set expectations for the PCC and MSE that might be achieved for any subjective test, even those where vote variance information is not available.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='eess.AS'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='eess.IV'/>\n    <published>2026-03-13T17:41:09Z</published>\n    <arxiv:comment>Currently under review at IEEE Transactions on Multimedia. Submitted 5 November 2025, revised 3 March 2026</arxiv:comment>\n    <arxiv:primary_category term='eess.AS'/>\n    <author>\n      <name>Jaden Pieper</name>\n    </author>\n    <author>\n      <name>Stephen D. Voran</name>\n    </author>\n  </entry>"
}