Learning under noisy supervision is governed by a feedback-truth gap

Authors

Elan Schonfeld, Elias Wisnia

Abstract

When feedback is absorbed faster than task structure can be evaluated, the learner will favor feedback over truth. A two-timescale model shows this feedback-truth gap is inevitable whenever the two rates differ and vanishes only when they match. We test this prediction across neural networks trained with noisy labels (30 datasets, 2,700 runs), human probabilistic reversal learning (N = 292), and human reward/punishment learning with concurrent EEG (N = 25). In each system, truth is defined operationally: held-out labels, the objectively correct option, or the participant's pre-feedback expectation, the only non-circular reference decodable from post-feedback EEG. The gap appeared universally but was regulated differently: dense networks accumulated it as memorization; sparse-residual scaffolding suppressed it; humans generated transient over-commitment that was actively recovered. Neural over-commitment (~0.04-0.10) was amplified tenfold into behavioral commitment (d = 3.3-3.9). The gap is a fundamental constraint on learning under noisy supervision; its consequences depend on the regulation each system employs.

Metadata

arXiv ID: 2602.16829
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-02-18
Fetched: 2026-02-21 18:51
Categories: cs.LG, cs.AI, cs.NE
Comments: 33 pages, 5 figures, 10 extended data figures, 4 extended data tables; 10-page supplementary information