Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts

Authors

John Turnbull, Shivam Grover, Amin Jalali, Ali Etemad

Abstract

Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models during inference without labeled source data. We present the first evaluation of TTA methods for facial expression recognition (FER) under natural domain shifts, performing cross-dataset experiments with widely used FER datasets. This moves beyond synthetic corruptions to examine real-world shifts caused by differing collection protocols, annotation standards, and demographics. Results show that TTA can boost FER performance under natural shifts by up to 11.34%. Entropy minimization methods such as TENT and SAR perform best when the target distribution is clean. In contrast, prototype adjustment methods such as T3A excel when the distributional distance is larger. Finally, feature alignment methods such as SHOT deliver the largest gains when the target distribution is noisier than the source. Our cross-dataset analysis shows that TTA effectiveness is governed by the distributional distance and the severity of the natural shift across domains.
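
The abstract refers to entropy minimization methods such as TENT. As a rough illustration of the quantity such methods drive down, the sketch below computes the Shannon entropy of softmax predictions on an unlabeled batch. It is not the paper's code: the function names, the example logits, and the choice of seven expression classes are assumptions made for illustration, and the parameter update that an actual method would perform on this loss is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution for one sample."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_batch_entropy(batch_logits):
    """Average entropy over an unlabeled batch; no labels are needed."""
    return sum(prediction_entropy(row) for row in batch_logits) / len(batch_logits)


# Hypothetical logits over seven expression classes (assumed, not from the paper).
confident = [6.0, 0.5, 0.2, 0.1, 0.0, -0.3, -0.5]
uncertain = [0.0] * 7

if __name__ == "__main__":
    print("confident sample:", prediction_entropy(confident))
    print("uncertain sample:", prediction_entropy(uncertain))
    print("batch mean:", mean_batch_entropy([confident, uncertain]))
```

A uniform prediction over seven classes has the maximum entropy, ln 7, while a peaked prediction has a much lower value. An entropy minimization method uses the batch mean as an unsupervised loss at test time, which is why it needs no target labels.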

Metadata

arXiv ID: 2603.19994
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-20
Fetched: 2026-03-23 16:54
Cross-listed: cs.LG, eess.IV, eess.SP
Comment: Accepted at ICASSP 2026
Abstract page: https://arxiv.org/abs/2603.19994v1
PDF: https://arxiv.org/pdf/2603.19994v1