TESTING February 19, 2026

On the Evaluation Protocol of Gesture Recognition for UAV-based Rescue Operation based on Deep Learning: A Subject-Independence Perspective

Authors

Domonkos Varga

Abstract

This paper presents a methodological analysis of the gesture-recognition approach proposed by Liu and Szirányi, with a particular focus on the validity of their evaluation protocol. We show that the reported near-perfect accuracy metrics result from a frame-level random train-test split that inevitably mixes samples from the same subjects across both sets, causing severe data leakage. By examining the published confusion matrix, learning curves, and dataset construction, we demonstrate that the evaluation does not measure generalization to unseen individuals. Our findings underscore the importance of subject-independent data partitioning in vision-based gesture-recognition research, especially for applications such as UAV-human interaction, which require reliable recognition of gestures performed by previously unseen people.
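The difference between the two partitioning schemes named in the abstract can be illustrated with a minimal sketch. It is not the code of either paper. The toy dataset (6 subjects, 10 gestures, 20 frames each), the dictionary keys, and the function names are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def frame_level_split(frames, test_fraction=0.2, seed=0):
    """Shuffle individual frames and cut: the leakage-prone protocol."""
    rng = random.Random(seed)
    shuffled = frames[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def subject_independent_split(frames, test_fraction=0.2, seed=0):
    """Hold out whole subjects so no person appears in both sets."""
    by_subject = defaultdict(list)
    for frame in frames:
        by_subject[frame["subject"]].append(frame)
    subjects = sorted(by_subject)
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [f for s in subjects if s not in test_subjects for f in by_subject[s]]
    test = [f for s in subjects if s in test_subjects for f in by_subject[s]]
    return train, test


def shared_subjects(train, test):
    """Subjects that occur in both the training and the test set."""
    return {f["subject"] for f in train} & {f["subject"] for f in test}


# Hypothetical toy data, not the dataset analyzed in the paper.
frames = [
    {"subject": s, "gesture": g, "frame": i}
    for s in range(6)
    for g in range(10)
    for i in range(20)
]

leaky_train, leaky_test = frame_level_split(frames)
clean_train, clean_test = subject_independent_split(frames)

print("frame-level split, shared subjects:", sorted(shared_subjects(leaky_train, leaky_test)))
print("subject-independent split, shared subjects:", sorted(shared_subjects(clean_train, clean_test)))
```

In the frame-level split, frames of the same person land on both sides, so test accuracy can reflect recognition of already seen individuals. Holding out whole subjects removes that overlap, which is the property the abstract argues an evaluation needs in order to measure generalization to unseen people.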

Metadata

arXiv ID: 2602.17854
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-02-19
Fetched: 2026-02-23 05:33
