
Can You Tell It's AI? Human Perception of Synthetic Voices in Vishing Scenarios

Authors

Zoha Hayat Bhatti, Bakhtawar Ahtisham, Seemal Tausif, Niklas George, Nida ul Habib Bajwa, Mobin Javed

Abstract

Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing), raising urgent concerns about deception at scale. Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts and what perceptual strategies underlie their judgments. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips (8 AI-generated, 8 human-recorded) and classified each as human or AI while reporting confidence. Participants performed poorly: mean accuracy was 37.5%, below chance in a binary classification task. At the stimulus level, misclassification was bidirectional: 75% of AI-generated clips were majority-labeled as human, while 62.5% of human-recorded clips were majority-labeled as AI. Signal Detection Theory analysis revealed near-zero discriminability (d' ≈ 0), indicating inability to reliably distinguish synthetic from human voices rather than simple response bias. Qualitative analysis of 315 coded excerpts revealed reliance on paralinguistic and emotional heuristics, including pauses, filler words, vocal variability, cadence, and emotional expressiveness. However, these surface-level cues traditionally associated with human authenticity were frequently replicated by AI-generated samples. Misclassifications were often accompanied by moderate to high confidence, suggesting perceptual miscalibration rather than uncertainty. Together, our findings demonstrate that authenticity judgments based on vocal heuristics are unreliable in contemporary vishing scenarios. We discuss implications for security interventions, user education, and AI-mediated deception mitigation.
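To make the Signal Detection Theory result concrete: under the standard equal-variance model, discriminability is d' = z(H) − z(FA), where H is the hit rate (AI clips correctly labeled AI) and FA is the false-alarm rate (human clips mislabeled AI). The sketch below uses hypothetical rates, not the paper's raw data; the point is that when hit and false-alarm rates are nearly equal, d' collapses toward zero regardless of how often participants say "AI".

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance SDT sensitivity: d' = z(H) - z(FA).

    Rates must lie strictly between 0 and 1 (in practice,
    extreme rates are corrected, e.g. with a log-linear rule).
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical listener: labels AI clips "AI" 40% of the time (hits)
# but also labels human clips "AI" 38% of the time (false alarms).
# Nearly equal rates yield d' close to zero: no real discrimination,
# even though the response bias (tendency to answer "AI") is nonzero.
print(d_prime(0.40, 0.38))

# Perfectly undiscriminating responding gives exactly d' = 0.
print(d_prime(0.50, 0.50))
```

This separation of sensitivity (d') from response bias is why the authors can conclude participants could not distinguish the voices at all, rather than merely preferring one label.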

Metadata

arXiv ID: 2602.20061
Provider: ARXIV
Primary Category: cs.CR
Published: 2026-02-23
Fetched: 2026-02-24 04:38
