
Human Factors in Detecting AI-Generated Portraits: Age, Sex, Device, and Confidence

Authors

Sunwhi Kim, Sunyul Kim

Abstract

Generative AI now produces photorealistic portraits that circulate widely in social and news-like contexts. Human ability to distinguish real from synthetic faces is time-sensitive because image generators continue to improve while public familiarity with synthetic media also changes. Here, we provide a time-stamped snapshot of human ability to distinguish real from AI-generated portraits produced by models available in July 2025. In a large-scale web experiment conducted from August 2025 to January 2026, 1,664 participants aged 20-69 years (mobile n = 1,330; PC n = 334) completed a two-alternative forced-choice task (REAL vs. AI). Each participant judged 20 trials sampled from a 210-image pool comprising real FFHQ photographs and AI-generated portraits from ChatGPT-4o and Imagen 3. Overall accuracy was high (mean 85.2%, median 90%) but varied across groups. PC participants outperformed mobile participants by 3.65 percentage points. Accuracy declined with age in both device cohorts and more steeply on mobile than on PC (-0.607 vs. -0.230 percentage points per year). Self-rated AI-detection confidence and AI exposure were positively associated with accuracy and statistically accounted for part of the age-related decline, with confidence accounting for the larger share. In the mobile cohort, an age-related sex divergence emerged among participants in their 50s and 60s, with female participants performing worse. Trial-level reaction-time models showed that correct AI judgments were faster than correct real judgments, whereas incorrect AI judgments were slower than incorrect real judgments. ChatGPT-4o portraits were harder and slower to classify than Imagen 3 portraits and were associated with a steeper age-related decline in performance. These findings frame AI portrait detection as a human-factors problem shaped by age, sex, device context, and confidence, not image realism alone.
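The age slopes reported above (-0.607 and -0.230 percentage points per year) describe a linear fit of per-participant accuracy on age within each device cohort. As a minimal sketch of that kind of computation (with hypothetical illustrative data, not the study's dataset):

```python
import numpy as np

def accuracy_slope(ages, accuracies):
    """Least-squares slope of accuracy, expressed in percentage points per year.

    ages:       per-participant ages in years
    accuracies: per-participant accuracy as a fraction in [0, 1]
    """
    slope, _intercept = np.polyfit(np.asarray(ages, dtype=float),
                                   np.asarray(accuracies, dtype=float) * 100.0,
                                   deg=1)
    return slope

# Hypothetical cohort: accuracy falls 0.5 pp per year from 95% at age 20.
ages = np.arange(20, 70)
acc = (95.0 - 0.5 * (ages - 20)) / 100.0
print(round(accuracy_slope(ages, acc), 3))  # -0.5
```

The paper's analysis is presumably richer (covariates such as confidence and exposure would enter a regression model), but the slope-per-year unit corresponds to this simple fit.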

Metadata

arXiv ID: 2603.24048
Provider: ARXIV
Primary Category: cs.HC
Published: 2026-03-25
Fetched: 2026-03-26 06:02
Comment: 36 pages, 15 figures, 1 supplementary table
Project page: https://github.com/gdrpaul3-byte/hsmu_ai_detection_public

Affiliations

Sunwhi Kim: Hwasung Medi-Science University, Dept. of Bio-Healthcare, South Korea
Sunyul Kim: Yonsei University, Graduate School of Engineering, Dept. of Artificial Intelligence, South Korea