Paper
The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments
Authors
Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente
Abstract
Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured animations to present the pure nonverbal performance of public speakers to audiences - either in silence (nonverbal-only) or paired with the verbal content (nonverbal-plus-verbal). Using continuous response measurement (CRM), we find that audience judgments solidify with remarkable speed: Moment-to-moment engagement ratings become highly predictive of subsequent evaluations within the initial 10 seconds of the performance. Most notably, this predictive relationship emerged faster and slightly stronger in the nonverbal-only condition, with predictive information being present already after less than 5 seconds. These findings elucidate the social impact a speaker's nonverbal performance has on audience impressions, even when dissociated from the verbal content of the speech. Our approach provides a high-resolution temporal map of social impression formation, pointing to an early "moment of capture" that appears to set the stage for the reception of the following message. On a broader scale, this research validates a powerful new method to isolate different communicative channels, to scientifically deconstruct rhetorical skill, and to study the pervasive impact of nonverbal behavior more broadly. It also enables us to translate the ancient art of rhetoric into a modern science of social impression formation, yielding an empirical basis that can inform human-centered feedback, develop AI-based augmentation tools, and guide the design of engaging, socially present avatars in an increasingly AI-mediated and virtual world.
Metadata
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25
Raw Data (Debug)
{
"raw_xml": "<entry>\n <id>http://arxiv.org/abs/2602.23920v1</id>\n <title>The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments</title>\n <updated>2026-02-27T11:07:25Z</updated>\n <link href='https://arxiv.org/abs/2602.23920v1' rel='alternate' type='text/html'/>\n <link href='https://arxiv.org/pdf/2602.23920v1' rel='related' title='pdf' type='application/pdf'/>\n <summary>Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured animations to present the pure nonverbal performance of public speakers to audiences - either in silence (nonverbal-only) or paired with the verbal content (nonverbal-plus-verbal). Using continuous response measurement (CRM), we find that audience judgments solidify with remarkable speed: Moment-to-moment engagement ratings become highly predictive of subsequent evaluations within the initial 10 seconds of the performance. Most notably, this predictive relationship emerged faster and slightly stronger in the nonverbal-only condition, with predictive information being present already after less than 5 seconds. These findings elucidate the social impact a speaker's nonverbal performance has on audience impressions, even when dissociated from the verbal content of the speech. Our approach provides a high-resolution temporal map of social impression formation, pointing to an early \"moment of capture\" that appears to set the stage for the reception of the following message. On a broader scale, this research validates a powerful new method to isolate different communicative channels, to scientifically deconstruct rhetorical skill, and to study the pervasive impact of nonverbal behavior more broadly. It also enables us to translate the ancient art of rhetoric into a modern science of social impression formation, yielding an empirical basis that can inform human-centered feedback, develop AI-based augmentation tools, and guide the design of engaging, socially present avatars in an increasingly AI-mediated and virtual world.</summary>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.HC'/>\n <category scheme='http://arxiv.org/schemas/atom' term='cs.ET'/>\n <published>2026-02-27T11:07:25Z</published>\n <arxiv:primary_category term='cs.HC'/>\n <author>\n <name>Ralf Schmälzle</name>\n </author>\n <author>\n <name>Yuetong Du</name>\n </author>\n <author>\n <name>Sue Lim</name>\n </author>\n <author>\n <name>Gary Bente</name>\n </author>\n </entry>"
}