Paper
On the Concept of Violence: A Comparative Study of Human and AI Judgments
Authors
Mariachiara Stellato, Francesco Lancia, Chiara Galeazzi, Nico Curti
Abstract
Background: What counts as violence is neither self-evident nor universally agreed upon. While physical aggression is prototypical, contemporary societies increasingly debate whether exclusion, humiliation, online harassment or symbolic acts should be classified within the same moral category. At the same time, Large Language Models (LLMs) are being consulted in everyday contexts to interpret and label complex social behaviors. Whether these systems reproduce, reshape or simplify human conceptions of violence remains an open question.
Methods: We present a systematic comparison between human judgments and LLM classifications across 22 scenarios carefully designed to be morally divisive, spanning physically and verbally aggressive behavior, relational dynamics, marginalization, symbolic actions and verbal expressions. Human responses were compared with outputs from multiple instruction-tuned models of varying sizes and architectures. We conducted global, sentence-level and thematic-domain analyses, and examined variability across models to assess patterns of convergence and divergence.
Findings: This study treats violence as a strategically chosen proxy through which broader belief formation dynamics can be observed. Violence is not itself the focus of the study but a tool for that broader investigation. This framing enables a structured investigation of how LLMs operationalize ambiguous moral constructs, negotiate conceptual boundaries, and transform plural human interpretations into singular outputs. More broadly, the findings contribute to ongoing debates about the epistemic role of conversational AI in shaping everyday interpretations of harm, responsibility and social norms, highlighting the importance of transparency and critical engagement as these systems increasingly mediate public reasoning.
Metadata
arXiv ID: 2602.17256v1
Abstract page: https://arxiv.org/abs/2602.17256v1
PDF: https://arxiv.org/pdf/2602.17256v1
Published: 2026-02-19T10:58:30Z
Updated: 2026-02-19T10:58:30Z
Primary category: physics.soc-ph
Categories: physics.soc-ph, physics.app-ph, physics.comp-ph