AI · LLM · March 04, 2026

The Company You Keep: How LLMs Respond to Dark Triad Traits

Authors

Zeyi Lu, Angelica Henestrosa, Pavel Chizhov, Ivan P. Yamshchikov

Abstract

Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, a phenomenon known as AI sycophancy. Although this behavior is often encouraged, it can become problematic when models respond to user prompts that reflect negative social tendencies: such responses risk amplifying harmful behavior rather than mitigating it. In this study, we examine how LLMs respond to user prompts expressing varying degrees of Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy) using a curated dataset. Our analysis reveals differences across models: all models predominantly exhibit corrective behavior, yet produce reinforcing output in certain cases. Model behavior also depends on the severity level and differs in the sentiment of the response. Our findings have implications for designing safer conversational systems that can detect and respond appropriately when users escalate from benign to harmful requests.

Metadata

arXiv ID: 2603.04299
Provider: ARXIV
Primary Category: cs.CL
Published: 2026-03-04
Fetched: 2026-03-05 06:06
