AI · LLM · February 27, 2026

Ask don't tell: Reducing sycophancy in large language models

Authors

Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau

Abstract

Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where we first isolate how input framing influences sycophancy, and second, leverage these findings to develop mitigation strategies. In a nested factorial design, we compare questions to various non-questions where we vary three orthogonal factors: epistemic certainty (statement, belief, conviction), perspective (I- vs user-perspective), and affirmation vs negation. We show that (1) sycophancy is substantially higher in response to non-questions compared to questions. Additionally, we find that (2) sycophancy increases monotonically with epistemic certainty conveyed by the user, and (3) is amplified by I-perspective framing. Building on this, we show that asking a model to convert non-questions into questions before answering significantly reduces sycophancy. Importantly, this effect is stronger than a simple baseline prompt asking models "not to be sycophantic". Our work offers a practical and effective input-level mitigation that both developers and users can easily adopt.
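The mitigation described above is an input-level intervention: before the model answers, it is instructed to restate any non-question as a neutral question and answer that instead. A minimal sketch of how a developer might apply such a wrapper is below; the prompt wording and message structure are illustrative assumptions, not the authors' exact instruction.

```python
# Illustrative sketch of an "ask don't tell" input-level mitigation.
# The instruction text below is hypothetical wording, not the paper's
# exact prompt; the chat-message format is one common convention.

REPHRASE_INSTRUCTION = (
    "Before answering, check whether the user's message is a question. "
    "If it is a statement, belief, or conviction rather than a question, "
    "first rewrite it as a neutral, open question, then answer that "
    "question on its merits rather than affirming the original claim."
)

def wrap_ask_dont_tell(user_message: str) -> list[dict]:
    """Build a chat payload that prepends the rephrase instruction as a
    system message, leaving the user's original framing untouched."""
    return [
        {"role": "system", "content": REPHRASE_INSTRUCTION},
        {"role": "user", "content": user_message},
    ]

# Example: an I-perspective, high-certainty non-question -- the framing
# the paper finds most sycophancy-inducing.
messages = wrap_ask_dont_tell("I'm convinced my business plan is flawless.")
```

Placing the instruction in a system message keeps the user's text intact, so the model itself performs the statement-to-question conversion rather than the harness rewriting user input.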

Metadata

arXiv ID: 2602.23971
Provider: ARXIV
Primary Category: cs.HC
Published: 2026-02-27
Fetched: 2026-03-02 06:04
