AI LLM February 19, 2026

AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue

Authors

Adib Sakhawat, Fardeen Sadab, Rakin Shahriar

Abstract

Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a game-theoretic framework that probes the asymmetry between information extraction (active deduction) and information containment (state maintenance) in dialogue. We propose two complementary tasks: AIDG-I, measuring pragmatic strategy in social deduction, and AIDG-II, measuring constraint satisfaction in a structured "20 Questions" setting. Across 439 games with six frontier LLMs, we observe a clear capability asymmetry: models perform substantially better at containment than deduction, with a 350-point Elo advantage on defense (Cohen's d = 5.47). We identify two bottlenecks driving this gap: (1) Information Dynamics, where confirmation strategies are 7.75x more effective than blind deduction (p < 0.00001), and (2) Constraint Adherence, where instruction-following degrades under conversational load, accounting for 41.3% of deductive failures. These findings suggest that while LLMs excel at local defensive coherence, they struggle with the global state tracking required for strategic inquiry.
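
To give a rough sense of scale for the reported rating gap and effect size, the sketch below computes the expected score implied by a 350-point difference and shows how Cohen's d is defined. It assumes the conventional logistic Elo model with a 400-point scale and the pooled-standard-deviation form of Cohen's d; the abstract does not state which rating procedure or variant the authors used, and the function names and sample values are illustrative only.

```python
import math
from statistics import mean, variance


def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the conventional
    logistic Elo model (assumed here, not confirmed by the abstract)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


def cohens_d(a, b) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Expected score implied by a 350-point gap (about 0.88 under this model).
print(round(elo_expected_score(350), 2))

# Toy data, not from the paper: means differ by 1, pooled SD is 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Under these assumptions, a 350-point gap corresponds to the stronger side scoring about 0.88 per game on average, and a d of 5.47 means the two group means sit more than five pooled standard deviations apart.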

Metadata

arXiv ID: 2602.17443
Provider: ARXIV
Primary Category: cs.CL
Published: 2026-02-19
Fetched: 2026-02-21 18:51

Updated: 2026-02-19T15:09:12Z
Abstract page: https://arxiv.org/abs/2602.17443v1
PDF: https://arxiv.org/pdf/2602.17443v1
Comment: 16 pages, 5 figures, 13 tables. Includes appendix and supplementary materials