March 22, 2026

COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

Authors

Xiaozhe Li, Tianyi Lyu, Siyi Yang, Yizhao Yang, Yuxi Gong, Jinxuan Huang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu

Abstract

Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs excel at following individual instructions, their ability to distill Collective Intent - the process of extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discussions - remains largely unexplored. To bridge this gap, we introduce COIN-BENCH, a dynamic, real-world, live-updating benchmark specifically designed to evaluate LLMs on collective intent understanding within the consumer domain. Unlike traditional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure, ranging from explicit scenarios to deep causal reasoning. We implement a robust evaluation pipeline that combines a rule-based method with an LLM-as-the-Judge approach. This framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw, collective human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions - depth, breadth, informativeness, and correctness - reveals that while current models can handle surface-level aggregation, they still struggle with the analytical depth required for complex intent synthesis. COIN-BENCH establishes a new standard for advancing LLMs from passive instruction followers to expert-level analytical agents capable of deciphering the collective voice of the real world. See our project page on COIN-BENCH.
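The abstract describes an evaluation pipeline that blends a rule-based method with LLM-as-judge scoring across four dimensions (depth, breadth, informativeness, correctness). As a rough illustration only, not the paper's actual implementation, such a two-stage scorer might be sketched like this; all function names, weights, and the lexical rule check are hypothetical:

```python
# Hypothetical sketch of a two-stage evaluation in the spirit of the
# abstract: a rule-based lexical pass plus an LLM-as-judge pass over four
# dimensions. All names, weights, and checks here are illustrative.
from dataclasses import dataclass

DIMENSIONS = ("depth", "breadth", "informativeness", "correctness")

@dataclass
class JudgeScores:
    """Per-dimension scores in [0, 1], as an LLM judge might return them."""
    depth: float
    breadth: float
    informativeness: float
    correctness: float

def rule_based_pass(answer: str, required_facets: list[str]) -> float:
    """Fraction of required intent facets the answer mentions verbatim."""
    if not required_facets:
        return 0.0
    hits = sum(1 for f in required_facets if f.lower() in answer.lower())
    return hits / len(required_facets)

def combine(rule_score: float, judge: JudgeScores, w_rule: float = 0.3) -> float:
    """Blend the rule-based score with the equally weighted judge average."""
    judge_avg = sum(getattr(judge, d) for d in DIMENSIONS) / len(DIMENSIONS)
    return w_rule * rule_score + (1 - w_rule) * judge_avg

# Toy example: a model summary of collective consumer intent.
answer = "Users want longer battery life and complain about overheating."
rule = rule_based_pass(answer, ["battery", "overheating"])
final = combine(rule, JudgeScores(0.6, 0.5, 0.7, 0.8))
print(round(final, 3))  # → 0.755
```

The equal-weight average over dimensions and the 0.3/0.7 blend are arbitrary choices for the sketch; the paper's pipeline additionally uses COIN-TREE and COIN-RAG, which are not modeled here.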

Metadata

arXiv ID: 2603.21329
Provider: ARXIV
Primary Category: cs.IR
Published: 2026-03-22
Fetched: 2026-03-24 06:02
