AI LLM March 06, 2026

Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation

Authors

Changcheng Li, Jiancan Wu, Hengheng Zhang, Zhengsu Chen, Guo An, Junxiang Qiu, Xiang Wang, Qi Tian

Abstract

Reliable deployment of large language models (LLMs) requires accurate uncertainty estimation. Existing methods are predominantly answer-first: they produce a confidence score only after generating an answer, so the score measures the correctness of one specific response, which limits practical usability. We study a confidence-first paradigm, in which the model outputs its confidence before answering, and we interpret this score as the model's probability of answering the question correctly under its current policy. We propose CoCA (Co-optimized Confidence and Answers), a GRPO reinforcement learning framework that jointly optimizes confidence calibration and answer accuracy via segmented credit assignment. By assigning separate rewards and group-relative advantages to the confidence and answer segments, CoCA enables stable joint optimization and avoids reward hacking. Experiments across math, code, and factual QA benchmarks show improved calibration and uncertainty discrimination while preserving answer quality, thereby enabling a broader range of downstream applications.
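The segmented credit assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward definitions, so everything specific here is an assumption made for illustration: the confidence reward is taken to be a negative squared error against the group's empirical accuracy, the answer reward is a 0/1 correctness score, and the function names (group_relative, segmented_advantages) are hypothetical rather than taken from the paper.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def segmented_advantages(confidences, correct, conf_lens, ans_lens):
    """Build per-token advantages for each response sampled for one question.

    confidences: stated confidence in [0, 1] for each response
    correct:     1 if that response's answer was judged correct, else 0
    conf_lens:   token count of each response's confidence segment
    ans_lens:    token count of each response's answer segment
    """
    # Assumption: the group's empirical accuracy stands in for the
    # probability of answering correctly under the current policy.
    p_hat = mean(correct)

    # Assumption: Brier-style confidence reward (not confirmed by the source).
    conf_rewards = [-(c - p_hat) ** 2 for c in confidences]
    # Assumption: binary correctness as the answer reward.
    ans_rewards = [float(y) for y in correct]

    # Each segment gets its own group-relative advantage.
    conf_adv = group_relative(conf_rewards)
    ans_adv = group_relative(ans_rewards)

    # Confidence tokens carry only the confidence advantage and answer
    # tokens carry only the answer advantage.
    return [
        [ca] * cl + [aa] * al
        for ca, aa, cl, al in zip(conf_adv, ans_adv, conf_lens, ans_lens)
    ]


if __name__ == "__main__":
    per_token = segmented_advantages(
        confidences=[0.9, 0.5, 0.5, 0.1],
        correct=[1, 0, 1, 0],
        conf_lens=[2, 2, 2, 2],
        ans_lens=[3, 3, 3, 3],
    )
    for row in per_token:
        print([round(a, 2) for a in row])
```

In this toy group, half of the answers are correct, so a stated confidence of 0.9 is penalized on the confidence tokens (advantage near -1) even though that response's answer tokens are rewarded (advantage near +1). Keeping the two signals on separate token spans is one way to read the abstract's claim that separate advantages prevent one objective from being gamed through the other.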

Metadata

arXiv ID: 2603.05881

Provider: ARXIV

Primary Category: cs.CL

Published: 2026-03-06

Fetched: 2026-03-09 06:05
