AI LLM February 27, 2026

ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference

Authors

Siyuan Ma, Bo Gao, Xiaojun Jia, Simeng Qin, Tianlin Li, Ke Ma, Xiaoshuang Jia, Wenqi Ren, Yang Liu

Abstract

The paradigm of large language model (LLM) reasoning is shifting from parameter scaling to test-time compute scaling, yet many existing approaches still rely on uniform brute-force sampling (for example, fixed best-of-N or self-consistency) that is costly, hard to attribute, and can trigger overthinking with diminishing returns. We propose ODAR-Expert, an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. ODAR uses a difficulty estimator grounded in amortized active inference to dynamically route queries between a heuristic Fast Agent and a deliberative Slow Agent. We further introduce a free-energy-principled, risk-sensitive fusion mechanism that selects answers by minimizing a variational free energy objective, balancing log-likelihood with epistemic uncertainty (varentropy) as a principled alternative to ad hoc voting over heterogeneous candidates. Extensive evaluation across 23 benchmarks shows strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam (HLE), while improving the compute-accuracy frontier under compute-matched settings. We also validate reproducibility on a fully open-source stack (Llama 4 + DeepSeek), where ODAR surpasses homogeneous sampling strategies while reducing computational costs by 82%. Overall, our results suggest that thinking-optimal scaling requires adaptive resource allocation with free-energy-based decision-making rather than simply increasing test-time compute.
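The abstract's fusion step, selecting among heterogeneous candidates by minimizing a free-energy-style objective that trades log-likelihood against varentropy, can be illustrated with a minimal sketch. This is not the paper's implementation: the exact objective, the risk weight, and the `select_answer`/`varentropy` helpers are all assumptions introduced here for illustration.

```python
import math

def varentropy(logprobs):
    # Varentropy: the variance of surprisal (-log p) under the model's
    # own distribution; a common proxy for epistemic uncertainty.
    probs = [math.exp(lp) for lp in logprobs]
    entropy = -sum(p * lp for p, lp in zip(probs, logprobs))
    return sum(p * (-lp - entropy) ** 2 for p, lp in zip(probs, logprobs))

def select_answer(candidates, risk=1.0):
    # Each candidate is (answer, log_likelihood, token_logprob_distribution).
    # Score = -log-likelihood + risk * varentropy (lower is better):
    # a risk-sensitive objective in the spirit of variational free energy,
    # preferring candidates that fit well AND have low epistemic spread.
    def score(candidate):
        _, loglik, dist = candidate
        return -loglik + risk * varentropy(dist)
    return min(candidates, key=score)
```

Under this sketch, a candidate with slightly lower likelihood but near-zero varentropy can beat a higher-likelihood candidate whose token distribution is highly uncertain, which is the qualitative behavior the abstract attributes to the fusion mechanism; setting `risk=0` recovers plain likelihood-based selection.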

Metadata

arXiv ID: 2602.23681
Provider: ARXIV
Primary Category: cs.AI
Published: 2026-02-27
Fetched: 2026-03-02 06:04
