Research

Paper

TESTING February 22, 2026

Limited Reasoning Space: The cage of long-horizon reasoning in LLMs

Authors

Zhenyu Li, Guanlin Wu, Cheems Wang, Yongqiang Zhao

Abstract

The test-time compute strategy, such as Chain-of-Thought (CoT), has significantly enhanced the ability of large language models to solve complex tasks like logical reasoning. However, empirical studies indicate that simply increasing the compute budget can sometimes lead to a collapse in test-time performance when employing typical task decomposition strategies such as CoT. This work hypothesizes that reasoning failures with larger compute budgets stem from static planning methods, which hardly perceive the intrinsic boundaries of LLM reasoning. We term it as the Limited Reasoning Space hypothesis and perform theoretical analysis through the lens of a non-autonomous stochastic dynamical system. This insight suggests that there is an optimal range for compute budgets; over-planning can lead to redundant feedback and may even impair reasoning capabilities. To exploit the compute-scaling benefits and suppress over-planning, this work proposes Halo, a model predictive control framework for LLM planning. Halo is designed for long-horizon tasks with reason-based planning and crafts an entropy-driven dual controller, which adopts a Measure-then-Plan strategy to achieve controllable reasoning. Experimental results demonstrate that Halo outperforms static baselines on complex long-horizon tasks by dynamically regulating planning at the reasoning boundary.

Metadata

arXiv ID: 2602.19281

Provider: ARXIV

Primary Category: cs.AI

Published: 2026-02-22

Fetched: 2026-02-24 04:38

Related papers

Fractal universe and quantum gravity made simple

Fabio Briscese, Gianluca Calcagni • 2026-03-25

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25

Orientation Reconstruction of Proteins using Coulomb Explosions

Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2602.19281v1</id>\n    <title>Limited Reasoning Space: The cage of long-horizon reasoning in LLMs</title>\n    <updated>2026-02-22T17:28:27Z</updated>\n    <link href='https://arxiv.org/abs/2602.19281v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2602.19281v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>The test-time compute strategy, such as Chain-of-Thought (CoT), has significantly enhanced the ability of large language models to solve complex tasks like logical reasoning. However, empirical studies indicate that simply increasing the compute budget can sometimes lead to a collapse in test-time performance when employing typical task decomposition strategies such as CoT. This work hypothesizes that reasoning failures with larger compute budgets stem from static planning methods, which hardly perceive the intrinsic boundaries of LLM reasoning. We term it as the Limited Reasoning Space hypothesis and perform theoretical analysis through the lens of a non-autonomous stochastic dynamical system. This insight suggests that there is an optimal range for compute budgets; over-planning can lead to redundant feedback and may even impair reasoning capabilities. To exploit the compute-scaling benefits and suppress over-planning, this work proposes Halo, a model predictive control framework for LLM planning. Halo is designed for long-horizon tasks with reason-based planning and crafts an entropy-driven dual controller, which adopts a Measure-then-Plan strategy to achieve controllable reasoning. Experimental results demonstrate that Halo outperforms static baselines on complex long-horizon tasks by dynamically regulating planning at the reasoning boundary.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <published>2026-02-22T17:28:27Z</published>\n    <arxiv:primary_category term='cs.AI'/>\n    <author>\n      <name>Zhenyu Li</name>\n    </author>\n    <author>\n      <name>Guanlin Wu</name>\n    </author>\n    <author>\n      <name>Cheems Wang</name>\n    </author>\n    <author>\n      <name>Yongqiang Zhao</name>\n    </author>\n  </entry>"
}