February 19, 2026

Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding

Authors

Rahul Thomas, Teo Kitanovski, Micah Goldblum, Arka Pal

Abstract

Multi-path speculative decoding accelerates lossless sampling from a target model: a cheaper draft model generates a draft tree of tokens, and a verification algorithm then accepts a subset of them. While prior work has proposed various verification algorithms for i.i.d. rollouts, their relative performance under matched settings remains unclear. In this work, we first present a systematic evaluation of verification strategies across model families, tasks, and sampling regimes, and find that Traversal Verification dominates consistently, with optimal-transport-based (OT-based) methods lagging far behind. Our analysis shows that this occurs because OT-based methods achieve high multi-token acceptance near the root of the draft tree, whereas multi-token gains are most impactful deeper in the tree, where the draft and target distributions diverge. Based on this insight, we propose delayed tree expansion, which drafts a partial single path first and thereby delays the i.i.d. branching point. We show that delayed tree expansion preserves the target distribution and improves on root-node i.i.d. rollouts. Further, we develop a dynamic neural selector that estimates the expected block efficiency of OT-based verification methods from draft and target features, enabling context-dependent expansion decisions. Our neural selector allows OT-based methods like SpecInfer to outperform Traversal Verification for the first time, achieving 5% higher average throughput across a wide range of models, datasets, and sampling settings.
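
The drafting step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `delay` parameter, and the toy draft distribution are all assumptions made for illustration, and verification is omitted. It only shows the shape of the draft tree: a single shared path of `delay` tokens, followed by `num_paths` i.i.d. rollouts from the end of that path. Setting `delay=0` gives root-node i.i.d. rollouts.

```python
import random


def sample_token(probs, rng):
    """Sample a token index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_draft_probs(prefix, vocab_size=4):
    """Hypothetical stand-in for a draft model (assumption, not from the paper).

    Returns a fixed distribution that depends only on the prefix length.
    """
    shift = len(prefix) % vocab_size
    weights = [1 + ((i + shift) % vocab_size) for i in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def draft_delayed_tree(prefix, depth, delay, num_paths, rng,
                       draft_probs=toy_draft_probs):
    """Draft a shared trunk, then branch into i.i.d. rollouts.

    The trunk holds `delay` tokens sampled along a single path. Each of the
    `num_paths` branches then samples the remaining `depth - delay` tokens
    independently, conditioned on prefix + trunk.
    """
    if not 0 <= delay <= depth:
        raise ValueError("delay must satisfy 0 <= delay <= depth")

    # Single-path portion: one token per step, no branching yet.
    trunk = []
    for _ in range(delay):
        trunk.append(sample_token(draft_probs(prefix + trunk), rng))

    # Delayed branching point: i.i.d. rollouts start after the trunk.
    branches = []
    for _ in range(num_paths):
        branch = []
        for _ in range(depth - delay):
            context = prefix + trunk + branch
            branch.append(sample_token(draft_probs(context), rng))
        branches.append(branch)
    return trunk, branches


if __name__ == "__main__":
    rng = random.Random(0)
    trunk, branches = draft_delayed_tree([1, 2], depth=5, delay=2,
                                         num_paths=3, rng=rng)
    print("trunk:", trunk)
    for i, branch in enumerate(branches):
        print(f"branch {i}:", branch)
```

In this sketch the draft costs `delay + num_paths * (depth - delay)` sampled tokens, compared with `num_paths * depth` when branching at the root, so a later branching point spends the same path budget on deeper positions of the tree.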

Metadata

arXiv ID: 2602.16994
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-02-19
Fetched: 2026-02-21 18:51
