NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

Authors

Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan

Abstract

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on less than 60% of the data and without reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.
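
To make the difficulty-bias point concrete: GRPO normalizes each rollout group's advantages by the group's reward standard deviation, so scenarios whose rollouts score with high variance (typically the hard ones) have their learning signal compressed relative to its raw spread; Dr. GRPO drops that normalization and keeps only mean-centering. The sketch below illustrates the two advantage computations on made-up reward values. It follows the published GRPO/Dr. GRPO advantage formulas and is not code from this paper; function names and numbers are illustrative.

    import numpy as np

    def grpo_advantages(rewards, eps=1e-6):
        # Standard GRPO: center rewards, then divide by the group's std.
        # High-variance (hard) scenarios get compressed toward unit scale,
        # while near-tied (easy) groups get inflated -- the "difficulty
        # bias" the abstract refers to.
        return (rewards - rewards.mean()) / (rewards.std() + eps)

    def dr_grpo_advantages(rewards):
        # Dr. GRPO: mean-centering only, no std normalization, so the
        # advantage magnitude tracks the raw reward spread of the scenario.
        return rewards - rewards.mean()

    easy = np.array([0.90, 0.92, 0.88, 0.91])   # rollouts nearly agree
    hard = np.array([0.10, 0.95, 0.30, 0.80])   # rollouts disagree widely

    print(grpo_advantages(easy))     # small gaps blown up to ~unit scale
    print(grpo_advantages(hard))     # large gaps squeezed to ~unit scale
    print(dr_grpo_advantages(hard))  # raw spread preserved

Under GRPO both groups end up with advantages of similar magnitude despite very different reward spreads, which is why high-variance scenarios contribute less learning signal per unit of reward difference.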

Metadata

arXiv ID: 2602.21172
Provider: ARXIV
Primary Category: cs.AI
Categories: cs.AI, cs.CV
Comment: Accepted to CVPR 2026
Links: https://arxiv.org/abs/2602.21172v1 (abstract), https://arxiv.org/pdf/2602.21172v1 (PDF)
Published: 2026-02-24
Fetched: 2026-02-25 06:05
