HDLFORGE: A Two-Stage Multi-Agent Framework for Efficient Verilog Code Generation with Adaptive Model Escalation

Authors

Armin Abdollahi, Saeid Shokoufa, Negin Ashrafi, Mehdi Kamal, Massoud Pedram

Abstract

We present HDLFORGE, a two-stage multi-agent framework for automated Verilog generation that optimizes the trade-off between generation speed and accuracy. The system uses a compact coder with a medium-sized LLM by default (Stage A) and escalates to a stronger coder with an ultra-large LLM (Stage B) only when needed, guided by a calibrated score from inexpensive diagnostics including compilation, lint, and smoke tests. A key innovation is a counterexample-guided formal agent that converts bounded-model-checking traces into reusable micro-tests, significantly reducing bug detection time and repair iterations. The portable escalation controller can wrap existing Verilog LLM pipelines without modifying their internals. Evaluated on the VerilogEval Human, VerilogEval V2, and RTLLM benchmarks, HDLFORGE demonstrates improved accuracy-latency trade-offs compared to single-stage systems, supported by analysis of wall-clock time distributions, escalation thresholds, and agent ablations. On VerilogEval Human and VerilogEval V2, HDLFORGE-Qwen achieves 91.2% and 91.8% Pass@1, respectively, with roughly 50% lower median latency, dramatically improving accuracy over other medium-sized models. On RTLLM, it achieves 97.2% Pass@5.
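The escalation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Diagnostics` fields, the `cheap_score` weights, the `threshold` default, and the function names are all hypothetical, and the score here is a hand-picked stand-in for the calibrated score the paper refers to. The two stages and the diagnostics runner are passed in as plain callables, which reflects the claim that the controller wraps an existing pipeline without touching its internals.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Diagnostics:
    """Results of the inexpensive checks (hypothetical field names)."""
    compiles: bool
    lint_warnings: int
    smoke_passed: int
    smoke_total: int


def cheap_score(d: Diagnostics) -> float:
    """Map diagnostics to a score in [0, 1].

    The weights are illustrative assumptions, not calibrated values.
    """
    if not d.compiles:
        return 0.0
    lint_term = 1.0 / (1.0 + d.lint_warnings)
    smoke_term = d.smoke_passed / d.smoke_total if d.smoke_total else 0.0
    return 0.2 + 0.2 * lint_term + 0.6 * smoke_term


def generate_with_escalation(
    spec: str,
    stage_a: Callable[[str], str],
    stage_b: Callable[[str], str],
    diagnose: Callable[[str], Diagnostics],
    threshold: float = 0.8,  # assumed value; the paper studies this threshold
) -> Tuple[str, str, float]:
    """Try the cheaper Stage A first; escalate to Stage B if the score is low."""
    candidate = stage_a(spec)
    score = cheap_score(diagnose(candidate))
    if score >= threshold:
        return candidate, "A", score
    candidate = stage_b(spec)
    return candidate, "B", cheap_score(diagnose(candidate))


if __name__ == "__main__":
    # Stub stages and diagnostics, standing in for real LLM calls and tools.
    def weak_coder(spec: str) -> str:
        return "weak"

    def strong_coder(spec: str) -> str:
        return "strong"

    def fake_diagnose(code: str) -> Diagnostics:
        if code == "strong":
            return Diagnostics(True, 0, 4, 4)
        return Diagnostics(True, 3, 1, 4)

    print(generate_with_escalation("4-bit counter", weak_coder, strong_coder, fake_diagnose))
```

In this sketch the stronger model is called only when the cheap checks score below the threshold, so specifications that Stage A already handles never pay the Stage B latency.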

Metadata

arXiv ID: 2603.04646

Provider: ARXIV

Primary Category: cs.AR

Published: 2026-03-04

Fetched: 2026-03-06 14:20

