Research

Paper

TESTING March 12, 2026

AutoVeriFix+: High-Correctness RTL Generation via Trace-Aware Causal Fix and Semantic Redundancy Pruning

Authors

Yan Tan, Xiangchen Meng, Zijun Jiang, Yangdi Lyu

Abstract

Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardware description languages, such as Verilog, is challenging due to the scarcity of high-quality training data. Current approaches to Verilog code generation using LLMs often focus on syntactic correctness, resulting in code with functional errors. To address these challenges, we propose AutoVeriFix+, a novel three-stage framework that integrates high-level semantic reasoning with state-space exploration to enhance functional correctness and design efficiency. In the first stage, an LLM is employed to generate high-level Python reference models that define the intended circuit behavior. In the second stage, another LLM generates initial Verilog RTL candidates and iteratively fixes syntactic errors. In the third stage, we introduce a Concolic testing engine to exercise deep sequential logic and identify corner-case vulnerabilities. With cycle-accurate execution traces and internal register snapshots, AutoVeriFix+ provides the LLM with the causal context necessary to resolve complex state-transition errors. Furthermore, it will generate a coverage report to identify functionally redundant branches, enabling the LLM to perform semantic pruning for area optimization. Experimental results demonstrate that AutoVeriFix+ achieves over 80% functional correctness on rigorous benchmarks, reaching a pass@10 score of 90.2% on the VerilogEval-machine dataset. In addition, it eliminates an average of 25% redundant logic across benchmarks through trace-aware optimization.

Metadata

arXiv ID: 2603.11489
Provider: ARXIV
Primary Category: cs.PL
Published: 2026-03-12
Fetched: 2026-03-13 06:02

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.11489v1</id>\n    <title>AutoVeriFix+: High-Correctness RTL Generation via Trace-Aware Causal Fix and Semantic Redundancy Pruning</title>\n    <updated>2026-03-12T03:15:57Z</updated>\n    <link href='https://arxiv.org/abs/2603.11489v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.11489v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardware description languages, such as Verilog, is challenging due to the scarcity of high-quality training data. Current approaches to Verilog code generation using LLMs often focus on syntactic correctness, resulting in code with functional errors. To address these challenges, we propose AutoVeriFix+, a novel three-stage framework that integrates high-level semantic reasoning with state-space exploration to enhance functional correctness and design efficiency. In the first stage, an LLM is employed to generate high-level Python reference models that define the intended circuit behavior. In the second stage, another LLM generates initial Verilog RTL candidates and iteratively fixes syntactic errors. In the third stage, we introduce a Concolic testing engine to exercise deep sequential logic and identify corner-case vulnerabilities. With cycle-accurate execution traces and internal register snapshots, AutoVeriFix+ provides the LLM with the causal context necessary to resolve complex state-transition errors. Furthermore, it will generate a coverage report to identify functionally redundant branches, enabling the LLM to perform semantic pruning for area optimization. Experimental results demonstrate that AutoVeriFix+ achieves over 80% functional correctness on rigorous benchmarks, reaching a pass@10 score of 90.2% on the VerilogEval-machine dataset. In addition, it eliminates an average of 25% redundant logic across benchmarks through trace-aware optimization.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.PL'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AR'/>\n    <published>2026-03-12T03:15:57Z</published>\n    <arxiv:comment>arXiv admin note: text overlap with arXiv:2509.08416</arxiv:comment>\n    <arxiv:primary_category term='cs.PL'/>\n    <author>\n      <name>Yan Tan</name>\n    </author>\n    <author>\n      <name>Xiangchen Meng</name>\n    </author>\n    <author>\n      <name>Zijun Jiang</name>\n    </author>\n    <author>\n      <name>Yangdi Lyu</name>\n    </author>\n  </entry>"
}