AI · LLM · March 10, 2026

GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models

Authors

Andrew Murray, Danial Dervovic, Alberto Pozanco, Michael Cashmore

Abstract

We present GenePlan (GENeralized Evolutionary Planner), a novel framework that leverages large language model (LLM) assisted evolutionary algorithms to generate domain-dependent generalized planners for classical planning tasks described in PDDL. By casting generalized planning as an optimization problem, GenePlan iteratively evolves interpretable Python planners that minimize plan length across diverse problem instances. In an empirical evaluation across six existing benchmark domains and two new domains, GenePlan achieved an average SAT score of 0.91, closely matching the performance of state-of-the-art planners (SAT score 0.93) and significantly outperforming other LLM-based baselines such as chain-of-thought (CoT) prompting (average SAT score 0.64). The generated planners solve new instances rapidly (average 0.49 seconds per task) and at low cost (average $1.82 per domain using GPT-4o).
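The optimization loop the abstract describes can be sketched in miniature: candidate planners are scored by average plan length over a set of problem instances, the fittest survive, and mutated variants replace the rest. This is purely illustrative, not the paper's implementation; in GenePlan the mutation step is LLM-assisted (an LLM rewriting Python planner code), whereas the toy numeric mutation below stands in for it, and every name here is hypothetical.

```python
import random

def make_planner(step):
    """A toy planner: moves from start toward goal in increments of `step`."""
    def plan(start, goal):
        actions, x = [], start
        while x != goal and len(actions) < 100:
            delta = step if goal > x else -step
            if abs(goal - x) < abs(delta):  # clamp the final move onto the goal
                delta = goal - x
            x += delta
            actions.append(delta)
        return actions if x == goal else None
    return plan

def fitness(planner, instances):
    """Mean plan length over instances (lower is better); failures penalized."""
    lengths = []
    for start, goal in instances:
        plan = planner(start, goal)
        lengths.append(len(plan) if plan is not None else 1000)
    return sum(lengths) / len(lengths)

def evolve(instances, generations=20, pop_size=8, seed=0):
    """Evolve a step size whose planner minimizes average plan length."""
    rng = random.Random(seed)
    population = [rng.randint(1, 5) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda s: fitness(make_planner(s), instances))
        survivors = ranked[: pop_size // 2]
        # GenePlan would ask an LLM to propose code edits at this point;
        # here we simply perturb the numeric genome.
        children = [max(1, s + rng.choice([-1, 1])) for s in survivors]
        population = survivors + children
    return min(population, key=lambda s: fitness(make_planner(s), instances))
```

For example, `evolve([(0, 12), (3, 27)])` returns a step size whose planner solves both instances; once evolved, applying the planner to a fresh instance is a cheap function call, which mirrors the fast per-task solve times the abstract reports.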

Metadata

arXiv ID: 2603.09481
Provider: ARXIV
Primary Category: cs.AI
Published: 2026-03-10
Fetched: 2026-03-11 06:02

Comment: 54 pages, 4 figures. Accepted to ICAPS 2026