Research

Paper

AI LLM March 25, 2026

LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control

Authors

Yifeng Zhang, Peizhuo Li, Tingguang Zhou, Mingfeng Fan, Guillaume Sartoretti

Abstract

Adaptive Traffic Signal Control (ATSC) aims to optimize traffic flow and minimize delays by adjusting traffic lights in real time. Recent advances in Multi-agent Reinforcement Learning (MARL) have shown promise for ATSC, yet existing approaches still suffer from limited representational capacity, often leading to suboptimal performance and poor generalization in complex and dynamic traffic environments. On the other hand, Large Language Models (LLMs) excel at semantic representation, reasoning, and analysis, yet their propensity for hallucination and slow inference speeds often hinder their direct application to decision-making tasks. To address these challenges, we propose a novel learning paradigm named LATS that integrates LLMs and MARL, leveraging the former's strong prior knowledge and inductive abilities to enhance the latter's decision-making process. Specifically, we introduce a plug-and-play teacher-student learning module, where a trained embedding LLM serves as a teacher to generate rich semantic features that capture each intersection's topology structures and traffic dynamics. A much simpler (student) neural network then learns to emulate these features through knowledge distillation in the latent space, enabling the final model to operate independently from the LLM for downstream use in the RL decision-making process. This integration significantly enhances the overall model's representational capacity across diverse traffic scenarios, thus leading to more efficient and generalizable control strategies. Extensive experiments across diverse traffic datasets empirically demonstrate that our method enhances the representation learning capability of RL models, thereby leading to improved overall performance and generalization over both traditional RL and LLM-only approaches. [...]

Metadata

arXiv ID: 2603.24361
Provider: ARXIV
Primary Category: cs.RO
Published: 2026-03-25
Fetched: 2026-03-26 06:02

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.24361v1</id>\n    <title>LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control</title>\n    <updated>2026-03-25T14:42:32Z</updated>\n    <link href='https://arxiv.org/abs/2603.24361v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.24361v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Adaptive Traffic Signal Control (ATSC) aims to optimize traffic flow and minimize delays by adjusting traffic lights in real time. Recent advances in Multi-agent Reinforcement Learning (MARL) have shown promise for ATSC, yet existing approaches still suffer from limited representational capacity, often leading to suboptimal performance and poor generalization in complex and dynamic traffic environments. On the other hand, Large Language Models (LLMs) excel at semantic representation, reasoning, and analysis, yet their propensity for hallucination and slow inference speeds often hinder their direct application to decision-making tasks. To address these challenges, we propose a novel learning paradigm named LATS that integrates LLMs and MARL, leveraging the former's strong prior knowledge and inductive abilities to enhance the latter's decision-making process. Specifically, we introduce a plug-and-play teacher-student learning module, where a trained embedding LLM serves as a teacher to generate rich semantic features that capture each intersection's topology structures and traffic dynamics. A much simpler (student) neural network then learns to emulate these features through knowledge distillation in the latent space, enabling the final model to operate independently from the LLM for downstream use in the RL decision-making process. This integration significantly enhances the overall model's representational capacity across diverse traffic scenarios, thus leading to more efficient and generalizable control strategies. Extensive experiments across diverse traffic datasets empirically demonstrate that our method enhances the representation learning capability of RL models, thereby leading to improved overall performance and generalization over both traditional RL and LLM-only approaches. [...]</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.RO'/>\n    <published>2026-03-25T14:42:32Z</published>\n    <arxiv:primary_category term='cs.RO'/>\n    <author>\n      <name>Yifeng Zhang</name>\n    </author>\n    <author>\n      <name>Peizhuo Li</name>\n    </author>\n    <author>\n      <name>Tingguang Zhou</name>\n    </author>\n    <author>\n      <name>Mingfeng Fan</name>\n    </author>\n    <author>\n      <name>Guillaume Sartoretti</name>\n    </author>\n  </entry>"
}