
Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning

Authors

Jingchun Yang, Jinchang Zhang

Abstract

The widespread adoption of dashcams has made video evidence in traffic accidents increasingly abundant, yet transforming "what happened in the video" into "who is responsible under which legal provisions" still relies heavily on human experts. Existing ego-view traffic accident studies mainly focus on perception and semantic understanding, while LLM-based legal methods are mostly built on textual case descriptions and rarely incorporate video evidence, leaving a clear gap between the two. We first propose C-TRAIL, a multimodal legal dataset that, under the Chinese traffic regulation system, explicitly aligns dashcam videos and textual descriptions with a closed set of responsibility modes and their corresponding Chinese traffic statutes. On this basis, we introduce a two-stage framework: (1) a traffic accident understanding module that generates textual video descriptions; and (2) a legal multi-agent framework that outputs responsibility modes, statute sets, and complete judgment reports. Experimental results on C-TRAIL and MM-AU show that our method outperforms general and legal LLMs, as well as existing agent-based approaches, while providing a transparent and interpretable legal reasoning process.
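The abstract describes the architecture only at a high level. Below is a minimal Python sketch of how the two stages might compose; every name in it (describe_video, LegalAgent, the agent roles, the placeholder responsibility mode and statute) is an assumption for illustration, not the authors' actual interface or results.

# Minimal sketch of the two-stage pipeline from the abstract.
# All names and roles here are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class JudgmentReport:
    responsibility_mode: str   # one label from the closed set of modes
    statutes: list[str]        # cited Chinese traffic statutes
    reasoning: list[str]       # per-agent reasoning trace


def describe_video(video_path: str) -> str:
    """Stage 1 (assumed): the traffic accident understanding module,
    which turns dashcam footage into a textual event description.
    Stubbed here; in the paper this would be a video-language model."""
    return (f"[stub description of {video_path}] Ego vehicle proceeds "
            "straight on a green light; a scooter enters from the right "
            "against a red light and collides with the ego vehicle.")


class LegalAgent:
    """Stage 2 (assumed): one agent in the legal multi-agent framework.
    Each agent contributes a role-specific step toward the judgment."""

    def __init__(self, role: str):
        self.role = role

    def analyze(self, description: str) -> str:
        # Stub: a real agent would prompt an LLM with role-specific
        # instructions (e.g. fact finding, statute retrieval, adjudication).
        return f"{self.role}: analysis of '{description[:40]}...'"


def judge(video_path: str) -> JudgmentReport:
    description = describe_video(video_path)                      # stage 1
    agents = [LegalAgent(r) for r in
              ("fact_finder", "statute_retriever", "adjudicator")]
    trace = [a.analyze(description) for a in agents]              # stage 2
    return JudgmentReport(
        responsibility_mode="other_party_full_responsibility",    # placeholder
        statutes=["Road Traffic Safety Law, Art. 38"],            # placeholder
        reasoning=trace,
    )


if __name__ == "__main__":
    report = judge("dashcam_clip_0001.mp4")
    print(report.responsibility_mode)
    for step in report.reasoning:
        print(" -", step)

Keeping the per-agent trace inside the report mirrors the abstract's claim that the method exposes a transparent, interpretable legal reasoning process rather than only a final label.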

Metadata

arXiv ID: 2603.17930
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-18
Fetched: 2026-03-19 06:01

