Research

Paper

TESTING March 24, 2026

TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration

Authors

Chunxiao Li, Lijun Li, Jing Shao

Abstract

The rapid advancement of Vision-Language Models (VLMs) has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and preventing the discovery of novel, diverse exploits. To transcend this limitation, we introduce TreeTeaming, an automated red teaming framework that reframes strategy exploration from static testing to a dynamic, evolutionary discovery process. At its core lies a strategic Orchestrator, powered by a Large Language Model (LLM), which autonomously decides whether to evolve promising attack paths or explore diverse strategic branches, thereby dynamically constructing and expanding a strategy tree. A multimodal actuator is then tasked with executing these complex strategies. In the experiments across 12 prominent VLMs, TreeTeaming achieves state-of-the-art attack success rates on 11 models, outperforming existing methods and reaching up to 87.60\% on GPT-4o. The framework also demonstrates superior strategic diversity over the union of previously public jailbreak strategies. Furthermore, the generated attacks exhibit an average toxicity reduction of 23.09\%, showcasing their stealth and subtlety. Our work introduces a new paradigm for automated vulnerability discovery, underscoring the necessity of proactive exploration beyond static heuristics to secure frontier AI models.

Metadata

arXiv ID: 2603.22882

Provider: ARXIV

Primary Category: cs.LG

Published: 2026-03-24

Fetched: 2026-03-25 06:02

Related papers

Fractal universe and quantum gravity made simple

Fabio Briscese, Gianluca Calcagni • 2026-03-25

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25

Orientation Reconstruction of Proteins using Coulomb Explosions

Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.22882v1</id>\n    <title>TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration</title>\n    <updated>2026-03-24T07:31:42Z</updated>\n    <link href='https://arxiv.org/abs/2603.22882v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.22882v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>The rapid advancement of Vision-Language Models (VLMs) has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and preventing the discovery of novel, diverse exploits. To transcend this limitation, we introduce TreeTeaming, an automated red teaming framework that reframes strategy exploration from static testing to a dynamic, evolutionary discovery process. At its core lies a strategic Orchestrator, powered by a Large Language Model (LLM), which autonomously decides whether to evolve promising attack paths or explore diverse strategic branches, thereby dynamically constructing and expanding a strategy tree. A multimodal actuator is then tasked with executing these complex strategies. In the experiments across 12 prominent VLMs, TreeTeaming achieves state-of-the-art attack success rates on 11 models, outperforming existing methods and reaching up to 87.60\\% on GPT-4o. The framework also demonstrates superior strategic diversity over the union of previously public jailbreak strategies. Furthermore, the generated attacks exhibit an average toxicity reduction of 23.09\\%, showcasing their stealth and subtlety. Our work introduces a new paradigm for automated vulnerability discovery, underscoring the necessity of proactive exploration beyond static heuristics to secure frontier AI models.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.LG'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CV'/>\n    <published>2026-03-24T07:31:42Z</published>\n    <arxiv:comment>CVPR2026</arxiv:comment>\n    <arxiv:primary_category term='cs.LG'/>\n    <author>\n      <name>Chunxiao Li</name>\n    </author>\n    <author>\n      <name>Lijun Li</name>\n    </author>\n    <author>\n      <name>Jing Shao</name>\n    </author>\n  </entry>"
}