Research

Paper

AI LLM March 11, 2026

Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

Authors

Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu

Abstract

Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the OpenClaw platform. As an open-source AI agent framework that operates locally, OpenClaw can be integrated with various commercial large language models. Because its native architecture lacks built-in security constraints, it serves as an ideal subject for evaluating baseline agent vulnerabilities. First, we systematically evaluate OpenClaw's native resilience against malicious instructions. By testing 47 adversarial scenarios across six major attack categories derived from the MITRE ATLAS and ATT\&CK frameworks, we have demonstrated that OpenClaw exhibits significant inherent security issues. It primarily relies on the security capabilities of the backend LLM and is highly susceptible to sandbox escape attacks, with an average defense rate of only 17\%. To mitigate these critical security gaps, we propose and implement a novel Human-in-the-Loop (HITL) defense layer. We utilize a dual-mode testing framework to evaluate the system with and without our proposed intervention. Our findings show that the introduced HITL layer significantly hardens the system, successfully intercepting up to 8 severe attacks that completely bypassed OpenClaw's native defenses. By combining native capabilities with our HITL approach, the overall defense rate improves to a range of 19\% to 92\%. Our study not only exposes the intrinsic limitations of current code agents but also demonstrates the effectiveness of human-agent collaborative defense strategies.

Metadata

arXiv ID: 2603.10387

Provider: ARXIV

Primary Category: cs.CR

Published: 2026-03-11

Fetched: 2026-03-12 04:21

Related papers

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jian... • 2026-03-30

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or • 2026-03-30

Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books

Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khan... • 2026-03-30

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

Anuj Diwan, Eunsol Choi, David Harwath • 2026-03-30

RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems

Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.10387v1</id>\n    <title>Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw</title>\n    <updated>2026-03-11T04:09:05Z</updated>\n    <link href='https://arxiv.org/abs/2603.10387v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.10387v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the OpenClaw platform. As an open-source AI agent framework that operates locally, OpenClaw can be integrated with various commercial large language models. Because its native architecture lacks built-in security constraints, it serves as an ideal subject for evaluating baseline agent vulnerabilities. First, we systematically evaluate OpenClaw's native resilience against malicious instructions. By testing 47 adversarial scenarios across six major attack categories derived from the MITRE ATLAS and ATT\\&amp;CK frameworks, we have demonstrated that OpenClaw exhibits significant inherent security issues. It primarily relies on the security capabilities of the backend LLM and is highly susceptible to sandbox escape attacks, with an average defense rate of only 17\\%. To mitigate these critical security gaps, we propose and implement a novel Human-in-the-Loop (HITL) defense layer. We utilize a dual-mode testing framework to evaluate the system with and without our proposed intervention. Our findings show that the introduced HITL layer significantly hardens the system, successfully intercepting up to 8 severe attacks that completely bypassed OpenClaw's native defenses. By combining native capabilities with our HITL approach, the overall defense rate improves to a range of 19\\% to 92\\%. Our study not only exposes the intrinsic limitations of current code agents but also demonstrates the effectiveness of human-agent collaborative defense strategies.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CR'/>\n    <published>2026-03-11T04:09:05Z</published>\n    <arxiv:comment>12 pages, 2 figures, 4 tables</arxiv:comment>\n    <arxiv:primary_category term='cs.CR'/>\n    <author>\n      <name>Zhengyang Shan</name>\n    </author>\n    <author>\n      <name>Jiayun Xin</name>\n    </author>\n    <author>\n      <name>Yue Zhang</name>\n    </author>\n    <author>\n      <name>Minghui Xu</name>\n    </author>\n  </entry>"
}