Paper
AI In Cybersecurity Education -- Scalable Agentic CTF Design Principles and Educational Outcomes
Authors
Haoran Xi, Minghao Shao, Kimberly Milner, Venkata Sai Charan Putrevu, Nanda Rani, Meet Udeshi, Prashanth Krishnamurthy, Brendan Dolan-Gavitt, Siddharth Garg, Sandeep Kumar Shukla, Farshad Khorrami, Alon Hillel-Tuch, Muhammad Shafique, Ramesh Karri
Abstract
Large language models are rapidly changing how learners acquire and demonstrate cybersecurity skills. However, when human-AI collaboration is allowed, educators still lack validated competition designs and evaluation practices that remain fair and evidence-based. This paper presents a cross-regional study of LLM-centered Capture-the-Flag competitions built on the Cyber Security Awareness Week competition system. To understand how autonomy levels and participants' knowledge backgrounds influence problem-solving performance and learning-related behaviors, we formalize three autonomy levels: human-in-the-loop, autonomous agent frameworks, and hybrid. To enable verification, we require traceable submissions, including conversation logs, agent trajectories, and agent code. We analyze multi-region competition data covering an in-class track, a standard track, and a year-long expert track, each targeting participants with different knowledge backgrounds. Using data from the 2025 competition, we compare solve performance across autonomy levels and challenge categories, and observe that autonomous agent frameworks and hybrid approaches achieve higher completion rates on challenges requiring iterative testing and tool interactions. In the in-class track, we classify participants' agent designs and find a preference for lightweight, tool-augmented prompting and reflection-based retries over complex multi-agent architectures. Our results offer actionable guidance for designing LLM-assisted cybersecurity competitions as learning technologies, including autonomy-specific scoring criteria, evidence requirements that support solution verification, and track structures that improve accessibility while preserving reliable evaluation and engagement.
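The agent design the abstract reports as most popular in the in-class track (lightweight, tool-augmented prompting with reflection-based retries, rather than multi-agent orchestration) can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not code from the paper: the query_llm and run_tool helpers, the flag regex, and the retry budget are all assumptions standing in for whatever model client, sandbox, and competition rules a participant would actually use.

import re

FLAG_RE = re.compile(r"flag\{.*?\}")  # hypothetical flag format
MAX_RETRIES = 5  # assumed retry budget, not specified by the paper


def query_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call; swap in any LLM client."""
    raise NotImplementedError


def run_tool(command: str) -> str:
    """Placeholder for sandboxed shell execution against the challenge."""
    raise NotImplementedError


def solve_challenge(description: str) -> str | None:
    """Lightweight tool-augmented agent with reflection-based retries.

    A single LLM proposes a shell command, the tool output is appended
    to the transcript, and on failure the model is asked to reflect
    before the next attempt -- no multi-agent orchestration involved.
    """
    transcript = f"Challenge:\n{description}\n"
    for _attempt in range(MAX_RETRIES):
        command = query_llm(
            transcript + "\nPropose a single shell command to make progress."
        )
        output = run_tool(command)
        transcript += f"\n$ {command}\n{output}\n"
        if match := FLAG_RE.search(output):
            return match.group(0)  # flag found; submit it
        # Reflection step: ask the model to diagnose the failed attempt.
        reflection = query_llm(
            transcript + "\nThe flag was not found. Briefly explain what "
            "went wrong and what to try differently next time."
        )
        transcript += f"\nReflection: {reflection}\n"
    return None  # retry budget exhausted

Keeping all state in one growing transcript is what makes this design lightweight: there is no planner, critic, or inter-agent protocol to maintain, which is consistent with the preference the paper observes among in-class participants.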
Metadata
arXiv: 2603.21551v1 (cs.SE)
Published: 2026-03-23
Abstract page: https://arxiv.org/abs/2603.21551v1
PDF: https://arxiv.org/pdf/2603.21551v1
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25