Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

Reachability-based Temporal Logic Verification for Reliable LLM-guided Human-Autonomy Teaming

We propose a reachability-based framework for reliable LLM-guided human-autonomy teaming (HAT) using signal temporal logic (STL). In the proposed framework, LLM is leveraged as a translator that tr...

Joonwon Choi, Kartik Anand Pant, Karthik Nune, Inseok Hwang

2603.08633 2026-03-09
AI LLM

Coverage-Guided Multi-Agent Harness Generation for Java Library Fuzzing

Coverage-guided fuzzing has proven effective for software testing, but targeting library code requires specialized fuzz harnesses that translate fuzzer-generated inputs into valid API invocations. ...

Nils Loose, Nico Winkel, Kristoffer Hempel, Felix Mächtle, Julian Hans, Thomas Eisenbarth

2603.08616 2026-03-09
AI LLM

What to Make Sense of in the Era of LLM? A Perspective from the Structure and Efforts in Sensemaking

Sensemaking tasks often entail navigating through complex, ambiguous data to construct coherent insights. Prior work has shown that crowds can effectively distribute cognitive load, pooling diverse...

Tianyi Li, Satya Samhita Bonepalli, Vikram Mohanty

2603.08604 2026-03-09
AI LLM

Trust via Reputation of Conviction

The question of knowledge, truth and trust is explored via a mathematical formulation of claims and sources. We define truth as the reproducibly perceived subset of knowledge, ...

Aravind R. Iyengar

2603.08575 2026-03-09
AI LLM

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

DARPA's AI Cyber Challenge (AIxCC) showed that cyber reasoning systems (CRSs) can go beyond vulnerability discovery to autonomously confirm and patch bugs: seven teams built such systems and open-s...

Andrew Chin, Dongkwan Kim, Yu-Fu Fu, Fabian Fleischer, Youngjoon Kim, HyungSeok Han, Cen Zhang, B...

2603.08566 2026-03-09
AI LLM

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. However, standard RL paradigms favor static problem-solvin...

Xiaoying Zhang, Zichen Liu, Yipeng Zhang, Xia Hu, Wenqi Shao

2603.08561 2026-03-09
AI LLM

SCAFFOLD-CEGIS: Preventing Latent Security Degradation in LLM-Driven Iterative Code Refinement

The application of large language models to code generation has evolved from one-shot generation to iterative refinement, yet the evolution of security throughout iteration remains insufficiently u...

Yi Chen, Yun Bian, Haiquan Wang, Shihao Li, Zhe Cui

2603.08520 2026-03-09
AI LLM

Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

Large language models (LLMs) can answer religious knowledge queries fluently, yet they often hallucinate and misattribute sources, which is especially consequential in Islamic settings where users ...

Ummar Abbas, Mourad Ouzzani, Mohamed Y. Eltabakh, Omar Sinan, Gagan Bhatia, Hamdy Mubarak, Majd H...

2603.08501 2026-03-09
AI LLM

Towards Modeling Cybersecurity Behavior of Humans in Organizations

We undertake a comprehensive and structured synthesis of the drivers of human behavior in cybersecurity, focusing specifically on people within organizations (i.e., especially employees in companie...

Klaas Ole Kürtz

2603.08484 2026-03-09
AI LLM

Behavioral Generative Agents for Power Dispatch and Auction

This paper presents positive initial evidence that generative agents can relax the rigidity of traditional mathematical models for human decision-making in power dispatch and auction settings. We d...

Shaoze Li, Justin S. Kim, Cong Chen

2603.08477 2026-03-09
AI LLM

R2F: Repurposing Ray Frontiers for LLM-free Object Navigation

Zero-shot open-vocabulary object navigation has progressed rapidly with the emergence of large Vision-Language Models (VLMs) and Large Language Models (LLMs), now widely used as high-level decision...

Francesco Argenziano, John Mark Alexis Marcelo, Michele Brienza, Abdel Hakim Drid, Emanuele Musum...

2603.08475 2026-03-09
AI LLM

Amplitude Analysis of Singly Cabibbo-Suppressed Decay $\Lambda^{+}_{c}\to p K^{+} K^{-}$

Using a sample of $e^{+}e^{-}$ annihilation data corresponding to an integrated luminosity of 4.4 $\rm{fb}^{-1}$ collected with the BESIII detector at the BEPCII collider and produced at center-of-...

BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso,...

2603.08469 2026-03-09
AI LLM

Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reducing cost via fine-tuning with heur...

Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi

2603.08462 2026-03-09
AI LLM

Data-Driven Priors for Uncertainty-Aware Deterioration Risk Prediction with Multimodal Data

Safe predictions are a crucial requirement for integrating predictive models into clinical decision support systems. One approach for ensuring trustworthiness is to enable models' ability to expres...

L. Julián Lechuga López, Tim G. J. Rudner, Farah E. Shamout

2603.08459 2026-03-09
AI LLM

LycheeCluster: Efficient Long-Context Inference with Structure-Aware Chunking and Hierarchical KV Indexing

The quadratic complexity of the attention mechanism and the substantial memory footprint of the Key-Value (KV) cache present severe computational and memory challenges for Large Language Models (LL...

Dongfang Li, Zixuan Liu, Gang Lin, Baotian Hu, Min Zhang

2603.08453 2026-03-09
AI LLM

A Dataset for Probing Translationese Preferences in English-to-Swedish Translation

Translations often carry traces of the source language, a phenomenon known as translationese. We introduce the first freely available English-to-Swedish dataset contrasting translationese sentences...

Jenny Kunz, Anja Jarochenko, Marcel Bollmann

2603.08450 2026-03-09
AI LLM

A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice req...

Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ry...

2603.08448 2026-03-09
AI LLM

LLM-Driven Online Aggregation for Unstructured Text Analytics

Large Language Models (LLMs) exhibit strong capabilities in text processing, and recent research has augmented SQL and DataFrame with LLM-powered semantic operators for data analysis. However, LLM-...

Chao Hui, Weizheng Lu, Yanjie Gao, Lingfeng Xiong, Yunhai Wang, Yueguo Chen

2603.08443 2026-03-09
AI LLM

One Model Is Enough: Native Retrieval Embeddings from LLM Agent Hidden States

LLM agents that retrieve external knowledge typically generate a search query as text, then run a separate embedding model to encode it into a vector. This two-model pipeline adds infrastructure co...

Bo Jiang

2603.08429 2026-03-09
AI LLM

IronEngine: Towards General AI Assistant

This paper presents IronEngine, a general AI assistant platform organized around a unified orchestration core that connects a desktop user interface, REST and WebSocket APIs, Python clients, local ...

Xi Mo

2603.08425 2026-03-09