Papers
Research papers from arXiv and related sources
Reachability-based Temporal Logic Verification for Reliable LLM-guided Human-Autonomy Teaming
We propose a reachability-based framework for reliable LLM-guided human-autonomy teaming (HAT) using signal temporal logic (STL). In the proposed framework, an LLM is leveraged as a translator that tr...
Joonwon Choi, Kartik Anand Pant, Karthik Nune, Inseok Hwang
Coverage-Guided Multi-Agent Harness Generation for Java Library Fuzzing
Coverage-guided fuzzing has proven effective for software testing, but targeting library code requires specialized fuzz harnesses that translate fuzzer-generated inputs into valid API invocations. ...
Nils Loose, Nico Winkel, Kristoffer Hempel, Felix Mächtle, Julian Hans, Thomas Eisenbarth
What to Make Sense of in the Era of LLM? A Perspective from the Structure and Efforts in Sensemaking
Sensemaking tasks often entail navigating through complex, ambiguous data to construct coherent insights. Prior work has shown that crowds can effectively distribute cognitive load, pooling diverse...
Tianyi Li, Satya Samhita Bonepalli, Vikram Mohanty
Trust via Reputation of Conviction
The question of knowledge, truth and trust is explored via a mathematical formulation of claims and sources. We define truth as the reproducibly perceived subset of knowledge, ...
Aravind R. Iyengar
OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security
DARPA's AI Cyber Challenge (AIxCC) showed that cyber reasoning systems (CRSs) can go beyond vulnerability discovery to autonomously confirm and patch bugs: seven teams built such systems and open-s...
Andrew Chin, Dongkwan Kim, Yu-Fu Fu, Fabian Fleischer, Youngjoon Kim, HyungSeok Han, Cen Zhang, B...
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. However, standard RL paradigms favor static problem-solvin...
Xiaoying Zhang, Zichen Liu, Yipeng Zhang, Xia Hu, Wenqi Shao
SCAFFOLD-CEGIS: Preventing Latent Security Degradation in LLM-Driven Iterative Code Refinement
The application of large language models to code generation has evolved from one-shot generation to iterative refinement, yet the evolution of security throughout iteration remains insufficiently u...
Yi Chen, Yun Bian, Haiquan Wang, Shihao Li, Zhe Cui
Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA
Large language models (LLMs) can answer religious knowledge queries fluently, yet they often hallucinate and misattribute sources, which is especially consequential in Islamic settings where users ...
Ummar Abbas, Mourad Ouzzani, Mohamed Y. Eltabakh, Omar Sinan, Gagan Bhatia, Hamdy Mubarak, Majd H...
Towards Modeling Cybersecurity Behavior of Humans in Organizations
We undertake a comprehensive and structured synthesis of the drivers of human behavior in cybersecurity, focusing specifically on people within organizations (i.e., especially employees in companie...
Klaas Ole Kürtz
Behavioral Generative Agents for Power Dispatch and Auction
This paper presents positive initial evidence that generative agents can relax the rigidity of traditional mathematical models for human decision-making in power dispatch and auction settings. We d...
Shaoze Li, Justin S. Kim, Cong Chen
R2F: Repurposing Ray Frontiers for LLM-free Object Navigation
Zero-shot open-vocabulary object navigation has progressed rapidly with the emergence of large Vision-Language Models (VLMs) and Large Language Models (LLMs), now widely used as high-level decision...
Francesco Argenziano, John Mark Alexis Marcelo, Michele Brienza, Abdel Hakim Drid, Emanuele Musum...
Amplitude Analysis of Singly Cabibbo-Suppressed Decay $Λ^{+}_{c}\to p K^{+} K^{-}$
Using a sample of $e^{+}e^{-}$ annihilation data corresponding to an integrated luminosity of 4.4 $\rm{fb}^{-1}$ collected with the BESIII detector at the BEPCII collider and produced at center-of-...
BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso,...
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reducing cost via fine-tuning with heur...
Fabio Valerio Massoli, Andrey Kuzmin, Arash Behboodi
Data-Driven Priors for Uncertainty-Aware Deterioration Risk Prediction with Multimodal Data
Safe predictions are a crucial requirement for integrating predictive models into clinical decision support systems. One approach for ensuring trustworthiness is to enable models' ability to expres...
L. Julián Lechuga López, Tim G. J. Rudner, Farah E. Shamout
LycheeCluster: Efficient Long-Context Inference with Structure-Aware Chunking and Hierarchical KV Indexing
The quadratic complexity of the attention mechanism and the substantial memory footprint of the Key-Value (KV) cache present severe computational and memory challenges for Large Language Models (LL...
Dongfang Li, Zixuan Liu, Gang Lin, Baotian Hu, Min Zhang
A Dataset for Probing Translationese Preferences in English-to-Swedish Translation
Translations often carry traces of the source language, a phenomenon known as translationese. We introduce the first freely available English-to-Swedish dataset contrasting translationese sentences...
Jenny Kunz, Anja Jarochenko, Marcel Bollmann
A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic
Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice req...
Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ry...
LLM-Driven Online Aggregation for Unstructured Text Analytics
Large Language Models (LLMs) exhibit strong capabilities in text processing, and recent research has augmented SQL and DataFrame with LLM-powered semantic operators for data analysis. However, LLM-...
Chao Hui, Weizheng Lu, Yanjie Gao, Lingfeng Xiong, Yunhai Wang, Yueguo Chen
One Model Is Enough: Native Retrieval Embeddings from LLM Agent Hidden States
LLM agents that retrieve external knowledge typically generate a search query as text, then run a separate embedding model to encode it into a vector. This two-model pipeline adds infrastructure co...
Bo Jiang
IronEngine: Towards General AI Assistant
This paper presents IronEngine, a general AI assistant platform organized around a unified orchestration core that connects a desktop user interface, REST and WebSocket APIs, Python clients, local ...
Xi Mo