Papers
Research papers from arXiv and related sources
Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services
The rapid adoption of large language models (LLMs) in financial services introduces new operational, regulatory, and security risks. Yet most red-teaming benchmarks remain domain-agnostic and fail ...
Fabrizio Dimino, Bhaskarjit Sarmah, Stefano Pasquali
AI-Enhanced Spatial Cellular Traffic Demand Prediction with Contextual Clustering and Error Correction for 5G/6G Planning
Accurate spatial prediction of cellular traffic demand is essential for 5G NR capacity planning, network densification, and data-driven 6G planning. Although machine learning can fuse heterogeneous...
Mohamad Alkadamani, Colin Brown, Halim Yanikomeroglu
Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contract Security?
EVMbench, released by OpenAI, Paradigm, and OtterSec, is the first large-scale benchmark for AI agents on smart contract security. Its results -- agents detect up to 45.6% of vulnerabilities and ex...
Chaoyuan Peng, Lei Wu, Yajin Zhou
Interpretable Chinese Metaphor Identification via LLM-Assisted MIPVU Rule Script Generation: A Comparative Protocol Study
Metaphor identification is a foundational task in figurative language processing, yet most computational approaches operate as opaque classifiers offering no insight into why an expression is judge...
Weihang Huang, Mengna Liu
Guiding Diffusion Models with Semantically Degraded Conditions
Classifier-Free Guidance (CFG) is a cornerstone of modern text-to-image models, yet its reliance on a semantically vacuous null prompt (∅) generates a guidance signal prone to geometric...
Shilong Han, Yuming Zhang, Hongxia Wang
A Control-Theoretic Foundation for Agentic Systems
This paper develops a control-theoretic framework for analyzing agentic systems embedded within feedback control loops. In such systems, an AI agent may adapt controller parameters, select among co...
Ali Eslami, Jiangbo Yu
Large Language Models as Annotators for Machine Translation Quality Estimation
Large Language Models (LLMs) have demonstrated excellent performance on Machine Translation Quality Estimation (MTQE), yet their high inference costs make them impractical for direct application. I...
Sidi Wang, Sophie Arnoult, Amir Kamran
AI-Generated Rubric Interfaces: K-12 Teachers' Perceptions and Practices
This study investigates K-12 teachers' perceptions and experiences with AI-supported rubric generation during a summer professional development workshop (n = 25). Teachers used MagicSchool.ai to...
Bahare Riahi, Sayali Patukale, Joy Niranjan, Yogya Koneru, Tiffany Barnes, Veronica Cateté
Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness
Large language models (LLMs) trained with canonical tokenization exhibit surprising robustness to non-canonical inputs such as character-level tokenization, yet the mechanisms underlying this robus...
Zhipeng Yang, Shu Yang, Lijie Hu, Di Wang
RAGPerf: An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems
We present the design and implementation of a RAG-based AI system benchmarking (RAGPerf) framework for characterizing the system behaviors of RAG pipelines. To facilitate detailed profiling and fin...
Shaobo Li, Yirui Zhou, Yuan Xu, Kevin Chen, Daniel Waddington, Swaminathan Sundararaman, Hubertus...
Prioritizing Gradient Sign Over Modulus: An Importance-Aware Framework for Wireless Federated Learning
Wireless federated learning (FL) facilitates collaborative training of artificial intelligence (AI) models to support ubiquitous intelligent applications at the wireless edge. However, the inherent...
Yiyang Yue, Jiacheng Yao, Wei Xu, Zhaohui Yang, George K. Karagiannidis, Dusit Niyato
CodePercept: Code-Grounded Visual STEM Perception for MLLMs
When MLLMs fail at Science, Technology, Engineering, and Mathematics (STEM) visual reasoning, a fundamental question arises: is it due to perceptual deficiencies or reasoning limitations? Through s...
Tongkun Guan, Zhibo Yang, Jianqiang Wan, Mingkun Yang, Zhengtao Guo, Zijian Hu, Ruilin Luo, Ruize...
AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations
LLM agents are highly vulnerable to Indirect Prompt Injection (IPI), where adversaries embed malicious directives in untrusted tool outputs to hijack execution. Most existing defenses treat IPI as ...
Yu He, Haozhe Zhu, Yiming Li, Shuo Shao, Hongwei Yao, Zhihao Liu, Zhan Qin
Pneuma-Seeker: A Relational Reification Mechanism to Align AI Agents with Human Work over Relational Data
When faced with data problems, many data workers cannot articulate their information need precisely enough for software to help. Although LLMs interpret natural-language requests, they behave britt...
Muhammad Imam Luthfi Balaka, John Hillesland, Kemal Badur, Raul Castro Fernandez
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
Accurate estimation of uncertainty in deep learning is critical for deploying models in high-stakes domains such as medical diagnosis and autonomous decision-making, where overconfident predictions...
Xinran Xu, Xiuyi Fan
CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems
Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC works by reusing previously computed states for the beginning part of a request (...
Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri, Thaleia Dimitra Doudali
Believing vs. Achieving -- The Disconnect between Efficacy Beliefs and Collaborative Outcomes
As artificial intelligence (AI) becomes increasingly integrated into workflows, humans must decide when to rely on AI advice. These decisions depend on general efficacy beliefs, i.e., humans' confi...
Philipp Spitzer, Joshua Holstein
Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models
Prompt highlighting steers a large language model to prioritize user-specified text spans during generation. A key challenge is extracting steering directions that capture the difference between re...
Yuyao Ge, Shenghua Liu, Yiwei Wang, Tianyu Liu, Baolong Bi, Lingrui Mei, Jiayu Yao, Jiafeng Guo, ...
Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval
Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. In this paper, we inves...
Andrea Volpini, Elie Raad, Beatrice Gamba, David Riccitelli
EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution
Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evo...
Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li