Papers
Research papers from arXiv and related sources
Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services
The rapid adoption of large language models (LLMs) in financial services introduces new operational, regulatory, and security risks. Yet most red-teaming benchmarks remain domain-agnostic and fail ...
Fabrizio Dimino, Bhaskarjit Sarmah, Stefano Pasquali
Backdoor Directions in Vision Transformers
This paper investigates how Backdoor Attacks are represented within Vision Transformers (ViTs). By assuming knowledge of the trigger, we identify a specific "trigger direction" in the model's act...
Sengim Karayalcin, Marina Krcek, Pin-Yu Chen, Stjepan Picek
AI-Enhanced Spatial Cellular Traffic Demand Prediction with Contextual Clustering and Error Correction for 5G/6G Planning
Accurate spatial prediction of cellular traffic demand is essential for 5G NR capacity planning, network densification, and data-driven 6G planning. Although machine learning can fuse heterogeneous...
Mohamad Alkadamani, Colin Brown, Halim Yanikomeroglu
Quantum Limits of Passive Optical Surface Metrology and Defect Detection
We develop a quantum statistical framework for passive optical surface metrology. Modelling a surface as an incoherent ensemble of point emitters imaged through a diffraction-limited system, we emp...
Jernej Frank, George Brumpton, Tommaso Tufarelli, Gerardo Adesso, Samanta Piano
Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contract Security?
EVMbench, released by OpenAI, Paradigm, and OtterSec, is the first large-scale benchmark for AI agents on smart contract security. Its results -- agents detect up to 45.6% of vulnerabilities and ex...
Chaoyuan Peng, Lei Wu, Yajin Zhou
Interpretable Chinese Metaphor Identification via LLM-Assisted MIPVU Rule Script Generation: A Comparative Protocol Study
Metaphor identification is a foundational task in figurative language processing, yet most computational approaches operate as opaque classifiers offering no insight into why an expression is judge...
Weihang Huang, Mengna Liu
Guiding Diffusion Models with Semantically Degraded Conditions
Classifier-Free Guidance (CFG) is a cornerstone of modern text-to-image models, yet its reliance on a semantically vacuous null prompt ($\varnothing$) generates a guidance signal prone to geometric...
Shilong Han, Yuming Zhang, Hongxia Wang
A Control-Theoretic Foundation for Agentic Systems
This paper develops a control-theoretic framework for analyzing agentic systems embedded within feedback control loops. In such systems, an AI agent may adapt controller parameters, select among co...
Ali Eslami, Jiangbo Yu
Large Language Models as Annotators for Machine Translation Quality Estimation
Large Language Models (LLMs) have demonstrated excellent performance on Machine Translation Quality Estimation (MTQE), yet their high inference costs make them impractical for direct application. I...
Sidi Wang, Sophie Arnoult, Amir Kamran
AI-Generated Rubric Interfaces: K-12 Teachers' Perceptions and Practices
This study investigates K-12 teachers' perceptions and experiences with AI-supported rubric generation during a summer professional development workshop ($n = 25$). Teachers used MagicSchool.ai to...
Bahare Riahi, Sayali Patukale, Joy Niranjan, Yogya Koneru, Tiffany Barnes, Veronica Cateté
Multiple change-point detection on the circle via isolation using permutation testing
In this paper we propose a new method for multiple change-point detection for piecewise-constant circular signals, a setting that, despite its importance in many scientific domains, remains compara...
Sophia Loizidou, Andreas Anastasiou, Christophe Ley
Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness
Large language models (LLMs) trained with canonical tokenization exhibit surprising robustness to non-canonical inputs such as character-level tokenization, yet the mechanisms underlying this robus...
Zhipeng Yang, Shu Yang, Lijie Hu, Di Wang
RAGPerf: An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems
We present the design and implementation of a RAG-based AI system benchmarking (RAGPerf) framework for characterizing the system behaviors of RAG pipelines. To facilitate detailed profiling and fin...
Shaobo Li, Yirui Zhou, Yuan Xu, Kevin Chen, Daniel Waddington, Swaminathan Sundararaman, Hubertus...
Prioritizing Gradient Sign Over Modulus: An Importance-Aware Framework for Wireless Federated Learning
Wireless federated learning (FL) facilitates collaborative training of artificial intelligence (AI) models to support ubiquitous intelligent applications at the wireless edge. However, the inherent...
Yiyang Yue, Jiacheng Yao, Wei Xu, Zhaohui Yang, George K. Karagiannidis, Dusit Niyato
CodePercept: Code-Grounded Visual STEM Perception for MLLMs
When MLLMs fail at Science, Technology, Engineering, and Mathematics (STEM) visual reasoning, a fundamental question arises: is it due to perceptual deficiencies or reasoning limitations? Through s...
Tongkun Guan, Zhibo Yang, Jianqiang Wan, Mingkun Yang, Zhengtao Guo, Zijian Hu, Ruilin Luo, Ruize...
AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations
LLM agents are highly vulnerable to Indirect Prompt Injection (IPI), where adversaries embed malicious directives in untrusted tool outputs to hijack execution. Most existing defenses treat IPI as ...
Yu He, Haozhe Zhu, Yiming Li, Shuo Shao, Hongwei Yao, Zhihao Liu, Zhan Qin
Pneuma-Seeker: A Relational Reification Mechanism to Align AI Agents with Human Work over Relational Data
When faced with data problems, many data workers cannot articulate their information need precisely enough for software to help. Although LLMs interpret natural-language requests, they behave britt...
Muhammad Imam Luthfi Balaka, John Hillesland, Kemal Badur, Raul Castro Fernandez
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
Accurate estimation of uncertainty in deep learning is critical for deploying models in high-stakes domains such as medical diagnosis and autonomous decision-making, where overconfident predictions...
Xinran Xu, Xiuyi Fan
A Grammar of Machine Learning Workflows
Data leakage affected 294 published papers across 17 scientific fields (Kapoor & Narayanan, 2023). The dominant response has been documentation: checklists, linters, best-practice guides. Documenta...
Simon Roth
Zero crossings of the differential scalar polarizability of Ba$^+$ clock transition
The differential scalar polarizability $\Delta\alpha_0(\omega)$ of the Ba$^+$ S$_{1/2}$-to-D$_{5/2}$ clock transition has a zero crossing near 481 nm, which is measured to be $623.603\,13(17)$ THz. From this measur...
N Jayjong, M D K Lee, K J Arnold, M D Barrett