Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance on potentially poisonable knowledge bases introd...

Junchen Li, Chao Qi, Rongzheng Wang, Qizhi Chen, Liang Xu, Di Liang, Bob Simons, Shuang Liang

2603.03919 2026-03-04
TESTING

Automated Testbed for Repeatable Evaluation of Ultra-Wideband Localization Performance

Testing Ultra-Wideband (UWB) systems is challenging, as multiple devices need to coordinate over lossy links and the systems' behavior is influenced by timing, synchronization, and environmental fa...

Alexander Kemptner, Julian Karoliny, Hannah Brunner, Andreas Gaich, Michael Neubauer, Fjolla Adem...

2603.03918 2026-03-04
AI LLM

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects

Large language models (LLMs) have demonstrated significant potential in developing Role-Playing Agents (RPAs). However, current research primarily evaluates RPAs using famous fictional characters, ...

Ji-Lun Peng, Yun-Nung Chen

2603.03915 2026-03-04
TESTING

Fast proton transport and neutron production in proton therapy using Fourier neural operators

Objective: Real-time adaptive proton range verification systems based on produced neutrons require accurate information on their non-isotropic momentum distributions within short times, for which M...

Francesco Blangiardi, Hunter N. Ratliff, Fabian Teichert, Kristian Smeland Ytre-Hauge, Jan Langer...

2603.03912 2026-03-04
AI LLM

From Threat Intelligence to Firewall Rules: Semantic Relations in Hybrid AI Agent and Expert System Architectures

Web security demands rapid response capabilities to evolving cyber threats. Agentic Artificial Intelligence (AI) promises automation, but the need for trustworthy security responses is of the utmos...

Chiara Bonfanti, Davide Colaiacomo, Luca Cagliero, Cataldo Basile

2603.03911 2026-03-04
TESTING

Asymptotic sharpness of a Nikolskii type inequality for rational functions in the Wiener algebra

We establish the asymptotic sharpness of a Nikolskii type inequality proved by A. Baranov and R. Zarouf for rational functions $f$ in the Wiener algebra of absolutely convergent Fourier series, wit...

Benjamin Auxemery, Alexander Borichev, Rachid Zarouf

2603.03908 2026-03-04
AI LLM

Measuring Privacy vs. Fidelity in Synthetic Social Media Datasets

Synthetic data is increasingly used to support research without exposing sensitive user content. Social media data is one of the types of datasets that would hugely benefit from representative synt...

Henry Tari, Adriana Iamnitchi

2603.03906 2026-03-04
AI LLM

From Misclassifications to Outliers: Joint Reliability Assessment in Classification

Building reliable classifiers is a fundamental challenge for deploying machine learning in real-world applications. A reliable system should not only detect out-of-distribution (OOD) inputs but als...

Yang Li, Youyang Sha, Yinzhi Wang, Timothy Hospedales, Xi Shen, Shell Xu Hu, Xuanlong Yu

2603.03903 2026-03-04
AI LLM

IROSA: Interactive Robot Skill Adaptation using Natural Language

Foundation models have demonstrated impressive capabilities across diverse domains, while imitation learning provides principled methods for robot skill adaptation from limited data. Combining thes...

Markus Knauer, Samuel Bustamante, Thomas Eiband, Alin Albu-Schäffer, Freek Stulp, João Silvério

2603.03897 2026-03-04
AI LLM

CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents

Topic localization aims to identify spans of text that express a given topic defined by a name and description. To study this task, we introduce a human-annotated benchmark based on Czech historica...

Martin Kostelník, Michal Hradiš, Martin Dočekal

2603.03884 2026-03-04
AI LLM

On the Suitability of LLM-Driven Agents for Dark Pattern Audits

As LLM-driven agents begin to autonomously navigate the web, their ability to interpret and respond to manipulative interface design becomes critical. A fundamental question that emerges is: can su...

Chen Sun, Yash Vekaria, Rishab Nithyanand

2603.03881 2026-03-04
AI LLM

CarbonPATH: Carbon-aware pathfinding and architecture optimization for chiplet-based AI systems

The exponential growth of AI has created unprecedented demand for computational resources, pushing chip designs to the limit while simultaneously escalating the environmental footprint of computing...

Chetan Choppali Sudarshan, Jiajun Hu, Aman Arora, Vidya A. Chhabria

2603.03878 2026-03-04
TESTING

Believe Your Model: Distribution-Guided Confidence Calibration

Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which enhances prediction accuracy by generating multiple candidate responses a...

Xizhong Yang, Haotian Zhang, Huiming Wang, Mofei Song

2603.03872 2026-03-04
TESTING

Ising Models of Cooperativity in Muscle Contraction

Regulation of contraction in striated muscle is controlled by a dual mechanism involving both thin filaments containing actin and thick filaments containing myosin. The thin filament is activated b...

Elaheh Saadat, Matthieu Caruel, Stefano Gherardini, Ilaria Morotti, Matteo Marcello, Marco Carema...

2603.03866 2026-03-04
AI LLM

Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy

As mental health issues continue to rise globally, there is an increasing demand for accessible and scalable therapeutic solutions. Many individuals currently seek support from Large Language Model...

Navdeep Singh Bedi, Ana-Maria Bucur, Noriko Kando, Fabio Crestani

2603.03862 2026-03-04
TESTING

A Robust Compressible APIC/FLIP Particle Grid Method with Conservative Resampling and Adaptive APIC/PIC Blending

Modeling inviscid compressible flows with shocks and vortex dominated dynamics remains challenging for particle grid methods due to moving discontinuities, cell crossing noise, and quadrature degra...

Jiansheng Yao, Yingkui Zhao

2603.03860 2026-03-04
AI LLM

A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs

Audio LLMs have shown a strong ability to understand audio samples, yet their reliability in complex acoustic scenes remains under-explored. Unlike prior work limited to small scale or less control...

Taehan Lee, Jaehan Jung, Hyukjun Lee

2603.03855 2026-03-04
AI LLM

Benchmarking Motivational Interviewing Competence of Large Language Models

Motivational interviewing (MI) promotes behavioural change in substance use disorders. Its fidelity is measured using the Motivational Interviewing Treatment Integrity (MITI) framework. While large...

Aishwariya Jha, Prakrithi Shivaprakash, Lekhansh Shukla, Animesh Mukherjee, Prabhat Chand, Pratim...

2603.03846 2026-03-04
TESTING

Semantic Bridging Domains: Pseudo-Source as Test-Time Connector

Distribution shifts between training and testing data are a critical bottleneck limiting the practical utility of models, especially in real-world test-time scenarios. To adapt models when the sour...

Xizhong Yang, Huiming Wang, Ning Xu, Mofei Song

2603.03844 2026-03-04
AI LLM

All-in-One Image Restoration via Causal-Deconfounding Wavelet-Disentangled Prompt Network

Image restoration represents a promising approach for addressing the inherent defects of image content distortion. Standard image restoration approaches suffer from high storage cost and the requir...

Bingnan Wang, Bin Qin, Jiangmeng Li, Fanjiang Xu, Fuchun Sun, Hui Xiong

2603.03839 2026-03-04