Research

Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the do...

Zixin Xiong, Ziteng Wang, Haotian Fan, Xinjie Zhang, Wenxuan Wang

2603.03047 2026-03-03
TESTING

IoUCert: Robustness Verification for Anchor-based Object Detectors

While formal robustness verification has seen significant success in image classification, scaling these guarantees to object detection remains notoriously difficult due to complex non-linear coord...

Benedikt Brückner, Alejandro Mercado, Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio

2603.03043 2026-03-03
TESTING

Zigzag Persistence of Neural Responses to Time-Varying Stimuli

We use topological data analysis to study neural population activity in the Sensorium 2023 dataset, which records responses from thousands of mouse visual cortex neurons to diverse video stimuli. F...

Yuri Gardinazzi, Alessio Ansuini, Eugenio Piasini, Fabio Anselmi, Matteo Biagetti

2603.03037 2026-03-03
AI LLM

Step-Level Sparse Autoencoder for Reasoning Process Interpretation

Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While...

Xuan Yang, Jiayu Liu, Yuhang Lai, Hao Xu, Zhenya Huang, Ning Miao

2603.03031 2026-03-03
TESTING

MA-CoNav: A Master-Slave Multi-Agent Framework with Hierarchical Collaboration and Dual-Level Reflection for Long-Horizon Embodied VLN

Vision-Language Navigation (VLN) aims to empower robots with the ability to perform long-horizon navigation in unfamiliar environments based on complex linguistic instructions. Its success critical...

Ling Luo, Qianqian Bai

2603.03024 2026-03-03
TESTING

Dynamic Contract Analysis for Parallel Programming Models

Parallel programming in high-performance computing depends on low-level APIs such as MPI, requiring users to manage synchronization and resources manually. Several correctness checking tools exist ...

Yussur Mustafa Oraji, Alexander Hück, Christian Bischof

2603.03023 2026-03-03
AI LLM

REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

Enterprise engineering organizations produce high-volume, heterogeneous telemetry from version control systems, CI/CD pipelines, issue trackers, and observability platforms. Large Language Models (...

Yuvraj Agrawal

2603.03018 2026-03-03
AI LLM

Reproducing and Comparing Distillation Techniques for Cross-Encoders

Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right...

Victor Morand, Mathias Vast, Basile Van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski

2603.03010 2026-03-03
AI LLM

OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

Multi-agent large language model frameworks are promising for complex multi-step reasoning, yet existing systems remain weak for scientific and knowledge-intensive domains due to static prompts and...

Yichao Feng, Haoran Luo, Zhenghong Lin, Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh, Anh Tuan Luu

2603.03005 2026-03-03
AI LLM

Why Does RLAIF Work At All?

Reinforcement Learning from AI Feedback (RLAIF) enables language models to improve by training on their own preference judgments, yet no theoretical account explains why this self-improvement seemi...

Robin Young

2603.03000 2026-03-03
TESTING

Magnetic monopoles and high frequency gravitational waves from quasi-stable strings

The spontaneous breaking of $SO(10)$ via flipped $SU(5)$ to the Standard Model yields a novel scenario in which the superheavy topologically stable GUT monopole carrying a single unit ($2\pi/e$) of D...

Rinku Maji, Qaisar Shafi

2603.02996 2026-03-03
TESTING

Kaon leptonic and semileptonic decays with $N_f=2+1+1$ HISQ fermions

Precision tests of the Standard Model (SM) currently show a deficit in first-row Cabibbo-Kobayashi-Maskawa (CKM) unitarity. In this talk, we discuss progress towards a correlated analysis of the la...

Ramón Merino, Alexei Bazavov, Claude W. Bernard, Carleton DeTar, Aida X. El-Khadra, Elvira Gámiz,...

2603.02994 2026-03-03
AI LLM

Contextualized Privacy Defense for LLM Agents

LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, s...

Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang

2603.02983 2026-03-03
TESTING

Spatial Autoregressive Modeling of DINOv3 Embeddings for Unsupervised Anomaly Detection

DINO models provide rich patch-level representations that have recently enabled strong performance in unsupervised anomaly detection (UAD). Most existing methods extract patch embeddings from ``nor...

Ertunc Erdil, Nico Schulthess, Guney Tombak, Ender Konukoglu

2603.02974 2026-03-03
AI LLM

TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation

Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch: VLMs are primarily pretrained on static, disembodi...

Jiaxing Liu, Zexi Zhang, Xiaoyan Li, Boyue Wang, Yongli Hu, Baocai Yin

2603.02972 2026-03-03
AI LLM

Delegation and Verification Under AI

As AI systems enter institutional workflows, workers must decide whether to delegate task execution to AI and how much effort to invest in verifying AI outputs, while institutions evaluate workers ...

Lingxiao Huang, Wenyang Xiao, Nisheeth K. Vishnoi

2603.02961 2026-03-03
AI LLM

Architecting Trust in Artificial Epistemic Agents

Large language models increasingly function as epistemic agents -- entities that can 1) autonomously pursue epistemic goals and 2) actively shape our shared knowledge environment. They curate the i...

Nahema Marchal, Stephanie Chan, Matija Franklin, Manon Revel, Geoff Keeling, Roberta Fischli, Bil...

2603.02960 2026-03-03
AI LLM

Changing Pedagogical Paradigms: Integrating Generative AI in Mathematics to Enhance Digital Literacy through 'Mathematical Battles with AI'

This paper introduces 'Math Battles with AI', an innovative competitive format designed at ITMO University to redefine the role of generative AI in mathematics education. Moving away from a purely ...

Maria Moskalenko, Alexander Trifanov, Roman Popkov, Arina Tabieva, Maria Smirnova, Konstantin Pra...

2603.02955 2026-03-03
TESTING

Sparse autoencoders reveal organized biological knowledge but minimal regulatory logic in single-cell foundation models: a comparative atlas of Geneformer and scGPT

Background: Single-cell foundation models such as Geneformer and scGPT encode rich biological information, but whether this includes causal regulatory logic rather than statistical co-expression re...

Ihor Kendiukhov

2603.02952 2026-03-03
AI LLM

The Geometry of Learning Under AI Delegation

As AI systems shift from tools to collaborators, a central question is how the skills of humans relying on them change over time. We study this question mathematically by modeling the joint evoluti...

Lingxiao Huang, Nisheeth K. Vishnoi

2603.02950 2026-03-03