Papers
Research papers from arXiv and related sources
TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health
While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the do...
Zixin Xiong, Ziteng Wang, Haotian Fan, Xinjie Zhang, Wenxuan Wang
IoUCert: Robustness Verification for Anchor-based Object Detectors
While formal robustness verification has seen significant success in image classification, scaling these guarantees to object detection remains notoriously difficult due to complex non-linear coord...
Benedikt Brückner, Alejandro Mercado, Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio
Zigzag Persistence of Neural Responses to Time-Varying Stimuli
We use topological data analysis to study neural population activity in the Sensorium 2023 dataset, which records responses from thousands of mouse visual cortex neurons to diverse video stimuli. F...
Yuri Gardinazzi, Alessio Ansuini, Eugenio Piasini, Fabio Anselmi, Matteo Biagetti
Step-Level Sparse Autoencoder for Reasoning Process Interpretation
Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While...
Xuan Yang, Jiayu Liu, Yuhang Lai, Hao Xu, Zhenya Huang, Ning Miao
MA-CoNav: A Master-Slave Multi-Agent Framework with Hierarchical Collaboration and Dual-Level Reflection for Long-Horizon Embodied VLN
Vision-Language Navigation (VLN) aims to empower robots with the ability to perform long-horizon navigation in unfamiliar environments based on complex linguistic instructions. Its success critical...
Ling Luo, Qianqian Bai
Dynamic Contract Analysis for Parallel Programming Models
Parallel programming in high-performance computing depends on low-level APIs such as MPI, requiring users to manage synchronization and resources manually. Several correctness checking tools exist ...
Yussur Mustafa Oraji, Alexander Hück, Christian Bischof
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
Enterprise engineering organizations produce high-volume, heterogeneous telemetry from version control systems, CI/CD pipelines, issue trackers, and observability platforms. Large Language Models (...
Yuvraj Agrawal
Reproducing and Comparing Distillation Techniques for Cross-Encoders
Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right...
Victor Morand, Mathias Vast, Basile Van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski
OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents
Multi-agent large language model frameworks are promising for complex multi-step reasoning, yet existing systems remain weak for scientific and knowledge-intensive domains due to static prompts and...
Yichao Feng, Haoran Luo, Zhenghong Lin, Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh, Anh Tuan Luu
Why Does RLAIF Work At All?
Reinforcement Learning from AI Feedback (RLAIF) enables language models to improve by training on their own preference judgments, yet no theoretical account explains why this self-improvement seemi...
Robin Young
Magnetic monopoles and high frequency gravitational waves from quasi-stable strings
The spontaneous breaking of $SO(10)$ via flipped $SU(5)$ to the Standard Model yields a novel scenario in which the superheavy topologically stable GUT monopole carrying a single unit ($2\pi/e$) of D...
Rinku Maji, Qaisar Shafi
Kaon leptonic and semileptonic decays with $N_f=2+1+1$ HISQ fermions
Precision tests of the Standard Model (SM) currently show a deficit in first-row Cabibbo-Kobayashi-Maskawa (CKM) unitarity. In this talk, we discuss progress towards a correlated analysis of the la...
Ramón Merino, Alexei Bazavov, Claude W. Bernard, Carleton DeTar, Aida X. El-Khadra, Elvira Gámiz,...
Contextualized Privacy Defense for LLM Agents
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, s...
Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang
Spatial Autoregressive Modeling of DINOv3 Embeddings for Unsupervised Anomaly Detection
DINO models provide rich patch-level representations that have recently enabled strong performance in unsupervised anomaly detection (UAD). Most existing methods extract patch embeddings from "nor...
Ertunc Erdil, Nico Schulthess, Guney Tombak, Ender Konukoglu
TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation
Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch: VLMs are primarily pretrained on static, disembodi...
Jiaxing Liu, Zexi Zhang, Xiaoyan Li, Boyue Wang, Yongli Hu, Baocai Yin
Delegation and Verification Under AI
As AI systems enter institutional workflows, workers must decide whether to delegate task execution to AI and how much effort to invest in verifying AI outputs, while institutions evaluate workers ...
Lingxiao Huang, Wenyang Xiao, Nisheeth K. Vishnoi
Architecting Trust in Artificial Epistemic Agents
Large language models increasingly function as epistemic agents -- entities that can 1) autonomously pursue epistemic goals and 2) actively shape our shared knowledge environment. They curate the i...
Nahema Marchal, Stephanie Chan, Matija Franklin, Manon Revel, Geoff Keeling, Roberta Fischli, Bil...
Changing Pedagogical Paradigms: Integrating Generative AI in Mathematics to Enhance Digital Literacy through 'Mathematical Battles with AI'
This paper introduces 'Math Battles with AI', an innovative competitive format designed at ITMO University to redefine the role of generative AI in mathematics education. Moving away from a purely ...
Maria Moskalenko, Alexander Trifanov, Roman Popkov, Arina Tabieva, Maria Smirnova, Konstantin Pra...
Sparse autoencoders reveal organized biological knowledge but minimal regulatory logic in single-cell foundation models: a comparative atlas of Geneformer and scGPT
Background: Single-cell foundation models such as Geneformer and scGPT encode rich biological information, but whether this includes causal regulatory logic rather than statistical co-expression re...
Ihor Kendiukhov
The Geometry of Learning Under AI Delegation
As AI systems shift from tools to collaborators, a central question is how the skills of humans relying on them change over time. We study this question mathematically by modeling the joint evoluti...
Lingxiao Huang, Nisheeth K. Vishnoi