Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
TESTING

Uncertainty equality for SU(N) observables enabling the experimentally friendly detection of k-inseparability via purity measurements

We derive an exact uncertainty relation for arbitrary quantum states of finite-dimensional Hilbert spaces. For any given $k$-partition of a $d$-dimensional multipartite system, we introduce the tot...

G. Tartaglione, G. Zanfardino, F. Illuminati

2603.17844 2026-03-18
TESTING

The Revised Evolutionary Volume Tolman Test: Cosmological Constraints from Galaxy Evolution

In this study we adapt a classical cosmology measurement, the volume or number density test, to a modern synthesis of observed galaxy evolution. We do this by using measured galaxy mass functions a...

Christopher J. Conselice, Edmund J. Copeland, Sergio Sevillano Muñoz

2603.17842 2026-03-18
AI LLM

How do LLMs Compute Verbal Confidence

Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate...

Dharshan Kumaran, Arthur Conmy, Federico Barbero, Simon Osindero, Viorica Patraucean, Petar Velic...

2603.17839 2026-03-18
AI LLM

Event-Centric Human Value Understanding in News-Domain Texts: An Actor-Conditioned, Multi-Granularity Benchmark

Existing human value datasets do not directly support value understanding in factual news: many are actor-agnostic, rely on isolated utterances or synthetic scenarios, and lack explicit event struc...

Yao Wang, Xin Liu, Zhuochen Liu, Jiankang Chen, Adam Jatowt, Kyoungsook Kim, Noriko Kando, Haitao Yu

2603.17838 2026-03-18
TESTING

Verification and Validation of Physics-Informed Surrogate Component Models for Dynamic Power-System Simulation

Physics-informed machine learning surrogates are increasingly explored to accelerate dynamic simulation of generators, converters, and other power grid components. The key question, however, is not...

Petros Ellinas, Indrajit Chaudhuri, Johanna Vorwerk, Spyros Chatzivasileiadis

2603.17836 2026-03-18
TESTING

Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control

Diffusion models and flow matching have become a cornerstone of robotic imitation learning, yet they suffer from a structural inefficiency where inference is often bound to a fixed integration sche...

Zunzhe Zhang, Runhan Huang, Yicheng Liu, Shaoting Zhu, Linzhan Mou, Hang Zhao

2603.17834 2026-03-18
AI LLM

ArchBench: Benchmarking Generative-AI for Software Architecture Tasks

Benchmarks for large language models (LLMs) have progressed from snippet-level function generation to repository-level issue resolution, yet they overwhelmingly target implementation correctness. S...

Bassam Adnan, Aviral Gupta, Sreemaee Akshathala, Karthik Vaidhyanathan

2603.17833 2026-03-18
AI LLM

Text-to-Stage: Spatial Layouts from Long-form Narratives

In this work, we probe the ability of a language model to demonstrate spatial reasoning from unstructured text, mimicking human capabilities and automating a process that benefits many downstream m...

Jefferson Hernandez, Swarnadeep Saha, Chenxi Whitehouse, Sanjeel Parekh, Calvin Murdock, Yuliang ...

2603.17832 2026-03-18
AI LLM

RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy

LLM agents often fail in closed-world embodied environments because actions must satisfy strict preconditions -- such as location, inventory, and container states -- and failure feedback is sparse....

Zhenhang Yuan, Shenghai Yuan, Lihua Xie

2603.17831 2026-03-18
AI LLM

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

A prerequisite for coding agents to perform tasks on large repositories is code localization: the identification of relevant files, classes, and functions to work on. While repository-level code l...

Lintang Sutawika, Aditya Bharat Soni, Bharath Sriraam R R, Apurva Gandhi, Taha Yassine, Sanidhya ...

2603.17829 2026-03-18
AI LLM

FailureMem: A Failure-Aware Multimodal Framework for Autonomous Software Repair

Multimodal Automated Program Repair (MAPR) extends traditional program repair by requiring models to jointly reason over source code, textual issue descriptions, and visual artifacts such as GUI sc...

Ruize Ma, Yilei Jiang, Shilin Zhang, Zheng Ma, Yi Feng, Vincent Ng, Zhi Wang, Xiangyu Yue, Chuany...

2603.17826 2026-03-18
AI LLM

Discovering Decoupled Functional Modules in Large Language Models

Understanding the internal functional organization of Large Language Models (LLMs) is crucial for improving their trustworthiness and performance. However, how LLMs organize different functions int...

Yanke Yu, Jin Li, Ying Sun, Ping Li, Zhefeng Wang, Yi Zheng

2603.17823 2026-03-18
AI LLM

CodeT5-RNN: Reinforcing Contextual Embeddings for Enhanced Code Comprehension

Contextual embeddings generated by LLMs exhibit strong positional inductive biases, which can limit their ability to fully capture long-range, order-sensitive dependencies in highly structured sour...

Md Mostafizer Rahman, Ariful Islam Shiplu, Yutaka Watanobe, Md Faizul Ibne Amin, Syed Rameez Naqv...

2603.17821 2026-03-18
AI LLM

Process Supervision for Chain-of-Thought Reasoning via Monte Carlo Net Information Gain

Multi-step reasoning improves the capabilities of large language models (LLMs) but increases the risk of errors propagating through intermediate steps. Process reward models (PRMs) mitigate this by...

Corentin Royer, Debarun Bhattacharjya, Gaetano Rossiello, Andrea Giovannini, Mennatallah El-Assady

2603.17815 2026-03-18
TESTING

M2P: Improving Visual Foundation Models with Mask-to-Point Weakly-Supervised Learning for Dense Point Tracking

Tracking Any Point (TAP) has emerged as a fundamental tool for video understanding. Current approaches adapt Vision Foundation Models (VFMs) like DINOv2 via offline finetuning or test-time optimiza...

Qiangqiang Wu, Tianyu Yang, Bo Fang, Jia Wan, Matias Di Martino, Guillermo Sapiro, Antoni B. Chan

2603.17813 2026-03-18
AI LLM

Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs

The key-value (KV) cache has become the dominant contributor to memory consumption in large language model (LLM) inference. Although offloading KVCache from GPU high-bandwidth memory (HBM) to CPU D...

Tuowei Wang, Liyun Chu, Ruwen Fan, Ju Ren

2603.17803 2026-03-18
TESTING

Simulating the influence of stoichiometry on the spectral emissivity of Mo$_x$Si$_y$ thin films

In this work, we simulate the spectral emissivity of various stoichiometric crystal phases of Mo$_x$Si$_y$ compounds using density functional perturbation theory. The dielectric function, including...

Zahra Golsanamlou, Arseniy Baskakov, Robbert van de Kruijs, Silvester Houweling, Giorgio Colombi,...

2603.17801 2026-03-18
TESTING

Multivariate Residual Estimation Risk

The purpose of this paper is to describe and extend the use of the newly-introduced measure, residual estimation risk. Following the seminal work of Bignozzi and Tsanakas, the quantification of res...

D. J. Manuge

2603.17792 2026-03-18
TESTING

Algorithms for the Generation of Snarks

The essential requirement for a cubic graph to be called a snark is that it cannot be edge-coloured with three colours. To avoid trivial cases, varying restrictions on the connectivity are impos...

Gunnar Brinkmann, Steven Van Overberghe

2603.17789 2026-03-18
AI LLM

Governed Memory: A Production Architecture for Multi-Agent Workflows

Enterprise AI deploys dozens of autonomous agent nodes across workflows, each acting on the same entities with no shared memory and no common governance. We identify five structural challenges aris...

Hamed Taheri

2603.17787 2026-03-18