Research

Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
TESTING

Field-angle dependence of magnetoresistance in UTe2

We theoretically study angle-resolved magnetoresistance under rotated magnetic field in the normal state of a spin-triplet superconductor UTe$_2$. The Wannier model derived from a GGA+$U$ calculati...

Jun Ishizuka, Youichi Yanase

2603.17235 2026-03-18
TESTING

Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning

Auto-formalization (AF) translates natural-language reasoning problems into solver-executable programs, enabling symbolic solvers to perform sound logical deduction. In practice, however, AF pipeli...

Zhiyu Ni, Zheng Liang, Liangcheng Song, Chenrui Cao, Xian Zhang, Alberto Sangiovanni-Vincentelli,...

2603.17233 2026-03-18
TESTING

From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs

Multimodal Large Language Models (MLLMs) are increasingly applied to pixel-level vision tasks, yet their intrinsic capacity for spatial understanding remains poorly understood. We investigate segme...

Boyong Wu, Sanghwan Kim, Zeynep Akata

2603.17228 2026-03-18
TESTING

Alignment Makes Language Models Normative, Not Descriptive

Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pair...

Eilam Shapira, Moshe Tennenholtz, Roi Reichart

2603.17218 2026-03-17
TESTING

Talk is Cheap, Logic is Hard: Benchmarking LLMs on Post-Condition Formalization

Formal specifications, such as pre- and post-conditions, provide a solid basis for performing thorough program verification. However, developers rarely provide such formal specifications, hence if A...

I. S. W. B. Prasetya, Fitsum Kifetew, Davide Prandi

2603.17193 2026-03-17
TESTING

Influence of Gripper Design on Human Demonstration Quality for Robot Learning

Opening sterile medical packaging is routine for healthcare workers but remains challenging for robots. Learning from demonstration enables robots to acquire manipulation skills directly from human...

Gina L. Georgadarellis, Natalija Beslic, Seonhun Lee, Frank C. Sup, Meghan E. Huber

2603.17189 2026-03-17
TESTING

Reconstructing the Type Ia Supernova Absolute Magnitude with Two-Probe Physics-Informed Neural Networks

We apply two variants of Physics-Informed Neural Networks (PINNs) to reconstruct the Type Ia supernova absolute magnitude $M_B(z)$ from joint BAO and supernova data under four cosmological models (...

Denitsa Staicova

2603.17184 2026-03-17
TESTING

Generalist Multimodal LLMs Gain Biometric Expertise via Human Salience

Iris presentation attack detection (PAD) is critical for secure biometric deployments, yet developing specialized models faces significant practical barriers: collecting data representing future un...

Jacob Piland, Byron Dowling, Christopher Sweet, Adam Czajka

2603.17173 2026-03-17
TESTING

Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges

Large language models (LLMs) are increasingly used as automated judges and synthetic labelers, especially in low-label settings. Yet these systems are stochastic and often overconfident, which make...

Maxim Khomiakov, Jes Frellsen

2603.17172 2026-03-17
TESTING

PAuth - Precise Task-Scoped Authorization For Agents

The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are mis...

Reshabh K Sharma, Linxi Jiang, Zhiqiang Lin, Shuo Chen

2603.17170 2026-03-17
TESTING

Quadratic Surrogate Attractor for Particle Swarm Optimization

This paper presents a particle swarm optimization algorithm that leverages surrogate modeling to replace the conventional global best solution with the minimum of an n-dimensional quadratic form, p...

Maurizio Clemente, Marcello Canova

2603.17163 2026-03-17
TESTING

Energy Flow Graph: Modeling Software Energy Consumption

The growing energy demands of computational systems necessitate a fundamental shift from performance-centric design to one that treats energy consumption as one of the primary design considerations...

Saurabhsingh Rajput, Tushar Sharma

2603.17162 2026-03-17
TESTING

Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: does the generated code actually do what the user intended? The gap between informal nat...

Shuvendu K. Lahiri

2603.17150 2026-03-17
TESTING

Multilingual Reference Need Assessment System for Wikipedia

Wikipedia is a critical source of information for millions of users across the Web. It serves as a key resource for large language models, search engines, question-answering systems, and other Web-...

Aitolkyn Baigutanova, Francisco Navas, Pablo Aragon, Mykola Trokhymovych, Muniza Aslam, Ai-Jou Ch...

2603.17146 2026-03-17
TESTING

SLSim: a strong lensing population simulation package

Gravitational lensing offers unique insights into cosmology by bending light around massive objects. Strong gravitational lensing, in particular, produces magnified and often multiple images of dis...

Narayan Khadka, Simon Birrer, Henry Best, Paras Sharma, Katsuya T. Abe, Xianzhe Tang, Carly Misti...

2603.17138 2026-03-17
TESTING

A Longitudinal Study of Usability in Identity-Based Software Signing

Identity-based software signing tools aim to make software artifact provenance verifiable while reducing the operational burden of long-lived key management. However, there is limited cross-tool lo...

Kelechi G. Kalu, Hieu Tran, Santiago Torres-Arias, Sooyeon Jeong, James C. Davis

2603.17133 2026-03-17
TESTING

Upward Book Embeddings of Partitioned Digraphs

In 1999, Heath, Pemmaraju, and Trenk [SIAM J. Comput. 28(4), 1999] extended the classic notion of book embeddings to digraphs, introducing the concept of upward book embeddings, in which the vertic...

Giordano Da Lozzo, Fabrizio Frati, Ignaz Rutter

2603.17128 2026-03-17
TESTING

Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles

Ensembling Vision-Language Models (VLMs) from different providers maximizes benchmark accuracy, yet models from the same architectural family share correlated errors that standard voting ignores. W...

Zacharie Bugaud

2603.17111 2026-03-17
TESTING

On Big-M Reformulations of Bilevel Linear Programs: Hardness of A Posteriori Verification

A standard approach to solving optimistic bilevel linear programs (BLPs) is to replace the lower-level problem with its Karush-Kuhn-Tucker (KKT) optimality conditions and reformulate the resulting ...

Sergey S. Ketkov, Oleg A. Prokopyev

2603.17107 2026-03-17
TESTING

How Proxy Race Distorts Regression-Based Fairness Audits

Proxy-based race inference is increasingly used to conduct fairness assessments when protected-class data are unavailable or legally restricted -- most prominently in U.S. fair-lending enforcement,...

Xi Xin, Giles Hooker, Fei Huang

2603.17106 2026-03-17