Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
TESTING

Stochastic Optimization and Coupling

We study optimization problems in which a linear functional is maximized over probability measures that are dominated by a given measure according to an integral stochastic order in an arbitrary di...

Frank Yang, Kai Hao Yang

2603.11448 2026-03-12
TESTING

Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution

We present Verified Multi-Agent Orchestration (VMAO), a framework that coordinates specialized LLM-based agents through a verification-driven iterative loop. Given a complex query, our system decom...

Xing Zhang, Yanwei Cui, Guanghui Wang, Qucy Wei Qiu, Ziyuan Li, Fangwei Han, Yajing Huang, Hengzh...

2603.11445 2026-03-12
TESTING

NCCLbpf: Verified, Composable Policy Execution for GPU Collective Communication

NCCL is the de facto standard for collective GPU communication in large-scale distributed training, relying heavily on plugins to customize runtime behavior. However, these plugins execute as unver...

Yusheng Zheng

2603.11438 2026-03-12
TESTING

ZTab: Domain-based Zero-shot Annotation for Table Columns

This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user...

Ehsan Hoseinzade, Ke Wang

2603.11436 2026-03-12
TESTING

Grounding Robot Generalization in Training Data via Retrieval-Augmented VLMs

Recent work on robot manipulation has advanced policy generalization to novel scenarios. However, it is often difficult to characterize how different evaluation settings actually represent generali...

Jensen Gao, Dorsa Sadigh, Sandy Huang, Dhruv Shah

2603.11426 2026-03-12
TESTING

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely un...

Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska

2603.11417 2026-03-12
TESTING

Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluatio...

David Fraile Navarro, Farah Magrabi, Enrico Coiera

2603.11413 2026-03-12
TESTING

Reproducible Synthetic Clinical Letters for Seizure Frequency Information Extraction

Seizure-frequency information is important for epilepsy research and clinical care, but it is usually recorded in variable free-text clinic letters that are hard to annotate and share. We developed...

Yujian Gan, Stephen H. Barlow, Ben Holgate, Joe Davies, James T. Teo, Joel S. Winston, Mark P. Ri...

2603.11407 2026-03-12
TESTING

Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics

Teleoperation of low-cost robotic manipulators remains challenging due to the complexity of mapping human hand articulations to robot joint commands. We present an offline hand-shadowing and retarg...

Hendrik Chiche, Antoine Jamme, Trevor Rigoberto Martinez

2603.11383 2026-03-11
TESTING

Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

Autonomous agents, especially delegated systems with memory, persistent context, and multi-step planning, pose a measurement problem not present in stateless models: an agent that preserves continu...

Christopher Altman

2603.11382 2026-03-11
TESTING

Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification

Mechanical ventilation (MV) is a life-saving intervention for patients with acute respiratory failure (ARF) in the ICU. However, inappropriate ventilator settings could cause ventilator-induced lun...

Hang Yu, Huidong Liu, Qingchen Zhang, William Joy, Kateryna Nikulina, Andreas A. Schuppert, Sina ...

2603.11372 2026-03-11
TESTING

Relaxed Efficient Acquisition of Context and Temporal Features

In many biomedical applications, measurements are not freely available at inference time: each laboratory test, imaging modality, or assessment incurs financial cost, time burden, or patient risk. ...

Yunni Qu, Dzung Dinh, Grant King, Whitney Ringwald, Bing Cai Kok, Kathleen Gates, Aiden Wright, J...

2603.11370 2026-03-11
TESTING

abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance

Antimicrobial resistance (AMR) poses a global health threat, reducing the effectiveness of antibiotics and complicating clinical decision-making. To address this challenge, we introduce abx_amr_sim...

Joyce Lee, Seth Blumberg

2603.11369 2026-03-11
TESTING

Fair-Gate: Fairness-Aware Interpretable Risk Gating for Sex-Fair Voice Biometrics

Voice biometric systems can exhibit sex-related performance gaps even when overall verification accuracy is strong. We attribute these gaps to two practical mechanisms: (i) demographic shortcut lea...

Yangyang Qu, Massimiliano Todisco, Chiara Galdi, Nicholas Evans

2603.11360 2026-03-11
TESTING

Teleodynamic Learning a new Paradigm For Interpretable AI

We introduce Teleodynamic Learning, a new paradigm for machine learning in which learning is not the minimization of a fixed objective, but the emergence and stabilization of functional organizatio...

Enrique ter Horst, Juan Diego Zambrano

2603.11355 2026-03-11
TESTING

Human Navigation Behaviour and Brain Dynamics in Real-world Contexts

The study of navigation behaviour and the associated brain dynamics has been a focus of increasing research over the last decades. Coinciding with this has been an increased focus on a more ecologica...

Pablo Fernandez Velasco, Antoine Coutrot, Hugo J. Spiers

2603.11347 2026-03-11
TESTING

Hybrid eTFCE-GRF: Exact Cluster-Size Retrieval with Analytical p-Values for Voxel-Based Morphometry

Threshold-free cluster enhancement (TFCE) integrates cluster extent across thresholds to improve voxel-wise neuroimaging inference, but permutation testing makes it prohibitively slow for large dat...

Don Yin, Hao Chen, Takeshi Miki, Boxing Liu, Enyu Yang

2603.11344 2026-03-11
TESTING

FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles

Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles remains poorly explored....

Arun Vignesh Malarkkan, Manan Roy Choudhury, Guangwei Zhang, Vivek Gupta, Qingyun Wang, Yanjie Fu...

2603.11339 2026-03-11
TESTING

RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents

LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single scalar test metric. This creates a structural vulnerability: an agent can increase the reported s...

Yonas Atinafu, Robin Cohen

2603.11337 2026-03-11
TESTING

Proto-NUX: A prototype telescope for ground-based near-ultraviolet observations

The Near-UV-eXplorer (NUX) is a proposed ground-based, wide-field telescope array with a field of view of ~70 square degrees, designed to operate over the 300–350 nm wavelength range and to ac...

Rasjied Sloot, Rudy Wijnands, Steven Bloemen, Rik ter Horst, Hans Ellermeijer, Alexander Hoogerbrug

2603.11336 2026-03-11