Papers
Research papers from arXiv and related sources
Stochastic Optimization and Coupling
We study optimization problems in which a linear functional is maximized over probability measures that are dominated by a given measure according to an integral stochastic order in an arbitrary di...
Frank Yang, Kai Hao Yang
Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution
We present Verified Multi-Agent Orchestration (VMAO), a framework that coordinates specialized LLM-based agents through a verification-driven iterative loop. Given a complex query, our system decom...
Xing Zhang, Yanwei Cui, Guanghui Wang, Qucy Wei Qiu, Ziyuan Li, Fangwei Han, Yajing Huang, Hengzh...
NCCLbpf: Verified, Composable Policy Execution for GPU Collective Communication
NCCL is the de facto standard for collective GPU communication in large-scale distributed training, relying heavily on plugins to customize runtime behavior. However, these plugins execute as unver...
Yusheng Zheng
ZTab: Domain-based Zero-shot Annotation for Table Columns
This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user...
Ehsan Hoseinzade, Ke Wang
Grounding Robot Generalization in Training Data via Retrieval-Augmented VLMs
Recent work on robot manipulation has advanced policy generalization to novel scenarios. However, it is often difficult to characterize how different evaluation settings actually represent generali...
Jensen Gao, Dorsa Sadigh, Sandy Huang, Dhruv Shah
Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations
End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely un...
Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska
Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI
Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluatio...
David Fraile Navarro, Farah Magrabi, Enrico Coiera
Reproducible Synthetic Clinical Letters for Seizure Frequency Information Extraction
Seizure-frequency information is important for epilepsy research and clinical care, but it is usually recorded in variable free-text clinic letters that are hard to annotate and share. We developed...
Yujian Gan, Stephen H. Barlow, Ben Holgate, Joe Davies, James T. Teo, Joel S. Winston, Mark P. Ri...
Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics
Teleoperation of low-cost robotic manipulators remains challenging due to the complexity of mapping human hand articulations to robot joint commands. We present an offline hand-shadowing and retarg...
Hendrik Chiche, Antoine Jamme, Trevor Rigoberto Martinez
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol
Autonomous agents, especially delegated systems with memory, persistent context, and multi-step planning, pose a measurement problem not present in stateless models: an agent that preserves continu...
Christopher Altman
Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification
Mechanical ventilation (MV) is a life-saving intervention for patients with acute respiratory failure (ARF) in the ICU. However, inappropriate ventilator settings could cause ventilator-induced lun...
Hang Yu, Huidong Liu, Qingchen Zhang, William Joy, Kateryna Nikulina, Andreas A. Schuppert, Sina ...
Relaxed Efficient Acquisition of Context and Temporal Features
In many biomedical applications, measurements are not freely available at inference time: each laboratory test, imaging modality, or assessment incurs financial cost, time burden, or patient risk. ...
Yunni Qu, Dzung Dinh, Grant King, Whitney Ringwald, Bing Cai Kok, Kathleen Gates, Aiden Wright, J...
abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance
Antimicrobial resistance (AMR) poses a global health threat, reducing the effectiveness of antibiotics and complicating clinical decision-making. To address this challenge, we introduce abx_amr_sim...
Joyce Lee, Seth Blumberg
Fair-Gate: Fairness-Aware Interpretable Risk Gating for Sex-Fair Voice Biometrics
Voice biometric systems can exhibit sex-related performance gaps even when overall verification accuracy is strong. We attribute these gaps to two practical mechanisms: (i) demographic shortcut lea...
Yangyang Qu, Todisco Massimiliano, Galdi Chiara, Evans Nicholas
Teleodynamic Learning a new Paradigm For Interpretable AI
We introduce Teleodynamic Learning, a new paradigm for machine learning in which learning is not the minimization of a fixed objective, but the emergence and stabilization of functional organizatio...
Enrique ter Horst, Juan Diego Zambrano
Human Navigation Behaviour and Brain Dynamics in Real-world Contexts
The study of navigation behaviour and the associated brain dynamics has been a focus of increasing research over the last decades. Coinciding with this has been an increased focus on a more ecologica...
Pablo Fernandez Velasco, Antoine Coutrot, Hugo J. Spiers
Hybrid eTFCE-GRF: Exact Cluster-Size Retrieval with Analytical p-Values for Voxel-Based Morphometry
Threshold-free cluster enhancement (TFCE) integrates cluster extent across thresholds to improve voxel-wise neuroimaging inference, but permutation testing makes it prohibitively slow for large dat...
Don Yin, Hao Chen, Takeshi Miki, Boxing Liu, Enyu Yang
FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles
Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles remains poorly explored....
Arun Vignesh Malarkkan, Manan Roy Choudhury, Guangwei Zhang, Vivek Gupta, Qingyun Wang, Yanjie Fu...
RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single scalar test metric. This creates a structural vulnerability: an agent can increase the reported s...
Yonas Atinafu, Robin Cohen
Proto-NUX: A prototype telescope for ground-based near-ultraviolet observations
The Near-UV-eXplorer (NUX) is a proposed ground-based, wide-field telescope array with a field of view of ~70 square degrees, designed to operate over the 300-350 nm wavelength range and to ac...
Rasjied Sloot, Rudy Wijnands, Steven Bloemen, Rik ter Horst, Hans Ellermeijer, Alexander Hoogerbrug