Papers
Research papers from arXiv and related sources
Partial Causal Structure Learning for Valid Selective Conformal Inference under Interventions
Selective conformal prediction can yield substantially tighter uncertainty sets when we can identify calibration examples that are exchangeable with the test example. In interventional settings, su...
Amir Asiaee, Kavey Aryan, James P. Long
Tool Verification for Test-Time Reinforcement Learning
Test-time reinforcement learning (TTRL) has emerged as a promising paradigm for self-evolving large reasoning models (LRMs), enabling online adaptation on unlabeled test inputs via self-induced rew...
Ruotong Liao, Nikolai Röhrich, Xiaohan Wang, Yuhui Zhang, Yasaman Samadzadeh, Volker Tresp, Seren...
Frontier Models Can Take Actions at Low Probabilities
Pre-deployment evaluations inspect only a limited sample of model actions. A malicious model seeking to evade oversight could exploit this by randomizing when to "defect": misbehaving so rarely tha...
Alex Serrano, Wen Xing, David Lindner, Erik Jenner
Comparative Analysis of Spatiotemporal Volatility Models: An Empirical Study on Financial Network Series
Various spatiotemporal and network GARCH models have recently been proposed to capture volatility interactions, such as the transmission of market risk across financial networks. These approaches r...
Ariane N. Meli Chrisko, Jessie Li, Philipp Otto, Wolfgang Schmid
From Leaderboard to Deployment: Code Quality Challenges in AV Perception Repositories
Autonomous vehicle (AV) perception models are typically evaluated solely on benchmark performance metrics, with limited attention to code quality, production readiness and long-term maintainability...
Mateus Karvat, Bram Adams, Sidney Givigi
Personal Health Data Integration and Intelligence through Semantic Web and Blockchain Technologies
Data integration among various stakeholders in the healthcare space remains a challenge, despite the impressive advances in Health AI in the past decade. There is a lot of "messy" non-standard bu...
Oshani Seneviratne, Manan Shukla, Jianjing Lin
Improving the Estimation of Ship Length via ISAR
A method for estimating the aspect angle of ships at sea from an ISAR is developed. The ISAR AutoTrack (IAT) algorithm uses the information from the adaptive motion compensation velocity to improve...
John R. Bennett
Reservoir Subspace Injection for Online ICA under Top-n Whitening
Reservoir expansion can improve online independent component analysis (ICA) under nonlinear mixing, yet top-$n$ whitening may discard injected features. We formalize this bottleneck as \emph{reserv...
Wenjun Xiao, Yuda Bi, Vince D Calhoun
Interpreting map-based $E$/$B$ spectral properties of CMB foregrounds
Map-space $E$/$B$ decompositions of linear polarization are attractive for foreground and CMB analyses because they isolate the $B$-family patterns that contaminate primordial tensor searches from ...
Gilles Weymann-Despres, Léo Vacher, Michael E. Jones, Angela C. Taylor, Carlo Baccigalupi, A. J. ...
Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale
The rapid proliferation of Claude agent skills has raised the central question of how to effectively leverage, manage, and scale the agent skill ecosystem. In this paper, we propose AgentSkillOS, t...
Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, Shuyue Hu
Kinetic energy fluctuations and specific heat in generalized ensembles
We derive an exact generalization of the well-known Lebowitz–Percus–Verlet (LPV) formula that relates the kinetic energy fluctuations of an isolated system to its specific heat. Our general formu...
Sergio Davis, Catalina Ruíz, Claudia Loyola, Carlos Femenías, Joaquín Peralta
Investigating the short-term effects of particulate matter (PM) chemical components on mortality and the potential modifying effect of extreme temperature: A time-series analysis in London
Particulate matter (PM) is linked to adverse health outcomes, yet the roles of specific PM components and their modification by extreme temperature remain unclear. We examined short-term associatio...
Xiaolu Zhang, Anna Font, Anja Tremper, Max Priestman, Shawn Y. Lee, David C. Green, Dimitris Evan...
Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control
Controlling the false discovery rate (FDR) in variable selection becomes challenging when predictors are correlated, as existing methods often exclude all members of correlated groups and consequen...
Sarah Organ, Toby Kenney, Hong Gu
How Small Can 6G Reason? Scaling Tiny Language Models for AI-Native Networks
Emerging 6G visions, reflected in ongoing standardization efforts within 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, increasingly characterize networks as AI-native systems in which high-level...
Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) by optimizing them against factual outcomes. However, thi...
Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing
Generative AI in Software Testing: Current Trends and Future Directions
This paper investigates current software testing systems and explores how artificial intelligence, specifically Generative AI, can be integrated to enhance these systems. It begins by examining dif...
Tanish Singla, Qusay H. Mahmoud
NextAds: Towards Next-generation Personalized Video Advertising
With the rapid growth of online video consumption, video advertising has become increasingly dominant in the digital advertising landscape. Yet diverse users and viewing contexts make one-size-fit...
Yiyan Xu, Ruoxuan Xia, Wuqiang Zheng, Fengbin Zhu, Wenjie Wang, Fuli Feng
Sensitivity to sub-GeV dark matter in forthcoming spallation-source neutrino experiments
Sub-GeV thermal dark matter weakly interacting with the Standard Model through vector-portal mediators provides a well-motivated and predictive framework that remains challenging to probe with conv...
D. Aristizabal Sierra, V. De Romeri, D. K. Papoulias, G. Sanchez Garcia
LLMs as Strategic Actors: Behavioral Alignment, Risk Calibration, and Argumentation Framing in Geopolitical Simulations
Large language models (LLMs) are increasingly proposed as agents in strategic decision environments, yet their behavior in structured geopolitical simulations remains under-researched. We evaluate ...
Veronika Solopova, Viktoria Skorik, Maksym Tereshchenko, Alina Haidun, Ostap Vykhopen
Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning
We introduce Pencil Puzzle Bench, a framework for evaluating large language model reasoning through pencil puzzles, a family of constraint-satisfaction problems closely related to NP-complete probl...
Justin Waugh