Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
TESTING

Partial Causal Structure Learning for Valid Selective Conformal Inference under Interventions

Selective conformal prediction can yield substantially tighter uncertainty sets when we can identify calibration examples that are exchangeable with the test example. In interventional settings, su...

Amir Asiaee, Kavey Aryan, James P. Long

2603.02204 2026-03-02
TESTING

Tool Verification for Test-Time Reinforcement Learning

Test-time reinforcement learning (TTRL) has emerged as a promising paradigm for self-evolving large reasoning models (LRMs), enabling online adaptation on unlabeled test inputs via self-induced rew...

Ruotong Liao, Nikolai Röhrich, Xiaohan Wang, Yuhui Zhang, Yasaman Samadzadeh, Volker Tresp, Seren...

2603.02203 2026-03-02
AI LLM

Frontier Models Can Take Actions at Low Probabilities

Pre-deployment evaluations inspect only a limited sample of model actions. A malicious model seeking to evade oversight could exploit this by randomizing when to "defect": misbehaving so rarely tha...

Alex Serrano, Wen Xing, David Lindner, Erik Jenner

2603.02202 2026-03-02
TESTING

Comparative Analysis of Spatiotemporal Volatility Models: An Empirical Study on Financial Network Series

Various spatiotemporal and network GARCH models have recently been proposed to capture volatility interactions, such as the transmission of market risk across financial networks. These approaches r...

Ariane N. Meli Chrisko, Jessie Li, Philipp Otto, Wolfgang Schmid

2603.02195 2026-03-02
AI LLM

From Leaderboard to Deployment: Code Quality Challenges in AV Perception Repositories

Autonomous vehicle (AV) perception models are typically evaluated solely on benchmark performance metrics, with limited attention to code quality, production readiness and long-term maintainability...

Mateus Karvat, Bram Adams, Sidney Givigi

2603.02194 2026-03-02
AI LLM

Personal Health Data Integration and Intelligence through Semantic Web and Blockchain Technologies

Data integration among various stakeholders in the healthcare space remains a challenge, despite the impressive advances in Health AI in the past decade. There is a lot of "messy" non-standard bu...

Oshani Seneviratne, Manan Shukla, Jianjing Lin

2603.02192 2026-03-02
TESTING

Improving the Estimation of Ship Length via ISAR

A method for estimating the aspect angle of ships at sea from an ISAR is developed. The ISAR AutoTrack (IAT) algorithm uses the information from the adaptive motion compensation velocity to improve...

John R. Bennett

2603.02183 2026-03-02
TESTING

Reservoir Subspace Injection for Online ICA under Top-n Whitening

Reservoir expansion can improve online independent component analysis (ICA) under nonlinear mixing, yet top-$n$ whitening may discard injected features. We formalize this bottleneck as reserv...

Wenjun Xiao, Yuda Bi, Vince D Calhoun

2603.02178 2026-03-02
TESTING

Interpreting map-based $E$/$B$ spectral properties of CMB foregrounds

Map-space $E$/$B$ decompositions of linear polarization are attractive for foreground and CMB analyses because they isolate the $B$-family patterns that contaminate primordial tensor searches from ...

Gilles Weymann-Despres, Léo Vacher, Michael E. Jones, Angela C. Taylor, Carlo Baccigalupi, A. J. ...

2603.02177 2026-03-02
AI LLM

Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale

The rapid proliferation of Claude agent skills has raised the central question of how to effectively leverage, manage, and scale the agent skill ecosystem. In this paper, we propose AgentSkillOS, t...

Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, Shuyue Hu

2603.02176 2026-03-02
TESTING

Kinetic energy fluctuations and specific heat in generalized ensembles

We derive an exact generalization of the well-known Lebowitz–Percus–Verlet (LPV) formula that relates the kinetic energy fluctuations of an isolated system to its specific heat. Our general formu...

Sergio Davis, Catalina Ruíz, Claudia Loyola, Carlos Femenías, Joaquín Peralta

2603.02168 2026-03-02
TESTING

Investigating the short-term effects of particulate matter (PM) chemical components on mortality and the potential modifying effect of extreme temperature: A time-series analysis in London

Particulate matter (PM) is linked to adverse health outcomes, yet the roles of specific PM components and their modification by extreme temperature remain unclear. We examined short-term associatio...

Xiaolu Zhang, Anna Font, Anja Tremper, Max Priestman, Shawn Y. Lee, David C. Green, Dimitris Evan...

2603.02165 2026-03-02
TESTING

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Controlling the false discovery rate (FDR) in variable selection becomes challenging when predictors are correlated, as existing methods often exclude all members of correlated groups and consequen...

Sarah Organ, Toby Kenney, Hong Gu

2603.02160 2026-03-02
AI LLM

How Small Can 6G Reason? Scaling Tiny Language Models for AI-Native Networks

Emerging 6G visions, reflected in ongoing standardization efforts within 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, increasingly characterize networks as AI-native systems in which high-level...

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

2603.02156 2026-03-02
AI LLM

LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) by optimizing them against factual outcomes. However, thi...

Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing

2603.02146 2026-03-02
AI LLM

Generative AI in Software Testing: Current Trends and Future Directions

This paper investigates current software testing systems and explores how artificial intelligence, specifically Generative AI, can be integrated to enhance these systems. It begins by examining dif...

Tanish Singla, Qusay H. Mahmoud

2603.02141 2026-03-02
AI LLM

NextAds: Towards Next-generation Personalized Video Advertising

With the rapid growth of online video consumption, video advertising has become increasingly dominant in the digital advertising landscape. Yet diverse users and viewing contexts make one-size-fit...

Yiyan Xu, Ruoxuan Xia, Wuqiang Zheng, Fengbin Zhu, Wenjie Wang, Fuli Feng

2603.02137 2026-03-02
TESTING

Sensitivity to sub-GeV dark matter in forthcoming spallation-source neutrino experiments

Sub-GeV thermal dark matter weakly interacting with the Standard Model through vector-portal mediators provides a well-motivated and predictive framework that remains challenging to probe with conv...

D. Aristizabal Sierra, V. De Romeri, D. K. Papoulias, G. Sanchez Garcia

2603.02132 2026-03-02
AI LLM

LLMs as Strategic Actors: Behavioral Alignment, Risk Calibration, and Argumentation Framing in Geopolitical Simulations

Large language models (LLMs) are increasingly proposed as agents in strategic decision environments, yet their behavior in structured geopolitical simulations remains under-researched. We evaluate ...

Veronika Solopova, Viktoria Skorik, Maksym Tereshchenko, Alina Haidun, Ostap Vykhopen

2603.02128 2026-03-02
TESTING

Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning

We introduce Pencil Puzzle Bench, a framework for evaluating large language model reasoning through pencil puzzles, a family of constraint-satisfaction problems closely related to NP-complete probl...

Justin Waugh

2603.02119 2026-03-02