Personal Assistant Web

TESTING

Implicit Intelligence -- Evaluating Agents on What Users Don't Say

Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current age...

Ved Sirdeshmukh, Marc Wetter

2602.20424 • 2026-02-23

CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks

Machine Learning as a Service (MLaaS) has emerged as a widely adopted paradigm for providing access to deep neural network (DNN) models, enabling users to conveniently leverage these models through...

Bolin Shen, Zhan Cheng, Neil Zhenqiang Gong, Fan Yao, Yushun Dong

2602.20419 • 2026-02-23

CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense

Graph neural networks (GNNs) have demonstrated superior performance in various applications, such as recommendation systems and financial risk management. However, deploying large-scale GNN models ...

Bolin Shen, Md Shamim Seraj, Zhan Cheng, Shayok Chakraborty, Yushun Dong

2602.20418 • 2026-02-23

SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images

The rapid advancement of generative models has made the detection of AI-generated images a critical challenge for both research and society. Recent works have shown that most state-of-the-art fake ...

Aayush Dhakal, Subash Khanal, Srikumar Sastry, Jacob Arndt, Philipe Ambrozio Dias, Dalton Lunga, ...

2602.20412 • 2026-02-23

Highly Efficient Selection of High-Redshift Emission-Line Galaxies for future DESI-like surveys with Deep Multi-band Imaging

Emission-line galaxies (ELGs) are an important tracer of baryon acoustic oscillations (BAO) and large-scale structure (LSS) at $z > 1$. In this work, we investigate the feasibility of using deep wi...

Yoquelbin Salcedo Hernandez, Jeffrey A. Newman, Brett H. Andrews, Biprateep Dey, Rongpu Zhou, N...

2602.20405 • 2026-02-23

Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation

To steer language models towards truthful outputs on tasks which are beyond human capability, previous work has suggested training models on easy tasks to steer them on harder ones (easy-to-hard ge...

Callum Canavan, Aditya Shrivastava, Allison Qi, Jonathan Michala, Fabien Roger

2602.20400 • 2026-02-23

Detecting and Mitigating Group Bias in Heterogeneous Treatment Effects

Heterogeneous treatment effects (HTEs) are increasingly estimated using machine learning models that produce highly personalized predictions of treatment effects. In practice, however, predicted tr...

Joel Persson, Jurriën Bakker, Dennis Bohle, Stefan Feuerriegel, Florian von Wangenheim

2602.20383 • 2026-02-23

Case-Aware LLM-as-a-Judge Evaluation for Enterprise-Scale RAG Systems

Enterprise Retrieval-Augmented Generation (RAG) assistants operate in multi-turn, case-based workflows such as technical support and IT operations, where evaluation must reflect operational constra...

Mukul Chhabra, Luigi Medrano, Arush Verma

2602.20379 • 2026-02-23

LANTERN: Characterization technology for low threshold cryogenic detectors

The use of low-temperature detectors, such as cryogenic calorimeters, has pioneered the recent advancements in low-energy rare event searches. These detectors provide a low-noise environment essent...

Giorgio Del Castello

2602.20369 • 2026-02-23

StochasticBarrier.jl: A Toolbox for Stochastic Barrier Function Synthesis

We present StochasticBarrier.jl, an open-source Julia-based toolbox for generating Stochastic Barrier Functions (SBFs) for safety verification of discrete-time stochastic systems with additive Gaus...

Rayan Mazouz, Frederik Baymler Mathiesen, Luca Laurenti, Morteza Lahijanian

2602.20359 • 2026-02-23

UAMTERS: Uncertainty-Aware Mutation Analysis for DL-enabled Robotic Software

Self-adaptive robots adjust their behaviors in response to unpredictable environmental changes. These robots often incorporate deep learning (DL) components into their software to support functiona...

Chengjie Lu, Jiahui Wu, Shaukat Ali, Malaika Din Hashmi, Sebastian Mathias Thomle Mason, Francois...

2602.20334 • 2026-02-23

DMCD: Semantic-Statistical Framework for Causal Discovery

We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LLM-based semantic drafting from variable metadata with statistical validation on observational da...

Samarth KaPatel, Sofia Nikiforova, Giacinto Paolo Saggese, Paul Smith

2602.20333 • 2026-02-23

Learning Physical Principles from Interaction: Self-Evolving Planning via Test-Time Memory

Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in ge...

Haoyang Li, Yang You, Hao Su, Leonidas Guibas

2602.20323 • 2026-02-23

Fast Spectrogram Event Extraction via Offline Self-Supervised Learning: From Fusion Diagnostics to Bioacoustics

Next-generation fusion facilities like ITER face a "data deluge," generating petabytes of multi-diagnostic signals daily that challenge manual analysis. We present a "signals-first" self-supervised...

Nathaniel Chen, Kouroche Bouchiat, Peter Steiner, Andrew Rothstein, David Smith, Max Austin, Mike...

2602.20317 • 2026-02-23

BASS LVI. Connecting X-ray variability with AGN physical properties and a new path to Cosmological distances

X-ray variability is a well-established characteristic of active galactic nuclei (AGN), known to correlate inversely with both the supermassive black hole mass and luminosity, although the degree o...

Matilde Signorini, Federica Ricci, Alessia Tortosa, Stefano Bianchi, Fabio La Franca, Franz E. Ba...

2602.20315 • 2026-02-23

Rapid Testing, Duck Lips, and Tilted Cameras: Youth Everyday Algorithm Auditing Practices with Generative AI Filters

Today's youth have extensive experience interacting with artificial intelligence and machine learning applications on popular social media platforms, putting youth in a unique position to examine, ...

Lauren Vogelstein, Vedya Konda, Deborah Fields, Yasmin Kafai, Luis Morales-Navarro, Danaé Metaxa

2602.20314 • 2026-02-23

On the Pólya Frequency Order of the de Bruijn Newman Kernel. Certified Failure at Order Five and the Toeplitz Threshold Phenomenon

We prove that the classical de Bruijn--Newman kernel $K(u) = Φ(|u|)$, arising in the study of the Riemann zeta function via the de Bruijn--Newman constant, is not a Pólya frequency function of orde...

Wojciech Michałowski

2602.20313 • 2026-02-23

In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks

Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities across diverse datasets and tasks. However, existing foundation models are typically pre-trained to enhanc...

Shangqing Xu, Harshavardhan Kamarthi, Haoxin Liu, B. Aditya Prakash

2602.20307 • 2026-02-23

Quantifying the Expectation-Realisation Gap for Agentic AI Systems

Agentic AI systems are deployed with expectations of substantial productivity gains, yet rigorous empirical evidence reveals systematic discrepancies between pre-deployment expectations and post-de...

Sebastian Lobentanzer

2602.20292 • 2026-02-23

PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software

Continuous Integration (CI) pipelines for embedded software sometimes fail during compilation, consuming significant developer time for debugging. We study four major open-source embedded system pr...

Han Fu, Andreas Ermedahl, Sigrid Eldh, Kristian Wiklund, Philipp Haller, Cyrille Artho

2602.20284 • 2026-02-23
