Papers
Research papers from arXiv and related sources
Implicit Intelligence -- Evaluating Agents on What Users Don't Say
Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current age...
Ved Sirdeshmukh, Marc Wetter
CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks
Machine Learning as a Service (MLaaS) has emerged as a widely adopted paradigm for providing access to deep neural network (DNN) models, enabling users to conveniently leverage these models through...
Bolin Shen, Zhan Cheng, Neil Zhenqiang Gong, Fan Yao, Yushun Dong
CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense
Graph neural networks (GNNs) have demonstrated superior performance in various applications, such as recommendation systems and financial risk management. However, deploying large-scale GNN models ...
Bolin Shen, Md Shamim Seraj, Zhan Cheng, Shayok Chakraborty, Yushun Dong
SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images
The rapid advancement of generative models has made the detection of AI-generated images a critical challenge for both research and society. Recent works have shown that most state-of-the-art fake ...
Aayush Dhakal, Subash Khanal, Srikumar Sastry, Jacob Arndt, Philipe Ambrozio Dias, Dalton Lunga, ...
Highly Efficient Selection of High-Redshift Emission-Line Galaxies for future DESI-like surveys with Deep Multi-band Imaging
Emission-line galaxies (ELGs) are an important tracer of baryon acoustic oscillations (BAO) and large-scale structure (LSS) at $z > 1$. In this work, we investigate the feasibility of using deep wi...
Yoquelbin Salcedo Hernandez, Jeffrey A. Newman, Brett H. Andrews, Biprateep Dey, Rongpu Zhou, N...
Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation
To steer language models towards truthful outputs on tasks which are beyond human capability, previous work has suggested training models on easy tasks to steer them on harder ones (easy-to-hard ge...
Callum Canavan, Aditya Shrivastava, Allison Qi, Jonathan Michala, Fabien Roger
Detecting and Mitigating Group Bias in Heterogeneous Treatment Effects
Heterogeneous treatment effects (HTEs) are increasingly estimated using machine learning models that produce highly personalized predictions of treatment effects. In practice, however, predicted tr...
Joel Persson, Jurriën Bakker, Dennis Bohle, Stefan Feuerriegel, Florian von Wangenheim
Case-Aware LLM-as-a-Judge Evaluation for Enterprise-Scale RAG Systems
Enterprise Retrieval-Augmented Generation (RAG) assistants operate in multi-turn, case-based workflows such as technical support and IT operations, where evaluation must reflect operational constra...
Mukul Chhabra, Luigi Medrano, Arush Verma
LANTERN: Characterization technology for low threshold cryogenic detectors
The use of low-temperature detectors, such as cryogenic calorimeters, has pioneered the recent advancements in low-energy rare event searches. These detectors provide a low-noise environment essent...
Giorgio Del Castello
StochasticBarrier.jl: A Toolbox for Stochastic Barrier Function Synthesis
We present StochasticBarrier.jl, an open-source Julia-based toolbox for generating Stochastic Barrier Functions (SBFs) for safety verification of discrete-time stochastic systems with additive Gaus...
Rayan Mazouz, Frederik Baymler Mathiesen, Luca Laurenti, Morteza Lahijanian
UAMTERS: Uncertainty-Aware Mutation Analysis for DL-enabled Robotic Software
Self-adaptive robots adjust their behaviors in response to unpredictable environmental changes. These robots often incorporate deep learning (DL) components into their software to support functiona...
Chengjie Lu, Jiahui Wu, Shaukat Ali, Malaika Din Hashmi, Sebastian Mathias Thomle Mason, Francois...
DMCD: Semantic-Statistical Framework for Causal Discovery
We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LLM-based semantic drafting from variable metadata with statistical validation on observational da...
Samarth KaPatel, Sofia Nikiforova, Giacinto Paolo Saggese, Paul Smith
Learning Physical Principles from Interaction: Self-Evolving Planning via Test-Time Memory
Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in ge...
Haoyang Li, Yang You, Hao Su, Leonidas Guibas
Fast Spectrogram Event Extraction via Offline Self-Supervised Learning: From Fusion Diagnostics to Bioacoustics
Next-generation fusion facilities like ITER face a "data deluge," generating petabytes of multi-diagnostic signals daily that challenge manual analysis. We present a "signals-first" self-supervised...
Nathaniel Chen, Kouroche Bouchiat, Peter Steiner, Andrew Rothstein, David Smith, Max Austin, Mike...
BASS LVI. Connecting X-ray variability with AGN physical properties and a new path to Cosmological distances
X-ray variability is a well-established characteristic of active galactic nuclei (AGN), known to correlate inversely with both the supermassive black hole mass and luminosity, although the degree o...
Matilde Signorini, Federica Ricci, Alessia Tortosa, Stefano Bianchi, Fabio La Franca, Franz E. Ba...
Rapid Testing, Duck Lips, and Tilted Cameras: Youth Everyday Algorithm Auditing Practices with Generative AI Filters
Today's youth have extensive experience interacting with artificial intelligence and machine learning applications on popular social media platforms, putting youth in a unique position to examine, ...
Lauren Vogelstein, Vedya Konda, Deborah Fields, Yasmin Kafai, Luis Morales-Navarro, Danaé Metaxa
On the Pólya Frequency Order of the de Bruijn Newman Kernel. Certified Failure at Order Five and the Toeplitz Threshold Phenomenon
We prove that the classical de Bruijn--Newman kernel $K(u) = \Phi(|u|)$, arising in the study of the Riemann zeta function via the de Bruijn--Newman constant, is not a Pólya frequency function of orde...
Wojciech Michałowski
In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks
Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities across diverse datasets and tasks. However, existing foundation models are typically pre-trained to enhanc...
Shangqing Xu, Harshavardhan Kamarthi, Haoxin Liu, B. Aditya Prakash
Quantifying the Expectation-Realisation Gap for Agentic AI Systems
Agentic AI systems are deployed with expectations of substantial productivity gains, yet rigorous empirical evidence reveals systematic discrepancies between pre-deployment expectations and post-de...
Sebastian Lobentanzer
PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software
Continuous Integration (CI) pipelines for embedded software sometimes fail during compilation, consuming significant developer time for debugging. We study four major open-source embedded system pr...
Han Fu, Andreas Ermedahl, Sigrid Eldh, Kristian Wiklund, Philipp Haller, Cyrille Artho