Personal Assistant Web

AI LLM

Patch-Based Spatial Authorship Attribution in Human-Robot Collaborative Paintings

As agentic AI becomes increasingly involved in creative production, documenting authorship has become critical for artists, collectors, and legal contexts. We present a patch-based framework for sp...

Eric Chen, Patricia Alves-Oliveira

2602.17030 • 2026-02-19

ReIn: Conversational Error Recovery with Reasoning Inception

Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue datasets but remain vulnerable to unanticipated, user-...

Takyoung Kim, Jinseok Nam, Chandrayee Basu, Xing Fan, Chengyuan Ma, Heng Ji, Gokhan Tur, Dilek Ha...

2602.17022 • 2026-02-19

M2F: Automated Formalization of Mathematical Literature at Scale

Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, ...

Zichen Wang, Wanli Ma, Zhenyu Ming, Gong Zhang, Kun Yuan, Zaiwen Wen

2602.17016 • 2026-02-19

CAFE: Channel-Autoregressive Factorized Encoding for Robust Biosignal Spatial Super-Resolution

High-density biosignal recordings are critical for neural decoding and clinical monitoring, yet real-world deployments often rely on low-density (LD) montages due to hardware and operational constr...

Hongjun Liu, Leyu Zhou, Zijianghao Yang, Rujun Han, Shitong Duan, Kuanjian Tang, Chao Yao

2602.17011 • 2026-02-19

Central limit theorem for linear eigenvalue statistics of random geometric graphs

Random spatial networks, that is, graphs whose connectivity is governed by geometric proximity, have emerged as fundamental models for systems constrained by an underlying spatial structure. A protot...

Christian Hirsch, Kyeongsik Nam, Moritz Otto

2602.17006 • 2026-02-19

Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding

Multi-path speculative decoding accelerates lossless sampling from a target model by using a cheaper draft model to generate a draft tree of tokens, and then applies a verification algorithm that a...

Rahul Thomas, Teo Kitanovski, Micah Goldblum, Arka Pal

2602.16994 • 2026-02-19

Radiological mapping and uncertainty quantification by a fast Microcanonical Langevin Monte Carlo sampler

Radiological mapping plays a critical role in nuclear emergency response and environmental management activities. A radiation image, representing the spatial and intensity distribution of the radio...

Lei Pan, Jaewon Lee, Brian J. Quiter, Jakob Robnik, Uroš Seljak, Jayson R. Vavrek

2602.16991 • 2026-02-19

Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning

Black-box safety evaluation of AI systems assumes model behavior on test distributions reliably predicts deployment performance. We formalize and challenge this assumption through latent context-co...

Vishal Srivastava

2602.16984 • 2026-02-19

Mason: Type- and Name-Guided Program Synthesis

Object-oriented programs tend to be written using many common coding idioms, such as those captured by design patterns. While design patterns are useful, implementing them is often tedious and repe...

Jasper Geer, Fox Huston, Jeffrey S. Foster

2602.16981 • 2026-02-19

Lies, Labels, and Mechanisms

We test whether lying aversion can steer equilibrium selection in mechanism design. In a principal-worker environment, the direct mechanism admits two dominant-strategy equilibria: the designer's t...

Alex L. Brown, Ethan Park, Rodrigo A. Velez

2602.16973 • 2026-02-19

DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in image and video generation, but their success comes at the cost of heavy computation. This inefficiency is largely due to...

Dahye Kim, Deepti Ghadiyaram, Raghudeep Gadde

2602.16968 • 2026-02-19

Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling

The goal of $L$-step speculative decoding is to accelerate autoregressive decoding of a target model by using a cheaper draft model to generate a candidate path of $L$ tokens. Based on a verificati...

Rahul Thomas, Arka Pal

2602.16961 • 2026-02-18

Neural Proposals, Symbolic Guarantees: Neuro-Symbolic Graph Generation with Hard Constraints

We challenge black-box purely deep neural approaches for molecules and graph generation, which are limited in controllability and lack formal guarantees. We introduce Neuro-Symbolic Graph Generativ...

Chuqin Geng, Li Zhang, Mark Zhang, Haolin Ye, Ziyu Zhao, Xujie Si

2602.16954 • 2026-02-18

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical...

Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

2602.16953 • 2026-02-18

Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming

This work introduces a verification framework that provides both sound and complete guarantees for data poisoning attacks during neural network training. We formulate adversarial data manipulation,...

Philip Sosnin, Jodie Knapp, Fraser Kennedy, Josh Collyer, Calvin Tsay

2602.16944 • 2026-02-18

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

Large language models deployed as agents increasingly interact with external systems through tool calls: actions with real-world consequences that text outputs alone do not carry. Safety evaluation...

Arnold Cartagena, Ariane Teixeira

2602.16943 • 2026-02-18

Nudging Attention to Workplace Meeting Goals: A Large-Scale, Preregistered Field Experiment

Ineffective meetings are pervasive. Thinking ahead explicitly about meeting goals may improve effectiveness, but current collaboration platforms lack integrated support. We tested a lightweight goa...

Lev Tankelevitch, Ava Elizabeth Scott, Nagaravind Challakere, Payod Panda, Sean Rintel

2602.16939 • 2026-02-18

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

The promise of LLM-based user simulators to improve conversational AI is hindered by a critical "realism gap," leading to systems that are optimized for simulated interactions, but may fail to perf...

Ofer Meshi, Krisztian Balog, Sally Goldman, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Gl...

2602.16938 • 2026-02-18

Free Quantum Computing

Quantum computing improves substantially on known classical algorithms for various important problems, but the nature of the relationship between quantum and classical computing is not yet fully un...

Jacques Carette, Chris Heunen, Robin Kaarsgaard, Neil J. Ross, Amr Sabry

2602.16927 • 2026-02-18

Stellar Paternity Tests: Matching High-Latitude B Stars to the Open Clusters of their Birth

OB stars generally form in open clusters within the Milky Way's thin disk, so when they are found at high Galactic latitudes, it is thought that they were ejected from their birth clusters during t...

Brandon Schweers, M. Virginia McSwain

2602.16925 • 2026-02-18
