Personal Assistant Web

TESTING

A statistical perspective on transformers for small longitudinal cohort data

Modeling of longitudinal cohort data typically involves complex temporal dependencies between multiple variables. There, the transformer architecture, which has been highly successful in language a...

Kiana Farhadyar, Maren Hackenberg, Kira Ahrens, Charlotte Schenk, Bianca Kollmann, Oliver Tüscher...

2602.16914 • 2026-02-18

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-...

Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, Ting Wang

2602.16901 • 2026-02-18

Domain Decomposition for Mean Curvature Flow of Surface Polygonal Meshes

We examine the use of domain decomposition for potentially more efficient mean curvature flow of surface meshes, whose faces are arbitrary simple polygons. We first test traditional domain decompos...

Lenka Ptackova, Michal Outrata

2602.16874 • 2026-02-18

Testing the cosmic distance-duality relation with localized fast radio bursts: a cosmological model-independent study

We test the Etherington cosmic distance-duality relation (CDDR) by comparing Type Ia supernova (SNIa) luminosity-distance information from the Pantheon+ compilation with an angular-diameter-distan...

Jéferson A. S. Fortunato, Surajit Kalita, Amanda Weltman

2602.16869 • 2026-02-18

On the Tightness of the Second-Order Cone Relaxation of the Optimal Power Flow with Angles Recovery in Meshed Networks

This letter investigates properties of the second-order cone relaxation of the optimal power flow (OPF) problem, with emphasis on relaxation tightness, nodal voltage angles recovery, and alternatin...

Ginevra Larroux, Matthieu Jacobs, Mario Paolone

2602.16866 • 2026-02-18

SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation

The ability to manipulate tools significantly expands the set of tasks a robot can perform. Yet, tool manipulation represents a challenging class of dexterity, requiring grasping thin objects, in-h...

Kushal Kedia, Tyler Ga Wei Lum, Jeannette Bohg, C. Karen Liu

2602.16863 • 2026-02-18

Asteroidal activity amongst meteor datasets: Confirmed new "rock-comet" stream and search for a tidal disruption signature

Asteroid activity (e.g., thermo-mechanical breakdown, impacts, rotational shedding, tidal disruption, etc.) can inject meteoroids into near-Earth space and leave detectable signatures in orbit cata...

Patrick M. Shober

2602.16845 • 2026-02-18

Overseeing Agents Without Constant Oversight: Challenges and Opportunities

To enable human oversight, agentic AI systems often provide a trace of reasoning and action steps. Designing traces to have an informative, but not overwhelming, level of detail remains a critical ...

Madeleine Grunde-McLaughlin, Hussein Mozannar, Maya Murad, Jingya Chen, Saleema Amershi, Adam Fou...

2602.16844 • 2026-02-18

New Physics and Symmetry Tests with Polarized Photon Fusion and Dipole Moments

We discuss new-physics searches and symmetry tests with dipole moments, emphasizing the role of polarization observables. As a primary benchmark, we consider polarized photon fusion in the $e^+ e^-...

Fang Xu

2602.16834 • 2026-02-18

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustnes...

Priyaranjan Pattnayak, Sanchari Chowdhuri

2602.16832 • 2026-02-18

Learning under noisy supervision is governed by a feedback-truth gap

When feedback is absorbed faster than task structure can be evaluated, the learner will favor feedback over truth. A two-timescale model shows this feedback-truth gap is inevitable whenever the two...

Elan Schonfeld, Elias Wisnia

2602.16829 • 2026-02-18

Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable Guarantees

Automated circuit discovery is a central tool in mechanistic interpretability for identifying the internal components of neural networks responsible for specific behaviors. While prior methods ha...

Itamar Hadad, Guy Katz, Shahaf Bassan

2602.16823 • 2026-02-18

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In contrast, in real use, these agents solve more various and comple...

Yiqing Xie, Emmy Liu, Gaokai Zhang, Nachiket Kotalwar, Shubham Gandhi, Sathwik Acharya, Xingyao W...

2602.16819 • 2026-02-18
