Papers
Research papers from arXiv and related sources
A statistical perspective on transformers for small longitudinal cohort data
Modeling of longitudinal cohort data typically involves complex temporal dependencies between multiple variables. There, the transformer architecture, which has been highly successful in language a...
Kiana Farhadyar, Maren Hackenberg, Kira Ahrens, Charlotte Schenk, Bianca Kollmann, Oliver Tüscher...
AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-...
Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, Ting Wang
Domain Decomposition for Mean Curvature Flow of Surface Polygonal Meshes
We examine the use of domain decomposition for potentially more efficient mean curvature flow of surface meshes, whose faces are arbitrary simple polygons. We first test traditional domain decompos...
Lenka Ptackova, Michal Outrata
Testing the cosmic distance-duality relation with localized fast radio bursts: a cosmological model-independent study
We test the Etherington cosmic distance-duality relation (CDDR) by comparing Type Ia supernova (SNIa) luminosity-distance information from the Pantheon+ compilation with an angular-diameter-distan...
Jéferson A. S. Fortunato, Surajit Kalita, Amanda Weltman
On the Tightness of the Second-Order Cone Relaxation of the Optimal Power Flow with Angles Recovery in Meshed Networks
This letter investigates properties of the second-order cone relaxation of the optimal power flow (OPF) problem, with emphasis on relaxation tightness, nodal voltage angles recovery, and alternatin...
Ginevra Larroux, Matthieu Jacobs, Mario Paolone
SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation
The ability to manipulate tools significantly expands the set of tasks a robot can perform. Yet, tool manipulation represents a challenging class of dexterity, requiring grasping thin objects, in-h...
Kushal Kedia, Tyler Ga Wei Lum, Jeannette Bohg, C. Karen Liu
Asteroidal activity amongst meteor datasets: Confirmed new "rock-comet" stream and search for a tidal disruption signature
Asteroid activity (e.g., thermo-mechanical breakdown, impacts, rotational shedding, tidal disruption) can inject meteoroids into near-Earth space and leave detectable signatures in orbit cata...
Patrick M. Shober
Overseeing Agents Without Constant Oversight: Challenges and Opportunities
To enable human oversight, agentic AI systems often provide a trace of reasoning and action steps. Designing traces to have an informative, but not overwhelming, level of detail remains a critical ...
Madeleine Grunde-McLaughlin, Hussein Mozannar, Maya Murad, Jingya Chen, Saleema Amershi, Adam Fou...
New Physics and Symmetry Tests with Polarized Photon Fusion and Dipole Moments
We discuss new-physics searches and symmetry tests with dipole moments, emphasizing the role of polarization observables. As a primary benchmark, we consider polarized photon fusion in the $e^+ e^-...
Fang Xu
IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages
Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustnes...
Priyaranjan Pattnayak, Sanchari Chowdhuri
Learning under noisy supervision is governed by a feedback-truth gap
When feedback is absorbed faster than task structure can be evaluated, the learner will favor feedback over truth. A two-timescale model shows this feedback-truth gap is inevitable whenever the two...
Elan Schonfeld, Elias Wisnia
Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable Guarantees
Automated circuit discovery is a central tool in mechanistic interpretability for identifying the internal components of neural networks responsible for specific behaviors. While prior methods ha...
Itamar Hadad, Guy Katz, Shahaf Bassan
Hybrid-Gym: Training Coding Agents to Generalize Across Tasks
When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In contrast, in real use, these agents solve more various and comple...
Yiqing Xie, Emmy Liu, Gaokai Zhang, Nachiket Kotalwar, Shubham Gandhi, Sathwik Acharya, Xingyao W...