Personal Assistant Web

TESTING

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical r...

Yiran Guan, Liang Yin, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Jian Luan, Yuliang Liu, Xiang Bai

2603.12262 • 2026-03-12

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Humans perceive and understand real-world spaces through a stream of visual observations. Therefore, the ability to streamingly maintain and update spatial evidence from potentially unbounded video...

Fangfu Liu, Diankun Wu, Jiawei Chi, Yimo Cai, Yi-Hsin Hung, Xumin Yu, Hao Li, Han Hu, Yongming Ra...

2603.12255 • 2026-03-12

Thermalisation as Diffusion in Hilbert Space

We develop a microscopic theory of thermalisation for a thermometer coupled to a many-body bath beyond standard Markovian and Fermi-golden-rule assumptions. By modeling interaction matrix elements ...

Aleksey Lunkin

2603.12234 • 2026-03-12

Incremental Neural Network Verification via Learned Conflicts

Neural network verification is often used as a core component within larger analysis procedures, which generate sequences of closely related verification queries over the same network. In existing ...

Raya Elsaleh, Liam Davis, Haoze Wu, Guy Katz

2603.12232 • 2026-03-12

Language Model Teams as Distributed Systems

Large language models (LLMs) are growing increasingly capable, prompting recent interest in LLM teams. Yet, despite increased deployment of LLM teams at scale, we lack a principled framework for ad...

Elizabeth Mieczkowski, Katherine M. Collins, Ilia Sucholutsky, Natalia Vélez, Thomas L. Griffiths

2603.12229 • 2026-03-12

Conformalized Data-Driven Reachability Analysis with PAC Guarantees

Data-driven reachability analysis computes over-approximations of reachable sets directly from noisy data. Existing deterministic methods require either known noise bounds or system-specific struct...

Yanliang Huang, Zhen Zhang, Peng Xie, Zhuoqi Zeng, Amr Alanwar

2603.12220 • 2026-03-12

A blended approach for evolving phase fields using peridynamics: Cyclic loading in quasi-brittle fracture

A field theory is presented for predicting damage and fracture in quasi brittle materials incorporating effects of irreversible (plastic) deformation as well as elastic moduli that soften with dama...

Hayden Bromley, Robert Lipton

2603.12210 • 2026-03-12

Shifted-geodesic approximation for spinning-body gravitational wave fluxes

We present a shifted-geodesic framework for computing gravitational-wave fluxes from spinning test bodies moving on bound orbits of Kerr black holes. The method provides a simple and efficient mean...

Lisa V. Drummond, Scott A. Hughes, Viktor Skoupý, Philip Lynch, Gabriel Andres Piovano

2603.12189 • 2026-03-12

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Multimodal agents offer a promising path to automating complex document-intensive workflows. Yet, a critical question remains: do these agents demonstrate genuine strategic reasoning, or merely sto...

Łukasz Borchmann, Jordy Van Landeghem, Michał Turski, Shreyansh Padarha, Ryan Othniel Kearns, Ada...

2603.12180 • 2026-03-12

Linking Perception, Confidence and Accuracy in MLLMs

Recent advances in Multi-modal Large Language Models (MLLMs) have predominantly focused on enhancing visual perception to improve accuracy. However, a critical question remains unexplored: Do model...

Yuetian Du, Yucheng Wang, Rongyu Zhang, Zhijie Xu, Boyu Yang, Ming Kong, Jie Liu, Qiang Zhu

2603.12149 • 2026-03-12

Automatic Generation of High-Performance RL Environments

Translating complex reinforcement learning (RL) environments into high-performance implementations has traditionally required months of specialized engineering. We present a reusable recipe - a gen...

Seth Karten, Rahul Dev Appapogu, Chi Jin

2603.12145 • 2026-03-12

TopoBench: Benchmarking LLMs on Hard Topological Reasoning

Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivity, loop closure, and region symmetry and remains challenging for even the most powerful large l...

Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresqu...

2603.12133 • 2026-03-12

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforwa...

Tae-Eun Song

2603.12123 • 2026-03-12

CRAFT: A Tendon-Driven Hand with Hybrid Hard-Soft Compliance

We introduce CRAFT hand, a tendon-driven anthropomorphic hand with hybrid hard-soft compliance for contact-rich manipulation. The design is based on a simple idea: contact is not uniform across the...

Leo Lin, Shivansh Patel, Jay Moon, Svetlana Lazebnik, Unnat Jain

2603.12120 • 2026-03-12

SommBench: Assessing Sommelier Expertise of Language Models

With the rapid advances of large language models, it becomes increasingly important to systematically evaluate their multilingual and multicultural capabilities. Previous cultural evaluation benchm...

William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie D...

2603.12117 • 2026-03-12

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every untrustworthy client may leak the received functional...

Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu...

2603.12089 • 2026-03-12

Direct Boltzmann inversion method from particle configurations at arbitrary state points

We introduce a direct Boltzmann inversion method to infer the interaction potential in particle systems using as input particle configurations generated at an arbitrary state point of the system. U...

Olivier Coquand, Davide Paolino, Ludovic Berthier

2603.12081 • 2026-03-12

LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments

Longitudinal brain MRI is essential for characterizing the progression of neurological diseases such as Alzheimer's disease assessment. However, current deep-learning tools fragment this process: c...

Zhaoyang Jiang, Zhizhong Fu, David McAllister, Yunsoo Kim, Honghan Wu

2603.12071 • 2026-03-12

Numerical benchmark for damage identification in Structural Health Monitoring

The availability of a dataset for validation and verification purposes of novel data-driven strategies and/or hybrid physics-data approaches is currently one of the most pressing challenges in the ...

Francesca Marafini, Giacomo Zini, Alberto Barontini, Nuno Mendes, Alice Cicirello, Michele Betti,...

2603.12069 • 2026-03-12

Translationese as a Rational Response to Translation Task Difficulty

Translations systematically diverge from texts originally produced in the target language, a phenomenon widely referred to as translationese. Translationese has been attributed to production tenden...

Maria Kunilovskaya

2603.12050 • 2026-03-12
