Papers
Research papers from arXiv and related sources
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical r...
Yiran Guan, Liang Yin, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Jian Luan, Yuliang Liu, Xiang Bai
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
Humans perceive and understand real-world spaces through a stream of visual observations. Therefore, the ability to streamingly maintain and update spatial evidence from potentially unbounded video...
Fangfu Liu, Diankun Wu, Jiawei Chi, Yimo Cai, Yi-Hsin Hung, Xumin Yu, Hao Li, Han Hu, Yongming Ra...
Thermalisation as Diffusion in Hilbert Space
We develop a microscopic theory of thermalisation for a thermometer coupled to a many-body bath beyond standard Markovian and Fermi-golden-rule assumptions. By modeling interaction matrix elements ...
Aleksey Lunkin
Incremental Neural Network Verification via Learned Conflicts
Neural network verification is often used as a core component within larger analysis procedures, which generate sequences of closely related verification queries over the same network. In existing ...
Raya Elsaleh, Liam Davis, Haoze Wu, Guy Katz
Language Model Teams as Distributed Systems
Large language models (LLMs) are growing increasingly capable, prompting recent interest in LLM teams. Yet, despite increased deployment of LLM teams at scale, we lack a principled framework for ad...
Elizabeth Mieczkowski, Katherine M. Collins, Ilia Sucholutsky, Natalia Vélez, Thomas L. Griffiths
Conformalized Data-Driven Reachability Analysis with PAC Guarantees
Data-driven reachability analysis computes over-approximations of reachable sets directly from noisy data. Existing deterministic methods require either known noise bounds or system-specific struct...
Yanliang Huang, Zhen Zhang, Peng Xie, Zhuoqi Zeng, Amr Alanwar
A blended approach for evolving phase fields using peridynamics: Cyclic loading in quasi-brittle fracture
A field theory is presented for predicting damage and fracture in quasi-brittle materials incorporating effects of irreversible (plastic) deformation as well as elastic moduli that soften with dama...
Hayden Bromley, Robert Lipton
Shifted-geodesic approximation for spinning-body gravitational wave fluxes
We present a shifted-geodesic framework for computing gravitational-wave fluxes from spinning test bodies moving on bound orbits of Kerr black holes. The method provides a simple and efficient mean...
Lisa V. Drummond, Scott A. Hughes, Viktor Skoupý, Philip Lynch, Gabriel Andres Piovano
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Multimodal agents offer a promising path to automating complex document-intensive workflows. Yet, a critical question remains: do these agents demonstrate genuine strategic reasoning, or merely sto...
Łukasz Borchmann, Jordy Van Landeghem, Michał Turski, Shreyansh Padarha, Ryan Othniel Kearns, Ada...
Linking Perception, Confidence and Accuracy in MLLMs
Recent advances in Multi-modal Large Language Models (MLLMs) have predominantly focused on enhancing visual perception to improve accuracy. However, a critical question remains unexplored: Do model...
Yuetian Du, Yucheng Wang, Rongyu Zhang, Zhijie Xu, Boyu Yang, Ming Kong, Jie Liu, Qiang Zhu
Automatic Generation of High-Performance RL Environments
Translating complex reinforcement learning (RL) environments into high-performance implementations has traditionally required months of specialized engineering. We present a reusable recipe - a gen...
Seth Karten, Rahul Dev Appapogu, Chi Jin
TopoBench: Benchmarking LLMs on Hard Topological Reasoning
Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivity, loop closure, and region symmetry and remains challenging for even the most powerful large l...
Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresqu...
Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions
Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforwa...
Tae-Eun Song
CRAFT: A Tendon-Driven Hand with Hybrid Hard-Soft Compliance
We introduce CRAFT hand, a tendon-driven anthropomorphic hand with hybrid hard-soft compliance for contact-rich manipulation. The design is based on a simple idea: contact is not uniform across the...
Leo Lin, Shivansh Patel, Jay Moon, Svetlana Lazebnik, Unnat Jain
SommBench: Assessing Sommelier Expertise of Language Models
With the rapid advances of large language models, it becomes increasingly important to systematically evaluate their multilingual and multicultural capabilities. Previous cultural evaluation benchm...
William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie D...
EmbTracker: Traceable Black-box Watermarking for Federated Language Models
Federated Language Model (FedLM) allows collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every untrustworthy client may leak the received functional...
Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu...
Direct Boltzmann inversion method from particle configurations at arbitrary state points
We introduce a direct Boltzmann inversion method to infer the interaction potential in particle systems using as input particle configurations generated at an arbitrary state point of the system. U...
Olivier Coquand, Davide Paolino, Ludovic Berthier
LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments
Longitudinal brain MRI is essential for characterizing the progression of neurological diseases such as Alzheimer's disease. However, current deep-learning tools fragment this process: c...
Zhaoyang Jiang, Zhizhong Fu, David McAllister, Yunsoo Kim, Honghan Wu
Numerical benchmark for damage identification in Structural Health Monitoring
The availability of a dataset for validation and verification purposes of novel data-driven strategies and/or hybrid physics-data approaches is currently one of the most pressing challenges in the ...
Francesca Marafini, Giacomo Zini, Alberto Barontini, Nuno Mendes, Alice Cicirello, Michele Betti,...
Translationese as a Rational Response to Translation Task Difficulty
Translations systematically diverge from texts originally produced in the target language, a phenomenon widely referred to as translationese. Translationese has been attributed to production tenden...
Maria Kunilovskaya