Papers
Research papers from arXiv and related sources
A Kernel Two-Sample Test Invariant under Group Action with Applications to Functional Data
We introduce a kernel-based two-sample test for comparing probability distributions up to group actions. Our construction yields invariant kernels for locally compact $\sigma$-compact groups and extends...
Madison Giacofci, Anouar Meynaoui, Alex Podgorny
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
The rapid advancement of Multimodal Large Language Models (MLLMs) has enabled browsing agents to acquire and reason over multimodal information in the real world. But existing benchmarks suffer fro...
Zhengbo Zhang, Jinbo Su, Zhaowen Zhou, Changtao Miao, Yuhan Hong, Qimeng Wu, Yumeng Liu, Feier Wu...
CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS
Current Text-to-Speech (TTS) systems typically use separate models for speech-prompted and text-prompted timbre control. While unifying both control signals into a single model is desirable, the ch...
Zihao Zheng, Wen Wu, Chao Zhang, Mengyue Wu, Xuenan Xu
VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment
Video diffusion models lack explicit geometric supervision during training, leading to inconsistency artifacts such as object deformation, spatial drift, and depth violations in generated videos. T...
Tengjiao Yin, Jinglei Shi, Heng Guo, Xi Wang
Adaptive Theory of Mind for LLM-based Multi-Agent Coordination
Theory of Mind (ToM) refers to the ability to reason about others' mental states, and higher-order ToM involves considering that others also possess their own ToM. Equipping large language model (L...
Chunjiang Mu, Ya Zeng, Qiaosheng Zhang, Kun Shao, Chen Chu, Hao Guo, Danyang Jia, Zhen Wang, Shuy...
Human/AI Collective Intelligence for Deliberative Democracy: A Human-Centred Design Approach
This chapter introduces the concept of Collective Intelligence for Deliberative Democracy (CI4DD). We propose that the use of computational tools, specifically artificial intelligence to advance de...
Anna De Liddo, Lucas Anastasiou, Simon Buckingham Shum
When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
Recently, Multimodal Large Language Models (MLLMs) have demonstrated significant potential in complex visual tasks through the integration of Chain-of-Thought (CoT) reasoning. However, in Video Que...
Xiaokun Sun, Yubo Wang, Haoyu Cao, Linli Xu
Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models
Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box ...
Junxin Wang, Dai Guan, Weijie Qiu, Zhihang Li, Yongbo Gai, Zhengyi Yang, Mengyu Zhou, Erchao Zhao...
Visual Prompt Discovery via Semantic Exploration
LVLMs encounter significant challenges in image understanding and visual reasoning, leading to critical perception failures. Visual prompts, which incorporate image manipulation code, have shown pr...
Jaechang Kim, Yotaro Shimose, Zhao Wang, Kuang-Da Wang, Jungseul Ok, Shingo Takamatsu
How to Utilize Complementary Vision-Text Information for 2D Structure Understanding
LLMs typically linearize 2D tables into 1D sequences to fit their autoregressive architecture, which weakens row-column adjacency and other layout cues. In contrast, purely visual encoders can capt...
Jiancheng Dong, Pengyue Jia, Derong Xu, Jiawei Cheng, Jingyu Peng, Chao Zhang, Bowen Liu, Xin Sun...
More Rounds, More Noise: Why Multi-Turn Review Fails to Improve Cross-Context Verification
Cross-Context Review (CCR) improves LLM verification by separating production and review into independent sessions. A natural extension is multi-turn review: letting the reviewer ask follow-up ques...
Song Tae-Eun
Industrial cuVSLAM Benchmark & Integration
This work presents a comprehensive benchmark evaluation of visual odometry (VO) and visual SLAM (VSLAM) systems for mobile robot navigation in real-world logistical environments. We compare multipl...
Charbel Abi Hana, Kameel Amareen, Mohamad Mostafa, Dmitry Slepichev, Hesam Rabeti, Zheng Wang, Mi...
Neural Pushforward Samplers for the Fokker-Planck Equation on Embedded Riemannian Manifolds
We extend the Weak Adversarial Neural Pushforward (WANPF) Method to the Fokker-Planck equation posed on a compact, smoothly embedded Riemannian manifold $M$ in $\mathbb{R}^n$. The key observation is that the...
Andrew Qing He, Wei Cai
SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation
Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning...
Hang Lv, Sheng Liang, Hao Wang, Yongyue Zhang, Hongchao Gu, Wei Guo, Defu Lian, Yong Liu, Enhong ...
Equivalence testing with data-dependent and post-hoc equivalence margins
Equivalence testing compares the hypothesis that an effect $\mu$ is large against the alternative that it is negligible. Here, 'large' is classically expressed as being larger than some 'equivalence ...
Stan Koobs, Nick W. Koning
Rapid Worst-Case Gust Identification for Very Flexible Aircraft Using Reduced-Order Models
Identification of worst-case gust loads is a critical step in the certification of very flexible aircraft, yet the computational cost of nonlinear full-order simulations renders exhaustive parametr...
Nikolaos D. Tantaroudas, Andrea Da Ronch, Ilias Karachalios, Kenneth J. Badcock
Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation
Feed-forward 3D reconstruction has revolutionized 3D vision, providing a powerful baseline for downstream tasks such as novel-view synthesis with 3D Gaussian Splatting. Previous works explore fixin...
Yiming Huang, Baixiang Huang, Beilei Cui, Chi Kit Ng, Long Bai, Hongliang Ren
Weak Adversarial Neural Pushforward Method for the McKean-Vlasov / Mean-Field Fokker-Planck Equation
We extend the Weak Adversarial Neural Pushforward Method (WANPM) to the McKean-Vlasov mean-field Fokker-Planck equation. For the quadratic interaction kernel, the mean-field nonlinearity reduces to...
Andrew Qing He, Wei Cai
Homogeneous and Heterogeneous Consistency progressive Re-ranking for Visible-Infrared Person Re-identification
Visible-infrared person re-identification faces greater challenges than traditional person re-identification due to the significant differences between modalities. In particular, the differences be...
Yiming Wang
Execution-Grounded Credit Assignment for GRPO in Code Generation
Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single ...
Abhijit Kumar, Natalya Kumar, Shikhar Gupta