Papers
Research papers from arXiv and related sources
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
The rapid advancement of Multimodal Large Language Models (MLLMs) has enabled browsing agents to acquire and reason over multimodal information in the real world. But existing benchmarks suffer fro...
Zhengbo Zhang, Jinbo Su, Zhaowen Zhou, Changtao Miao, Yuhan Hong, Qimeng Wu, Yumeng Liu, Feier Wu...
VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment
Video diffusion models lack explicit geometric supervision during training, leading to inconsistency artifacts such as object deformation, spatial drift, and depth violations in generated videos. T...
Tengjiao Yin, Jinglei Shi, Heng Guo, Xi Wang
Industrial cuVSLAM Benchmark & Integration
This work presents a comprehensive benchmark evaluation of visual odometry (VO) and visual SLAM (VSLAM) systems for mobile robot navigation in real-world logistical environments. We compare multipl...
Charbel Abi Hana, Kameel Amareen, Mohamad Mostafa, Dmitry Slepichev, Hesam Rabeti, Zheng Wang, Mi...
Neural Pushforward Samplers for the Fokker-Planck Equation on Embedded Riemannian Manifolds
We extend the Weak Adversarial Neural Pushforward Method (WANPM) to the Fokker-Planck equation posed on a compact, smoothly embedded Riemannian manifold $M$ in $\mathbb{R}^n$. The key observation is that the...
Andrew Qing He, Wei Cai
SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation
Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning...
Hang Lv, Sheng Liang, Hao Wang, Yongyue Zhang, Hongchao Gu, Wei Guo, Defu Lian, Yong Liu, Enhong ...
Equivalence testing with data-dependent and post-hoc equivalence margins
Equivalence testing compares the hypothesis that an effect $\mu$ is large against the alternative that it is negligible. Here, 'large' is classically expressed as being larger than some 'equivalence ...
Stan Koobs, Nick W. Koning
Rapid Worst-Case Gust Identification for Very Flexible Aircraft Using Reduced-Order Models
Identification of worst-case gust loads is a critical step in the certification of very flexible aircraft, yet the computational cost of nonlinear full-order simulations renders exhaustive parametr...
Nikolaos D. Tantaroudas, Andrea Da Ronch, Ilias Karachalios, Kenneth J. Badcock
Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation
Feed-forward 3D reconstruction has revolutionized 3D vision, providing a powerful baseline for downstream tasks such as novel-view synthesis with 3D Gaussian Splatting. Previous works explore fixin...
Yiming Huang, Baixiang Huang, Beilei Cui, Chi Kit Ng, Long Bai, Hongliang Ren
Weak Adversarial Neural Pushforward Method for the McKean-Vlasov / Mean-Field Fokker-Planck Equation
We extend the Weak Adversarial Neural Pushforward Method (WANPM) to the McKean-Vlasov mean-field Fokker-Planck equation. For the quadratic interaction kernel, the mean-field nonlinearity reduces to...
Andrew Qing He, Wei Cai
Homogeneous and Heterogeneous Consistency progressive Re-ranking for Visible-Infrared Person Re-identification
Visible-infrared person re-identification faces greater challenges than traditional person re-identification due to the significant differences between modalities. In particular, the differences be...
Yiming Wang
Execution-Grounded Credit Assignment for GRPO in Code Generation
Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single ...
Abhijit Kumar, Natalya Kumar, Shikhar Gupta
Dialect-Agnostic SQL Parsing via LLM-Based Segmentation
SQL is a widely adopted language for querying data, which has led to the development of various SQL analysis and rewriting tools. However, due to the diversity of SQL dialects, such tools often fai...
Junwen An, Kabilan Mahathevan, Manuel Rigger
Mechanical anisotropy of 3D-printed digital materials at large strains
3D-printed digital materials whose mechanical behavior ranges from that of thermoplastic polymers to that of rubbery polymers have become increasingly important. However, their mechanical functionalities hav...
Seunghwan Lee, Gisoo Lee, Seounghee Yun, Sumin Lee, Jeonyoon Lee, Hansohl Cho
Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards
Reinforcement learning with verifiable rewards (RLVR) has driven recent capability advances of large language models across various domains. Recent studies suggest that improved RLVR algorithms all...
Yuxuan Zhu, Daniel Kang
SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment
Large language models offer transformative potential for e-commerce search by enabling intent-aware recommendations. However, their industrial deployment is hindered by two critical challenges: (1)...
Zhouwei Zhai, Mengxiang Chen, Anmeng Zhang
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
Deep Research (DR) tools (e.g. OpenAI DR) help researchers cope with ballooning publishing counts. Such tools can synthesize scientific papers to answer researchers' queries, but lack understanding...
Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue...
CounterRefine: Answer-Conditioned Counterevidence Retrieval for Inference-Time Knowledge Repair in Factual Question Answering
In factual question answering, many errors are not failures of access but failures of commitment: the system retrieves relevant evidence, yet still settles on the wrong answer. We present CounterRe...
Tianyi Huang, Ying Kai Deng
Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation
While recent Vision-Language-Action (VLA) models have begun to incorporate audio, they typically treat sound as static pre-execution prompts or focus exclusively on human speech. This leaves a sign...
Chang Nie, Tianchen Deng, Guangming Wang, Zhe Liu, Hesheng Wang
SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia
Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese, creating barriers for researchers and platforms d...
Ri Chi Ng, Aditi Kumaresan, Yujia Hu, Roy Ka-Wei Lee
Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models
Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward f...
Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao, Yue Wang