Personal Assistant Web

TESTING

VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents

The rapid advancement of Multimodal Large Language Models (MLLMs) has enabled browsing agents to acquire and reason over multimodal information in the real world. But existing benchmarks suffer fro...

Zhengbo Zhang, Jinbo Su, Zhaowen Zhou, Changtao Miao, Yuhan Hong, Qimeng Wu, Yumeng Liu, Feier Wu...

2603.16289 • 2026-03-17

VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment

Video diffusion models lack explicit geometric supervision during training, leading to inconsistency artifacts such as object deformation, spatial drift, and depth violations in generated videos. T...

Tengjiao Yin, Jinglei Shi, Heng Guo, Xi Wang

2603.16271 • 2026-03-17

Industrial cuVSLAM Benchmark & Integration

This work presents a comprehensive benchmark evaluation of visual odometry (VO) and visual SLAM (VSLAM) systems for mobile robot navigation in real-world logistical environments. We compare multipl...

Charbel Abi Hana, Kameel Amareen, Mohamad Mostafa, Dmitry Slepichev, Hesam Rabeti, Zheng Wang, Mi...

2603.16240 • 2026-03-17

Neural Pushforward Samplers for the Fokker-Planck Equation on Embedded Riemannian Manifolds

We extend the Weak Adversarial Neural Pushforward (WANPF) Method to the Fokker-Planck equation posed on a compact, smoothly embedded Riemannian manifold $M$ in $\mathbb{R}^n$. The key observation is that the...

Andrew Qing He, Wei Cai

2603.16239 • 2026-03-17

SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation

Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning...

Hang Lv, Sheng Liang, Hao Wang, Yongyue Zhang, Hongchao Gu, Wei Guo, Defu Lian, Yong Liu, Enhong ...

2603.16219 • 2026-03-17

Equivalence testing with data-dependent and post-hoc equivalence margins

Equivalence testing compares the hypothesis that an effect $μ$ is large against the alternative that it is negligible. Here, 'large' is classically expressed as being larger than some 'equivalence ...

Stan Koobs, Nick W. Koning

2603.16213 • 2026-03-17

Rapid Worst-Case Gust Identification for Very Flexible Aircraft Using Reduced-Order Models

Identification of worst-case gust loads is a critical step in the certification of very flexible aircraft, yet the computational cost of nonlinear full-order simulations renders exhaustive parametr...

Nikolaos D. Tantaroudas, Andrea Da Ronch, Ilias Karachalios, Kenneth J. Badcock

2603.16212 • 2026-03-17

Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation

Feed-forward 3D reconstruction has revolutionized 3D vision, providing a powerful baseline for downstream tasks such as novel-view synthesis with 3D Gaussian Splatting. Previous works explore fixin...

Yiming Huang, Baixiang Huang, Beilei Cui, Chi Kit Ng, Long Bai, Hongliang Ren

2603.16211 • 2026-03-17

Weak Adversarial Neural Pushforward Method for the McKean-Vlasov / Mean-Field Fokker-Planck Equation

We extend the Weak Adversarial Neural Pushforward Method (WANPM) to the McKean-Vlasov mean-field Fokker-Planck equation. For the quadratic interaction kernel, the mean-field nonlinearity reduces to...

Andrew Qing He, Wei Cai

2603.16186 • 2026-03-17

Homogeneous and Heterogeneous Consistency progressive Re-ranking for Visible-Infrared Person Re-identification

Visible-infrared person re-identification faces greater challenges than traditional person re-identification due to the significant differences between modalities. In particular, the differences be...

Yiming Wang

2603.16165 • 2026-03-17

Execution-Grounded Credit Assignment for GRPO in Code Generation

Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single ...

Abhijit Kumar, Natalya Kumar, Shikhar Gupta

2603.16158 • 2026-03-17

Dialect-Agnostic SQL Parsing via LLM-Based Segmentation

SQL is a widely adopted language for querying data, which has led to the development of various SQL analysis and rewriting tools. However, due to the diversity of SQL dialects, such tools often fai...

Junwen An, Kabilan Mahathevan, Manuel Rigger

2603.16155 • 2026-03-17

Mechanical anisotropy of 3D-printed digital materials at large strains

3D-printed digital materials whose mechanical behavior ranges from that of thermoplastic polymers to that of rubbery polymers have become increasingly important. However, their mechanical functionalities hav...

Seunghwan Lee, Gisoo Lee, Seounghee Yun, Sumin Lee, Jeonyoon Lee, Hansohl Cho

2603.16149 • 2026-03-17

Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards

Reinforcement learning with verifiable rewards (RLVR) has driven recent capability advances of large language models across various domains. Recent studies suggest that improved RLVR algorithms all...

Yuxuan Zhu, Daniel Kang

2603.16140 • 2026-03-17

SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment

Large language models offer transformative potential for e-commerce search by enabling intent-aware recommendations. However, their industrial deployment is hindered by two critical challenges: (1)...

Zhouwei Zhai, Mengxiang Chen, Anmeng Zhang

2603.16137 • 2026-03-17

Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users

Deep Research (DR) tools (e.g. OpenAI DR) help researchers cope with ballooning publishing counts. Such tools can synthesize scientific papers to answer researchers' queries, but lack understanding...

Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue...

2603.16120 • 2026-03-17

CounterRefine: Answer-Conditioned Counterevidence Retrieval for Inference-Time Knowledge Repair in Factual Question Answering

In factual question answering, many errors are not failures of access but failures of commitment: the system retrieves relevant evidence, yet still settles on the wrong answer. We present CounterRe...

Tianyi Huang, Ying Kai Deng

2603.16091 • 2026-03-17

Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation

While recent Vision-Language-Action (VLA) models have begun to incorporate audio, they typically treat sound as static pre-execution prompts or focus exclusively on human speech. This leaves a sign...

Chang Nie, Tianchen Deng, Guangming Wang, Zhe Liu, Hesheng Wang

2603.16086 • 2026-03-17

SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia

Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese, creating barriers for researchers and platforms d...

Ri Chi Ng, Aditi Kumaresan, Yujia Hu, Roy Ka-Wei Lee

2603.16070 • 2026-03-17

Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward f...

Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao, Yue Wang

2603.16065 • 2026-03-17
