Personal Assistant Web

TESTING

ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments

As large language models (LLMs) evolve into autonomous agents capable of acting in open-ended environments, ensuring behavioral alignment with human values becomes a critical safety concern. Existi...

Weixiang Zhao, Haozhen Li, Yanyan Zhao, Xuda Zhi, Yongbo Huang, Hao He, Bing Qin, Ting Liu

2603.08024 • 2026-03-09

CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval

Although large language models (LLMs) are introduced into vision-and-language navigation (VLN) to improve instruction comprehension and generalization, existing LLM-based VLN lacks the ability to ...

Haozhou Li, Xiangyu Dong, Huiyan Jiang, Yaoming Zhou, Xiaoguang Ma

2603.07997 • 2026-03-09

MJ1: Multimodal Judgment via Grounded Verification

Multimodal judges struggle to ground decisions in visual evidence. We present MJ1, a multimodal judge trained with reinforcement learning that enforces visual grounding through a structured grounde...

Bhavesh Kumar, Dylan Feng, Leonard Tang

2603.07990 • 2026-03-09

On the Feasibility and Opportunity of Autoregressive 3D Object Detection

LiDAR-based 3D object detectors typically rely on proposal heads with hand-crafted components like anchor assignment and non-maximum suppression (NMS), complicating training and limiting extensibil...

Zanming Huang, Jinsu Yoo, Sooyoung Jeon, Zhenzhen Liu, Mark Campbell, Kilian Q Weinberger, Bharat...

2603.07985 • 2026-03-09

ZK-ACE: Identity-Centric Zero-Knowledge Authorization for Post-Quantum Blockchain Systems

Post-quantum signature schemes introduce kilobyte-scale authorization artifacts when applied directly to blockchain transaction validation. A widely considered mitigation is to verify post-quantum ...

Jian Sheng Wang

2603.07974 • 2026-03-09

Advancing Automated Algorithm Design via Evolutionary Stagewise Design with LLMs

With the rapid advancement of science and technology, problems in industrial scenarios are becoming increasingly difficult, posing significant challenges to traditional algorithm design. ...

Chen Lu, Ke Xue, Chengrui Gao, Yunqi Shi, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou

2603.07970 • 2026-03-09

WeldAR: Augmenting Live Hands-On Training with In-Situ Guidance for Novice Learners

Extended Reality (XR) systems for physical skill training have largely emphasized simulation rather than real-time in-situ instruction. We present WeldAR, an Augmented Reality (AR) system with five...

Chuhan Xu, Lia Sparingga Purnamasari, Zhenfang Chen, Daragh Byrne, Dina El-Zanfaly

2603.07959 • 2026-03-09

RL unknotter, hard unknots and unknotting number

We develop a reinforcement learning pipeline for simplifying knot diagrams. A trained agent learns move proposals and a value heuristic for navigating Reidemeister moves. The pipeline applies to ar...

Anne Dranowski, Yura Kabkov, Daniel Tubbenhauer

2603.07955 • 2026-03-09

ConnChecker: Automated Root-Cause Analysis for Formal Connectivity Check via Graph

Formal connectivity checking offers scalable verification of signal paths in complex SoC designs, but debugging counterexamples remains a manual and time-consuming process. ConnChecker introduces a...

Do Ngoc Tiep, Nguyen Linh Anh, Luu Danh Minh

2603.07943 • 2026-03-09

Condition-Triggered Cryptographic Asset Control via Dormant Authorization Paths

Control of encrypted digital assets is traditionally equated with permanent possession of private keys, a model that precludes regulatory supervision, conditional delegation, and legally compliant ...

Jian Sheng Wang

2603.07933 • 2026-03-09

Hard/Soft NLoS Detection via Combinatorial Data Augmentation for 6G Positioning

A key enabler for meeting the stringent requirements of 6G positioning is the ability to exploit site-dependent information governing line-of-sight (LoS) and non-line-of-sight (NLoS) propagation. H...

Sang-Hyeok Kim, Seung Min Yu, Jihong Park, Seung-Woo Ko

2603.07932 • 2026-03-09

Omnidirectional Humanoid Locomotion on Stairs via Unsafe Stepping Penalty and Sparse LiDAR Elevation Mapping

Humanoid robots, characterized by numerous degrees of freedom and a high center of gravity, are inherently unstable. Safe omnidirectional locomotion on stairs requires both omnidirectional terrain ...

Yuzhi Jiang, Yujun Liang, Junhao Li, Han Ding, Lijun Zhu

2603.07928 • 2026-03-09

SWE-Fuse: Empowering Software Agents via Issue-free Trajectory Learning and Entropy-aware RLVR Training

Large language models (LLMs) have transformed the software engineering landscape. Recently, numerous LLM-based agents have been developed to address real-world software issue fixing tasks. Despite ...

Xin-Cheng Wen, Binbin Chen, Haoxuan Lan, Hang Yu, Peng Di, Cuiyun Gao

2603.07927 • 2026-03-09

IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation

Test-time adaptation (TTA) has been widely explored to prevent performance degradation when test data differ from the training distribution. However, fully leveraging the rich representations of la...

Sunghyun Baek, Jaemyung Yu, Seunghee Koh, Minsu Kim, Hyeonseong Jeon, Junmo Kim

2603.07926 • 2026-03-09

Beyond Heuristic Prompting: A Concept-Guided Bayesian Framework for Zero-Shot Image Recognition

Vision-Language Models (VLMs), such as CLIP, have significantly advanced zero-shot image recognition. However, their performance remains limited by suboptimal prompt engineering and poor adaptabili...

Hui Liu, Kecheng Chen, Jialiang Wang, Xianming Liu, Wenya Wang, Haoliang Li

2603.07911 • 2026-03-09

Effective and flexible depth-based inference for functional parameters

For hypothesis testing of functional parameters, given a functional statistic $T_n$ and a functional depth $D$ with respect to the distribution $P_n$ of $T_n$, we propose the depth value $DT_n \equ...

Hyemin Yeon

2603.07871 • 2026-03-09

Structural aging of a cohesive and amorphous granular solid under cyclic loading

We investigate how cyclic loading evolves the structure and deformation behaviors of a granular raft composed of particles floating at an air-oil interface. The raft has a disordered particle packi...

William Hobson-Rhoades, Douglas J Durian, Yue Fan, Hongyi Xiao

2603.07852 • 2026-03-09

A Lock-Free, Fully GPU-Resident Architecture for the Verification of Goldbach's Conjecture

We present a fully device-resident, multi-GPU architecture for the large-scale computational verification of Goldbach's conjecture. In prior work, a segmented double-sieve eliminated monolithic VRA...

Isaac Llorente-Saguer

2603.07850 • 2026-03-08

Intentional Deception as Controllable Capability in LLM Agents

As LLM-based agents increasingly operate in multi-agent systems, understanding adversarial manipulation becomes critical for defensive design. We present a systematic study of intentional deception...

Jason Starace, Terence Soule

2603.07848 • 2026-03-08

New results and tests for stochastic dominance between linear combinations

Convex combinations of i.i.d. random variables without a finite mean can behave in a strikingly different way from the finite-mean case: as the weight vector becomes more balanced, the resulting co...

Tommaso Lando, Paulo Eduardo Oliveira

2603.07842 • 2026-03-08
