Personal Assistant Web

TESTING

To Agree or To Be Right? The Grounding-Sycophancy Tradeoff in Medical Vision-Language Models

Vision-language models (VLMs) adapted to the medical domain have shown strong performance on visual question answering benchmarks, yet their robustness against two critical failure modes, hallucina...

OFM Riaz Rahman Aranya, Kevin Desai

2603.22623 • 2026-03-23

TESTING

A Vision Language Model for Generating Procedural Plant Architecture Representations from Simulated Images

Three-dimensional (3D) procedural plant architecture models have emerged as an important tool for simulation-based studies of plant structure and function, extracting plant architectural parameters...

Heesup Yun, Isaac Kazuo Uyehara, Ioannis Droutsas, Earl Ranario, Christine H. Diepenbrock, Brian ...

2603.22622 • 2026-03-23

TESTING

Wave-particle equilibria with heavy ions in weakly collisional space plasmas

Space plasmas are weakly collisional since characteristic time scales related to Coulomb collisions are much larger than those of Larmor gyration or wave-particle interactions. Thus, wave activity...

Nicolás Villarroel-Sepúlveda, Daniel Verscharen, Pablo S. Moya, Rodrigo A. López, Kristopher G. K...

2603.22613 • 2026-03-23

TESTING

BioShield: A Context-Aware Firewall for Securing Bio-LLMs

The rapid advancement of Large Language Models (LLMs) in biological research has significantly lowered the barrier to accessing complex bioinformatics knowledge, experimental design strategies, an...

Protiva Das, Sovon Chakraborty, Sidhant Narula, Lucas Potter, Xavier-Lewis Palmer, Pratip Rana, D...

2603.22612 • 2026-03-23

TESTING

Dress-ED: Instruction-Guided Editing for Virtual Try-On and Try-Off

Recent advances in Virtual Try-On (VTON) and Virtual Try-Off (VTOFF) have greatly improved photo-realistic fashion synthesis and garment reconstruction. However, existing datasets remain static, la...

Fulvio Sanguigni, Davide Lobba, Bin Ren, Marcella Cornia, Nicu Sebe, Rita Cucchiara

2603.22607 • 2026-03-23

TESTING

Making Effective Statistical Inferences: From Significance Testing to the Open Science Inference Ecosystem (2016-2026)

Statistical inference has undergone a profound transformation over the past decade, evolving from a significance-testing paradigm toward a comprehensive, transparency-driven framework embedded with...

Aswini Kumar Patra

2603.22594 • 2026-03-23

TESTING

Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?

Chain-of-thought (CoT) reasoning has been proposed as a transparency mechanism for large language models in safety-critical deployments, yet its effectiveness depends on faithfulness (whether model...

Richard J. Young

2603.22582 • 2026-03-23

TESTING

STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

Large Language Models (LLMs) have demonstrated potential in code generation, yet they struggle with the multi-step, stateful reasoning required for offensive cybersecurity operations. Existing rese...

James Hugglestone, Samuel Jacob Chacko, Dawson Stoller, Ryan Schmidt, Xiuwen Liu

2603.22577 • 2026-03-23

TESTING

GIFT: Generalizing Intent for Flexible Test-Time Rewards

Robots learn reward functions from user demonstrations, but these rewards often fail to generalize to new environments. This failure occurs because learned rewards latch onto spurious correlations ...

Fin Amin, Nathaniel Dennler, Andreea Bobu

2603.22574 • 2026-03-23

TESTING

Proxy-Reliance Control in Conformal Recalibration of One-Sided Value-at-Risk

We introduce a proxy-reliance-controlled conformal recalibration framework for one-sided Value-at-Risk (VaR), and study a question that existing state-aware methods do not usually isolate: how stro...

Tenghan Zhong

2603.22569 • 2026-03-23

TESTING

TrustTrade: Human-Inspired Selective Consensus Reduces Decision Uncertainty in LLM Trading Agents

Large language models (LLMs) are increasingly deployed as autonomous agents in financial trading. However, they often exhibit a hazardous behavioral bias that we term uniform trust, whereby retriev...

Minghan Li, Rachel Gonsalves, Weiyue Li, Sunghoon Yoon, Mengyu Wang

2603.22567 • 2026-03-23

TESTING

AI Mental Models: Learned Intuition and Deliberation in a Bounded Neural Architecture

This paper asks whether a bounded neural architecture can exhibit a meaningful division of labor between intuition and deliberation on a classic 64-item syllogistic reasoning benchmark. More broadl...

Laurence Anthony

2603.22561 • 2026-03-23

TESTING

Vibrissa inspired geometries enhance sensitivity of wake-induced vibrations

We report on experiments designed to characterize the vortex-induced vibration (VIV) and wake-induced vibration (WIV) experienced by bluff bodies immersed in both steady and unsteady flows. Using a...

Eva Erickson, Eric E. Handy-Cardenas, Joel W. Newbolt, Christin Murphy, Kenneth Breuer

2603.22556 • 2026-03-23

TESTING

Generalized multi-object classification and tracking with sparse feature resonator networks

In visual scene understanding tasks, it is essential to capture both invariant and equivariant structure. While neural networks are frequently trained to achieve invariance to transformations such ...

Lazar Supic, Alec Mullen, E. Paxon Frady

2603.22539 • 2026-03-23

TESTING

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Long video understanding remains challenging for multimodal large language models (MLLMs) due to limited context windows, which necessitate identifying sparse query-relevant video segments. However...

Ruoliu Yang, Chu Wu, Caifeng Shan, Ran He, Chaoyou Fu

2603.22285 • 2026-03-23

TESTING

Precision's arrow of time

The arrow of time is usually attributed to two mechanisms: decoherence through environmental entanglement, and chaos through nonlinear dynamics. Here we demonstrate a third route, Precision-Induced...

Luis E. F. Foa Torres, G. Pappas, V. Achilleos, D. Bautista Avilés

2603.22284 • 2026-03-23

AI LLM

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation

We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Exis...

Ziyi Wang, Xinshun Wang, Shuang Chen, Yang Cong, Mengyuan Liu

2603.22282 • 2026-03-23

AI LLM

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

Large Language Models (LLMs) and Vision Language Models (VLMs) have shown impressive reasoning abilities, yet they struggle with spatial understanding and layout consistency when performing fine-gr...

Haoyu Zhen, Xiaolong Li, Yilin Zhao, Han Zhang, Sifei Liu, Kaichun Mo, Chuang Gan, Subhashree Rad...

2603.22279 • 2026-03-23

TESTING

DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming

Performing in-hand, contact-rich, and long-horizon dexterous manipulation remains an unsolved challenge in robotics. Prior hand dexterity works have considered each of these three challenges in iso...

Hung-Chieh Fang, Amber Xie, Jennifer Grannen, Kenneth Llontop, Dorsa Sadigh

2603.22263 • 2026-03-23

AI LLM

Greater accessibility can amplify discrimination in generative AI

Hundreds of millions of people rely on large language models (LLMs) for education, work, and even healthcare. Yet these models are known to reproduce and amplify social biases present in their trai...

Carolin Holtermann, Minh Duc Bui, Kaitlyn Zhou, Valentin Hofmann, Katharina von der Wense, Anne L...

2603.22260 • 2026-03-23
