Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
TESTING

Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation

Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agno...

Asmita Bhardwaj, Yuya Jeremy Ong, Eelaaf Zahid, Basel Shbita

2603.18428 2026-03-19
TESTING

The Impact of Corporate AI Washing on Farmers' Digital Financial Behavior Response -- An Analysis from the Perspective of Digital Financial Exclusion

In the context of the rapid development of digital finance, some financial technology companies exhibit the phenomenon of "AI washing," where they overstate their AI capabilities while underinvesti...

Li Wenxiu, Wen Zhanjie, Xia Jiechang, Guo Jingqiao

2603.18421 2026-03-19
TESTING

The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation

At a time when the phenomenon of 'AI washing' is quietly spreading, an increasing number of enterprises are using the label of artificial intelligence merely as a cosmetic embellishment in their an...

Li Wenxiu, Wen Zhanjie, Xia Jiechang, Guo Jingqiao

2603.18415 2026-03-19
TESTING

Statistical Testing Framework for Clustering Pipelines by Selective Inference

A data analysis pipeline is a structured sequence of steps that transforms raw data into meaningful insights by integrating multiple analysis algorithms. In many practical applications, analytical f...

Yugo Miyata, Tomohiro Shiraishi, Shunichi Nishino, Ichiro Takeuchi

2603.18413 2026-03-19
TESTING

TARo: Token-level Adaptive Routing for LLM Test-time Alignment

Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment methods offer a lightweight al...

Arushi Rai, Qiang Zhang, Hanqing Zeng, Yunkai Zhang, Dipesh Tamboli, Xiangjun Fan, Zhuokai Zhao

2603.18411 2026-03-19
TESTING

Where are the Hidden Gems? Applying Transformer Models for Design Discussion Detection

Design decisions are at the core of software engineering and appear in Q&A forums, mailing lists, pull requests, issue trackers, and commit messages. Design discussions spanning a project's histor...

Lawrence Arkoh, Daniel Feitosa, Wesley K. G. Assunção

2603.18393 2026-03-19
TESTING

Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization

Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompt...

Shiyan Liu, Qifeng Xia, Qiyun Xia, Yisheng Liu, Xinyu Yu, Rui Qu

2603.18388 2026-03-19
TESTING

A Non-parametric Method for the Inference of Halo Occupation Distributions

The galaxy-halo connection traces processes by which galaxies form and evolve. The halo occupation distribution (HOD) describes the relationship between galaxies and their host dark matter haloes. ...

Jacob Kennedy, Eric Gawiser, Kartheik G. Iyer, L. Y. Aaron Yung

2603.18379 2026-03-19
TESTING

Lensing in the Blue III: Weak Lensing Shape Catalogs of 30 Merging Galaxy Clusters

We present the weak gravitational lensing dataset from the Super-pressure Balloon-Borne Imaging Telescope (SuperBIT), which imaged 30 galaxy clusters during its 45-night flight in April to May 2023...

Sayan Saha, Jacqueline E. McCleary, Spencer W. Everett, Maya Amit, Georgios N. Vassilakis, Emaad ...

2603.18376 2026-03-19
TESTING

TENSURE: Fuzzing Sparse Tensor Compilers (Registered Report)

Sparse Tensor Compilers (STCs) have emerged as critical infrastructure for optimizing high-dimensional data analytics and machine learning workloads. The STCs must synthesize complex, irregular con...

Kabilan Mahathevan, Yining Zhang, Muhammad Ali Gulzar, Kirshanthan Sundararajah

2603.18372 2026-03-19
TESTING

Contact Status Recognition and Slip Detection with a Bio-inspired Tactile Hand

Stable and reliable grasp is critical to robotic manipulations especially for fragile and glazed objects, where the grasp force requires precise control as too large force possibly damages the obje...

Chengxiao He, Wenhui Yang, Hongliang Zhao, Jiacheng Lv, Yuzhe Shao, Longhui Qin

2603.18370 2026-03-19
TESTING

Multi-material Direct Ink Writing and Embroidery for Stretchable Wearable Sensors

The development of wearable sensing systems for sports performance tracking, rehabilitation, and injury prevention has driven growing demand for smart garments that combine comfort, durability, and...

Lukas Cha, Ryman Hashem, Ria Prakash, Tanguy Declety, Wenze Zhang, Liang He

2603.18354 2026-03-18
TESTING

Interpretability without actionability: mechanistic methods cannot correct language model errors despite near-perfect internal representations

Language models encode task-relevant knowledge in internal representations that far exceeds their output performance, but whether mechanistic interpretability methods can bridge this knowledge-acti...

Sanjay Basu, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra,...

2603.18353 2026-03-18
TESTING

PeriphAR: Fast and Accurate Real-World Object Selection with Peripheral Augmented Reality Displays

Gaze-based selection in XR requires visual confirmation due to eye-tracking limitations and target ambiguity in 3D contexts. Current designs for wide-FOV displays use world-locked, central overlays...

Yutong Ren, Arnav Reddy, Michael Nebeling

2603.18350 2026-03-18
TESTING

Synthetic Data, Information, and Prior Knowledge: Why Synthetic Data Augmentation to Boost Sample Doesn't Work for Statistical Inference

The use of synthetic data to deidentify data and to improve predictive models is well-attested to. The augmentation of datasets using synthetically generated data is an alluring proposition: in the...

Reid Dale, Jordan Rodu, Mike Baiocchi

2603.18345 2026-03-18
TESTING

VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection

Capsule endoscopy event detection is challenging because diagnostically relevant findings are sparse, visually heterogeneous, and embedded in long, noisy video streams, while evaluation is performe...

Bo-Cheng Qiu, Yu-Fan Lin, Yu-Zhe Pien, Chia-Ming Lee, Fu-En Yang, Yu-Chiang Frank Wang, Chih-Chun...

2603.18343 2026-03-18
TESTING

Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought

As Large Language Models (LLMs) increasingly assist secure software development, their ability to meet the rigorous demands of Rust program verification remains unclear. Existing evaluations treat ...

Zichen Xie, Wenxi Wang

2603.18334 2026-03-18
TESTING

FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering

Inference-time steering is widely regarded as a lightweight and parameter-free mechanism for controlling large language model (LLM) behavior, and prior work has often suggested that simple activati...

Zikang Ding, Qiying Hu, Yi Zhang, Hongji Li, Junchi Yao, Hongbo Liu, Lijie Hu

2603.18329 2026-03-18
TESTING

Sub-Yield Dynamics in Yield-Stress Materials

The mechanical response of yield-stress materials below the yield point remains a subject of debate. Two of the most widely used constitutive models for these materials offer fundamentally conflict...

Alice Woodbridge, Kasra Amini, Fredrik Lundell, Outi Tammisola, Anne Juel, Robert J. Poole, Cláud...

2603.18302 2026-03-18
AI LLM

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Token pruning is essential for enhancing the computational efficiency of vision-language models (VLMs), particularly for video-based tasks where temporal redundancy is prevalent. Prior approaches t...

Jianrui Zhang, Yue Yang, Rohun Tripathi, Winson Han, Ranjay Krishna, Christopher Clark, Yong Jae ...

2603.18004 2026-03-18