Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
TESTING

The Rise of Null Hypothesis Significance Testing (NHST): Institutional Massification and the Emergence of a Procedural Epistemology

It has long been a puzzle why, despite sustained reform efforts, many applied scientific fields remain dominated by Null Hypothesis Significance Testing (NHST), a framework that dichotomizes study ...

Carol Ting

2603.14757 2026-03-16
TESTING

Towards Privacy-Preserving Machine Translation at the Inference Stage: A New Task and Benchmark

Current online translation services require sending user text to cloud servers, posing a risk of privacy leakage when the text contains sensitive information. This risk hinders the application of o...

Wei Shao, Lemao Liu, Yinqiao Li, Guoping Huang, Shuming Shi, Linqi Song

2603.14756 2026-03-16
TESTING

A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding

Multimodal Large Language Models have achieved strong performance in single-video understanding, yet their ability to reason across multiple videos remains limited. Existing approaches typically co...

Yue Zhang, Liqiang Jing, Jia Li, Yapeng Tian, Xinya Du, Yunhui Guo, Vibhav Gogate

2603.14733 2026-03-16
TESTING

GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation

Game UI design requires consistent visual assets across rarity tiers yet remains a predominantly manual process. We present GameUIAgent, an LLM-powered agentic framework that translates natural lan...

Wei Zeng, Fengwei An, Zhen Liu, Jian Zhao

2603.14724 2026-03-16
TESTING

Multimodal Deep Learning for Early Prediction of Patient Deterioration in the ICU: Integrating Time-Series EHR Data with Clinical Notes

Early identification of patients at risk for clinical deterioration in the intensive care unit (ICU) remains a critical challenge. Delayed recognition of impending adverse events, including mortali...

Binesh Sadanandan

2603.14719 2026-03-16
TESTING

Quantum-Kinetic Dark Energy (QKDE): An effective dark energy framework with a covariantly completed time-dependent scalar kinetic normalization

A minimal effective dark-energy framework, Quantum-Kinetic Dark Energy (QKDE), is developed in which the scalar kinetic normalization carries a slow background time dependence through a covariant...

Daniel Brown

2603.14716 2026-03-16
TESTING

Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents

Computer-using agents (CUAs) act directly on graphical user interfaces, yet their perception of the screen is often unreliable. Existing work largely treats these failures as performance limitation...

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

2603.14707 2026-03-16
TESTING

AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of pr...

Salim Khazem

2603.14706 2026-03-16
TESTING

Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization

Large language models and AI agents have recently shown promise in automating software performance optimization, but existing approaches predominantly rely on local, syntax-driven code transformati...

Huiyun Peng, Parth Vinod Patil, Antonio Zhong Qiu, George K. Thiruvathukal, James C. Davis

2603.14703 2026-03-16
AI/LLM

Representation Learning for Spatiotemporal Physical Systems

Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. Ho...

Helen Qu, Rudy Morel, Michael McCabe, Alberto Bietti, François Lanusse, Shirley Ho, Yann LeCun

2603.13227 2026-03-13
TESTING

Visual-ERM: Reward Modeling for Visual Equivalence

Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent ...

Ziyu Liu, Shengyuan Ding, Xinyu Fang, Xuanlang Dai, Penghui Yang, Jianze Liang, Jiaqi Wang, Kai C...

2603.13224 2026-03-13
AI/LLM

A Generative Model of Conspicuous Consumption and Status Signaling

Status signaling drives human behavior and the allocation of scarce resources such as mating opportunities, yet the generative mechanisms governing how specific goods, signals, or behaviors acquire...

Logan Cross, Jordi Grau-Moya, William A. Cunningham, Alexander Sasha Vezhnevets, Joel Z. Leibo

2603.13220 2026-03-13
TESTING

Bounds on Agreement between Subjective and Objective Measurements

Objective estimators of multimedia quality are often judged by comparing estimates with subjective "truth data," most often via Pearson correlation coefficient (PCC) or mean-squared error (MSE). Bu...

Jaden Pieper, Stephen D. Voran

2603.13204 2026-03-13
AI/LLM

Neuron-Aware Data Selection In Instruction Tuning For Large Language Models

Instruction Tuning (IT) has been proven to be an effective approach to unlock the powerful capabilities of large language models (LLMs). Recent studies indicate that excessive IT data can degrade L...

Xin Chen, Junchao Wu, Shu Yang, Runzhe Zhan, Zeyu Wu, Min Yang, Shujian Huang, Lidia S. Chao, Der...

2603.13201 2026-03-13
AI/LLM

Navig-AI-tion: Navigation by Contextual AI and Spatial Audio

Audio-only walking navigation can leave users disoriented, relying on vague cardinal directions and lacking real-time environmental context, leading to frequent errors. To address this, we present ...

Mathias N. Lystbæk, Haley Adams, Ranjith Kagathi Ananda, Eric J Gonzalez, Luca Ballan, Qiuxuan Wu...

2603.13200 2026-03-13
AI/LLM

From Experiments to Expertise: Scientific Knowledge Consolidation for AI-Driven Computational Research

While large language models (LLMs) have transformed AI agents into proficient executors of computational materials science, performing a hundred simulations does not make a researcher. What disting...

Haonan Huang

2603.13191 2026-03-13
TESTING

Lattice Discrete Particle Model (LDPM): Comparison of Various Time Integration Solvers and Implementations

This article presents a comparison of various implementations of the Lattice Discrete Particle Model (LDPM) for the numerical simulation of concrete and other heterogeneous quasibrittle materials. ...

Erol Lale, Jan Eliáš, Ke Yu, Matthew Troemner, Monika Středulová, Julien Khoury, Tianju Xue, Ioan...

2603.13190 2026-03-13
AI/LLM

LLM Constitutional Multi-Agent Governance

Large Language Models (LLMs) can generate persuasive influence strategies that shift cooperative behavior in multi-agent populations, but a critical question remains: does the resulting cooperation...

J. de Curtò, I. de Zarzà

2603.13189 2026-03-13
TESTING

Verification of Robust Properties for Access Control Policies

Existing methods for verifying access control policies require the policy to be complete and fully determined before verification can proceed, but in practice policies are developed iteratively, co...

Alexander V. Gheorghiu

2603.13181 2026-03-13
AI/LLM

Semantic Invariance in Agentic AI

Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordination systems. However, deploying LLM agents i...

I. de Zarzà, J. de Curtò, Jordi Cabot, Pietro Manzoni, Carlos T. Calafate

2603.13173 2026-03-13