Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

AgenticRS-EnsNAS: Ensemble-Decoupled Self-Evolving Architecture Search

Neural Architecture Search (NAS) deployment in industrial production systems faces a fundamental validation bottleneck: verifying a single candidate architecture pi requires evaluating the deployed...

Yun Chen, Moyu Zhang, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

2603.20014 2026-03-20
AI LLM

ReViSQL: Achieving Human-Level Text-to-SQL

Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have focused on enhancing SQL reasoning by develo...

Yuxuan Zhu, Tengjun Jin, Yoojin Choi, Daniel Kang

2603.20004 2026-03-20
AI LLM

An Agentic Approach to Generating XAI-Narratives

Explainable AI (XAI) research has experienced substantial growth in recent years. Existing XAI methods, however, have been criticized for being technical and expert-oriented, motivating the develop...

Yifan He, David Martens

2603.20003 2026-03-20
TESTING

Sound State Encodings in Translational Separation Logic Verifiers (Extended Version)

Automated program verifiers are often organized into a front-end, which encodes an input program into an intermediate verification language (IVL), and a back-end, which proves that the IVL program ...

Hongyi Ling, Thibault Dardinier, Ellen Arlt, Peter Müller

2603.20001 2026-03-20
AI LLM

When Contextual Inference Fails: Cancelability in Interactive Instruction Following

We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve underspecified instructions using contextual in...

Natalia Bila, Kata Naszádi, Alexandra Mayn, Christof Monz

2603.19997 2026-03-20
TESTING

Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts

Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models during inference wi...

John Turnbull, Shivam Grover, Amin Jalali, Ali Etemad

2603.19994 2026-03-20
AI LLM

Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests it faces a persistent "capability ceiling": unli...

Yurun Yuan, Tengyang Xie

2603.19987 2026-03-20
TESTING

Stone-in-Waiting: A Cloud-Based Accelerator for the Quantum Approximate Optimization Algorithm

The Quantum Approximate Optimization Algorithm (QAOA) and its advanced variant, the Quantum Alternating Operator Ansatz (QAOA), are major research topics in the current era of Noisy Intermediate-Sc...

Shuai Zeng

2603.19980 2026-03-20
TESTING

X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

Scalable and reliable evaluation is increasingly critical in the end-to-end era of autonomous driving, where vision-language-action (VLA) policies directly map raw sensor streams to driving actio...

Chaoda Zheng, Sean Li, Jinhao Deng, Zhennan Wang, Shijia Chen, Liqiang Xiao, Ziheng Chi, Hongbin ...

2603.19979 2026-03-20
AI LLM

Promoting Critical Thinking With Domain-Specific Generative AI Provocations

The evidence on the effects of generative AI (GenAI) on critical thinking is mixed, with studies suggesting both potential harms and benefits depending on its implementation. Some argue that AI-dri...

Thomas Şerban von Davier, Hao-Ping Lee, Jodi Forlizzi, Sauvik Das

2603.19975 2026-03-20
AI LLM

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment manag...

Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haoj...

2603.19974 2026-03-20
TESTING

Model-Driven Learning-Based Physical Layer Authentication for Mobile Wi-Fi Devices

The rise of wireless technologies has made the Internet of Things (IoT) ubiquitous, but the broadcast nature of wireless communications exposes IoT to authentication risks. Physical layer authentic...

Yijia Guo, Junqing Zhang, Yao-Win Peter Hong, Stefano Tomasin

2603.19972 2026-03-20
TESTING

Interpreting Reinforcement Learning Model Behavior via Koopman with Control

Reinforcement learning (RL) models have shown the capability of learning complex behaviors, but quantitatively assessing those behaviors - which is critical for safety assurance and the discovery o...

William T. Redman

2603.19968 2026-03-20
TESTING

HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing patholo...

Ruicheng Yuan, Zhenxuan Zhang, Anbang Wang, Liwei Hu, Xiangqian Hua, Yaya Peng, Jiawei Luo, Guang...

2603.19957 2026-03-20
AI LLM

On the Ability of Transformers to Verify Plans

Transformers have shown inconsistent success in AI planning tasks, and theoretical understanding of when generalization should be expected has been limited. We take important steps towards addressi...

Yash Sarrof, Yupei Du, Katharina Stein, Alexander Koller, Sylvie Thiébaux, Michael Hahn

2603.19954 2026-03-20
TESTING

On the Capacity of Future Lane-Free Urban Infrastructure

In this paper, the potential capacity and spatial efficiency of future autonomous lane-free traffic in urban environments are explored using a combination of analytical and simulation-based approac...

Patrick Malcolm, Klaus Bogenberger

2603.19952 2026-03-20
AI LLM

TAPAS: Efficient Two-Server Asymmetric Private Aggregation Beyond Prio(+)

Privacy-preserving aggregation is a cornerstone for AI systems that learn from distributed data without exposing individual records, especially in federated learning and telemetry. Existing two-ser...

Harish Karthikeyan, Antigoni Polychroniadou

2603.19949 2026-03-20
AI LLM

Large Language Models and Stock Investing: Is the Human Factor Required?

This paper investigates whether large language models (LLMs) can generate reliable stock market predictions. We evaluate four state-of-the-art models - ChatGPT, Gemini, DeepSeek, and Perplexity - a...

Ricardo Crisostomo, Diana Mykhalyuk

2603.19944 2026-03-20
TESTING

Hybrid topic modelling for computational close reading: Mapping narrative themes in Pushkin's Evgenij Onegin

This study presents a hybrid topic modelling framework for computational literary analysis that integrates Latent Dirichlet Allocation (LDA) with sparse Partial Least Squares Discriminant Analysis ...

Angelo Maria Sabatini

2603.19940 2026-03-20
AI LLM

Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents

As large language models (LLMs) evolve into autonomous agents, persistent memory at the API layer is essential for enabling context-aware behavior across LLMs and multi-session interactions. Existi...

Luiz C. Borro, Luiz A. B. Macarini, Gordon Tindall, Michael Montero, Adam B. Struck

2603.19935 2026-03-20