Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Agentic Harness for Real-World Compilers

Compilers are critical to modern computing, yet fixing compiler bugs is difficult. While recent large language model (LLM) advancements enable automated bug repair, compiler bugs pose unique challe...

Yingwei Zheng, Cong Li, Shaohua Li, Yuqun Zhang, Zhendong Su

2603.20075 2026-03-20
AI LLM

The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries

When a traveler asks an AI search engine to recommend a hotel, which sources get cited -- and does query framing matter? We audit 1,357 grounding citations from Google Gemini across 156 hotel queri...

Peiying Zhu, Sidi Chang

2603.20062 2026-03-20
AI LLM

From School AI Readiness to Student AI Literacy: A National Multilevel Mediation Analysis of Institutional Capacity and Teacher Capability

Artificial intelligence (AI) is increasingly embedded in vocational education systems, yet empirical evidence linking institutional AI readiness to student learning outcomes remains limited. This s...

Xiu Guan, Mingmin Zheng, Dragan Gašević, Wenxin Guo, Yingqun Liu, Xibin Han, Danijela Gasevic, Ru...

2603.20056 2026-03-20
AI LLM

Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs

Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing general reasoning capabilities of Large Language Models (LLMs), yet still suffers from inef...

Wenjian Zhang, Kongcheng Zhang, Jiaxin Qi, Baisheng Lai, Jianqiang Huang

2603.20046 2026-03-20
AI LLM

LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families

Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech recognition (ASR) under high-resource conditions...

Jianan Chen, Xiaoxue Gao, Tatsuya Kawahara, Nancy F. Chen

2603.20042 2026-03-20
AI LLM

Orchestrating Human-AI Software Delivery: A Retrospective Longitudinal Field Study of Three Software Modernization Programs

Evidence on AI in software engineering still leans heavily toward individual task completion, while evidence on team-level delivery remains scarce. We report a retrospective longitudinal field stud...

Maximiliano Armesto, Christophe Kolb

2603.20028 2026-03-20
AI LLM

Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR

Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromised or misaligned. We identify an overlooked optimization ...

Ziye Yuan, Ruchang Yao, Chengxin Zheng, Yusheng Zhao, Daxiang Dong, Ming Zhang

2603.20020 2026-03-20
AI LLM

RouterKGQA: Specialized–General Model Routing for Constraint-Aware Knowledge Graph Question Answering

Knowledge graph question answering (KGQA) is a promising approach for mitigating LLM hallucination by grounding reasoning in structured and verifiable knowledge graphs. Existing approaches fall int...

Bo Yuan, Hexuan Deng, Xuebo Liu, Min Zhang

2603.20017 2026-03-20
AI LLM

AgenticRS-EnsNAS: Ensemble-Decoupled Self-Evolving Architecture Search

Neural Architecture Search (NAS) deployment in industrial production systems faces a fundamental validation bottleneck: verifying a single candidate architecture $\pi$ requires evaluating the deployed...

Yun Chen, Moyu Zhang, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

2603.20014 2026-03-20
AI LLM

ReViSQL: Achieving Human-Level Text-to-SQL

Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have focused on enhancing SQL reasoning by develo...

Yuxuan Zhu, Tengjun Jin, Yoojin Choi, Daniel Kang

2603.20004 2026-03-20
AI LLM

An Agentic Approach to Generating XAI-Narratives

Explainable AI (XAI) research has experienced substantial growth in recent years. Existing XAI methods, however, have been criticized for being technical and expert-oriented, motivating the develop...

Yifan He, David Martens

2603.20003 2026-03-20
AI LLM

When Contextual Inference Fails: Cancelability in Interactive Instruction Following

We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve underspecified instructions using contextual in...

Natalia Bila, Kata Naszádi, Alexandra Mayn, Christof Monz

2603.19997 2026-03-20
AI LLM

Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests it faces a persistent "capability ceiling": unli...

Yurun Yuan, Tengyang Xie

2603.19987 2026-03-20
AI LLM

Promoting Critical Thinking With Domain-Specific Generative AI Provocations

The evidence on the effects of generative AI (GenAI) on critical thinking is mixed, with studies suggesting both potential harms and benefits depending on its implementation. Some argue that AI-dri...

Thomas Şerban von Davier, Hao-Ping Lee, Jodi Forlizzi, Sauvik Das

2603.19975 2026-03-20
AI LLM

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment manag...

Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haoj...

2603.19974 2026-03-20
AI LLM

On the Ability of Transformers to Verify Plans

Transformers have shown inconsistent success in AI planning tasks, and theoretical understanding of when generalization should be expected has been limited. We take important steps towards addressi...

Yash Sarrof, Yupei Du, Katharina Stein, Alexander Koller, Sylvie Thiébaux, Michael Hahn

2603.19954 2026-03-20
AI LLM

TAPAS: Efficient Two-Server Asymmetric Private Aggregation Beyond Prio(+)

Privacy-preserving aggregation is a cornerstone for AI systems that learn from distributed data without exposing individual records, especially in federated learning and telemetry. Existing two-ser...

Harish Karthikeyan, Antigoni Polychroniadou

2603.19949 2026-03-20
AI LLM

Large Language Models and Stock Investing: Is the Human Factor Required?

This paper investigates whether large language models (LLMs) can generate reliable stock market predictions. We evaluate four state-of-the-art models - ChatGPT, Gemini, DeepSeek, and Perplexity - a...

Ricardo Crisostomo, Diana Mykhalyuk

2603.19944 2026-03-20
AI LLM

Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents

As large language models (LLMs) evolve into autonomous agents, persistent memory at the API layer is essential for enabling context-aware behavior across LLMs and multi-session interactions. Existi...

Luiz C. Borro, Luiz A. B. Macarini, Gordon Tindall, Michael Montero, Adam B. Struck

2603.19935 2026-03-20
AI LLM

SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia

The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large language models (LLMs) offer ...

Zhixiang Lu, Chong Zhang, Yulong Li, Angelos Stefanidis, Anh Nguyen, Imran Razzak, Jionglong Su, ...

2603.19931 2026-03-20