Papers
Research papers from arXiv and related sources
Agentic Harness for Real-World Compilers
Compilers are critical to modern computing, yet fixing compiler bugs is difficult. While recent large language model (LLM) advancements enable automated bug repair, compiler bugs pose unique challe...
Yingwei Zheng, Cong Li, Shaohua Li, Yuqun Zhang, Zhendong Su
The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries
When a traveler asks an AI search engine to recommend a hotel, which sources get cited – and does query framing matter? We audit 1,357 grounding citations from Google Gemini across 156 hotel queri...
Peiying Zhu, Sidi Chang
From School AI Readiness to Student AI Literacy: A National Multilevel Mediation Analysis of Institutional Capacity and Teacher Capability
Artificial intelligence (AI) is increasingly embedded in vocational education systems, yet empirical evidence linking institutional AI readiness to student learning outcomes remains limited. This s...
Xiu Guan, Mingmin Zheng, Dragan Gašević, Wenxin Guo, Yingqun Liu, Xibin Han, Danijela Gasevic, Ru...
Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs
Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing general reasoning capabilities of Large Language Models (LLMs), yet still suffers from inef...
Wenjian Zhang, Kongcheng Zhang, Jiaxin Qi, Baisheng Lai, Jianqiang Huang
LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families
Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech recognition (ASR) under high-resource conditions...
Jianan Chen, Xiaoxue Gao, Tatsuya Kawahara, Nancy F. Chen
Orchestrating Human-AI Software Delivery: A Retrospective Longitudinal Field Study of Three Software Modernization Programs
Evidence on AI in software engineering still leans heavily toward individual task completion, while evidence on team-level delivery remains scarce. We report a retrospective longitudinal field stud...
Maximiliano Armesto, Christophe Kolb
Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR
Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromised or misaligned. We identify an overlooked optimization ...
Ziye Yuan, Ruchang Yao, Chengxin Zheng, Yusheng Zhao, Daxiang Dong, Ming Zhang
RouterKGQA: Specialized–General Model Routing for Constraint-Aware Knowledge Graph Question Answering
Knowledge graph question answering (KGQA) is a promising approach for mitigating LLM hallucination by grounding reasoning in structured and verifiable knowledge graphs. Existing approaches fall int...
Bo Yuan, Hexuan Deng, Xuebo Liu, Min Zhang
AgenticRS-EnsNAS: Ensemble-Decoupled Self-Evolving Architecture Search
Neural Architecture Search (NAS) deployment in industrial production systems faces a fundamental validation bottleneck: verifying a single candidate architecture π requires evaluating the deployed...
Yun Chen, Moyu Zhang, Jinxin Hu, Yu Zhang, Xiaoyi Zeng
ReViSQL: Achieving Human-Level Text-to-SQL
Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have focused on enhancing SQL reasoning by develo...
Yuxuan Zhu, Tengjun Jin, Yoojin Choi, Daniel Kang
An Agentic Approach to Generating XAI-Narratives
Explainable AI (XAI) research has experienced substantial growth in recent years. Existing XAI methods, however, have been criticized for being technical and expert-oriented, motivating the develop...
Yifan He, David Martens
When Contextual Inference Fails: Cancelability in Interactive Instruction Following
We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve underspecified instructions using contextual in...
Natalia Bila, Kata Naszádi, Alexandra Mayn, Christof Monz
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States
Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests it faces a persistent "capability ceiling": unli...
Yurun Yuan, Tengyang Xie
Promoting Critical Thinking With Domain-Specific Generative AI Provocations
The evidence on the effects of generative AI (GenAI) on critical thinking is mixed, with studies suggesting both potential harms and benefits depending on its implementation. Some argue that AI-dri...
Thomas Şerban von Davier, Hao-Ping Lee, Jodi Forlizzi, Sauvik Das
Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance
Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment manag...
Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haoj...
On the Ability of Transformers to Verify Plans
Transformers have shown inconsistent success in AI planning tasks, and theoretical understanding of when generalization should be expected has been limited. We take important steps towards addressi...
Yash Sarrof, Yupei Du, Katharina Stein, Alexander Koller, Sylvie Thiébaux, Michael Hahn
TAPAS: Efficient Two-Server Asymmetric Private Aggregation Beyond Prio(+)
Privacy-preserving aggregation is a cornerstone for AI systems that learn from distributed data without exposing individual records, especially in federated learning and telemetry. Existing two-ser...
Harish Karthikeyan, Antigoni Polychroniadou
Large Language Models and Stock Investing: Is the Human Factor Required?
This paper investigates whether large language models (LLMs) can generate reliable stock market predictions. We evaluate four state-of-the-art models - ChatGPT, Gemini, DeepSeek, and Perplexity - a...
Ricardo Crisostomo, Diana Mykhalyuk
Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents
As large language models (LLMs) evolve into autonomous agents, persistent memory at the API layer is essential for enabling context-aware behavior across LLMs and multi-session interactions. Existi...
Luiz C. Borro, Luiz A. B. Macarini, Gordon Tindall, Michael Montero, Adam B. Struck
SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia
The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large language models (LLMs) offer ...
Zhixiang Lu, Chong Zhang, Yulong Li, Angelos Stefanidis, Anh Nguyen, Imran Razzak, Jionglong Su, ...