Research

Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
TESTING

When Drafts Evolve: Speculative Decoding Meets Online Learning

Speculative decoding has emerged as a widely adopted paradigm for accelerating large language model inference, where a lightweight draft model rapidly generates candidate tokens that are then verif...

Yu-Yang Qian, Hao-Cong Wu, Yichao Fu, Hao Zhang, Peng Zhao

2603.12617 2026-03-13
AI LLM

Literary Narrative as Moral Probe: A Cross-System Framework for Evaluating AI Ethical Reasoning and Refusal Behavior

Existing AI moral evaluation frameworks test for the production of correct-sounding ethical responses rather than the presence of genuine moral reasoning capacity. This paper introduces a novel pro...

David C. Flynn

2603.12615 2026-03-13
AI LLM

ChainFuzzer: Greybox Fuzzing for Workflow-Level Multi-Tool Vulnerabilities in LLM Agents

Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface, because data produced by one tool can be persisted an...

Jiangrong Wu, Zitong Yao, Yuhong Nan, Zibin Zheng

2603.12614 2026-03-13
AI LLM

InterDeepResearch: Enabling Human-Agent Collaborative Information Seeking through Interactive Deep Research

Deep research systems powered by LLM agents have transformed complex information seeking by automating the iterative retrieval, filtering, and synthesis of insights from massive-scale web sources. ...

Bo Pan, Lunke Pan, Yitao Zhou, Qi Jiang, Zhen Wen, Minfeng Zhu, Wei Chen

2603.12608 2026-03-13
AI LLM

A2Z-10M+: Geometric Deep Learning with A-to-Z BRep Annotations for AI-Assisted CAD Modeling and Reverse Engineering

Reverse engineering and rapid prototyping of computer-aided design (CAD) models from 3D scans, sketches, or simple text prompts are vital in industrial product design. However, recent advances in g...

Pritham Kumar Jena, Bhavika Baburaj, Tushar Anand, Vedant Dutta, Vineeth Ulavala, Sk Aziz Ali

2603.12605 2026-03-13
AI LLM

How GenAI Mentor Configurations Shape Early Collaborative Dynamics: A Classroom Comparison of Individual and Shared Agents

Generative artificial intelligence (GenAI) is increasingly embedded in computer-supported collaborative learning (CSCL), yet little empirical research has unpacked how different configurations of A...

Siyu Zha, Weijing Liu, Fei Qin, Jie Cao, Yanjin Wang, Yujia Liu, Kaiyi Zhang, Jiangtao Gong, Ying...

2603.12600 2026-03-13
AI LLM

Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs

Visual design is an essential application of state-of-the-art multi-modal AI systems. Improving these systems requires high-quality vision-language data at scale. Despite the abundance of internet ...

Zixin Wen, Yifu Cai, Kyle Lee, Sam Estep, Josh Sunshine, Aarti Singh, Yuejie Chi, Wode Ni

2603.12597 2026-03-13
AI LLM

Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a widely used approach to align large-scale AI systems with human values. However, RLHF typically assumes a single, universal reward, which over...

Gihoon Kim, Euntai Kim

2603.12595 2026-03-13
TESTING

Early Pruning for Public Transport Routing

Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during the transfer relaxation phase, especially on dense transfer ...

Andrii Rohovyi, Abdallah Abuaisha, Toby Walsh

2603.12592 2026-03-13
AI LLM

Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation

Parameter-Efficient Fine-Tuning (PEFT) has become a dominant paradigm for deploying LLMs in multi-task scenarios due to its extreme parameter efficiency. While Mixture-of-Experts (MoE) based LoRA v...

Jia-Chen Zhang, Zhen-Wei Yan, Yu-Jie Xiong, Chun-Ming Xia

2603.12577 2026-03-13
AI LLM

From Woofs to Words: Towards Intelligent Robotic Guide Dogs with Verbal Communication

Assistive robotics is an important subarea of robotics that focuses on the well-being of people with disabilities. A robotic guide dog is an assistive quadruped robot that helps visually impaired p...

Yohei Hayamizu, David DeFazio, Hrudayangam Mehta, Zainab Altaweel, Jacqueline Choe, Chao Lin, Jak...

2603.12574 2026-03-13
TESTING

Pointwise mutual information bounded by stochastic Fisher information

We derive general upper bounds to pointwise mutual information in terms of stochastic Fisher information and show these bounds average to known results in the literature for bounds to mutual inform...

Pedro B. Melo

2603.12573 2026-03-13
AI LLM

LMEB: Long-horizon Memory Embedding Benchmark

Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage...

Xinping Zhao, Xinshuo Hu, Jiaxin Xu, Danyu Tang, Xin Zhang, Mengjia Zhou, Yan Zhong, Yao Zhou, Zi...

2603.12572 2026-03-13
TESTING

Hot Jupiter - Cold Jupiter: A complex sibling relation

A handful of planetary systems hosting a Hot Jupiter have been subsequently found to also host long-period giant planets. These "cold Jupiters," giant planets residing beyond the snow line ($\sim...

Adriana Errico, Robert A. Wittenmyer, Jonathan Horner, Brad Carter, Valeria López

2603.12568 2026-03-13
AI LLM

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

SpeechLLMs typically combine ASR-trained encoders with text-based LLM backbones, leading them to inherit written-style output patterns unsuitable for text-to-speech synthesis. This mismatch is part...

Mengjie Zhao, Lianbo Liu, Yusuke Fujita, Hao Shi, Yuan Gao, Roman Koshkin, Yui Sudo

2603.12565 2026-03-13
AI LLM

AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents

Tool-augmented LLM agents increasingly serve as multi-turn advisors in high-stakes domains, yet their evaluation relies on ranking-quality metrics that measure what is recommended but not whether i...

Zekun Wu, Adriano Koshiyama, Sahan Bulathwela, Maria Perez-Ortiz

2603.12564 2026-03-13
TESTING

Consistent and powerful CUSUM change-point test for panel data with changes in variance

This paper investigates change-points in variance in panel data models with $α$-mixing time series. Based on the cumulative sum (CUSUM) method and the individual differences, we construct a CUSUM...

Wenzhi Yang, Yueting Xu, Xiaoping Shi, Qiong Li

2603.12561 2026-03-13
AI LLM

Large Language Models as Delivery Rider: Generating Instant Food Delivery Riders' Routing Decision with LLM Agent Framework

The utilization of Large Language Models (LLMs) to power human-like agents has shown remarkable potential in simulating individual mobility patterns. However, a significant gap remains in modeling c...

Chengbo Zhang, Zuopeng Xiao

2603.12559 2026-03-13
AI LLM

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages

Reinforcement learning (RL) has been effective for post-training autoregressive (AR) language models, but extending these methods to diffusion language models (DLMs) is challenging due to intractab...

Vishnu Teja Kunde, Fatemeh Doudi, Mahdi Farahbakhsh, Dileep Kalathil, Krishna Narayanan, Jean-Fra...

2603.12554 2026-03-13
AI LLM

Embedded Quantum Machine Learning in Embedded Systems: Feasibility, Hybrid Architectures, and Quantum Co-Processors

Embedded quantum machine learning (EQML) seeks to bring quantum machine learning (QML) capabilities to resource-constrained edge platforms such as IoT nodes, wearables, drones, and cyber-physical c...

Somdip Dey, Syed Muhammad Raza

2603.12540 2026-03-13