Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Surgical Post-Training: Cutting Errors, Keeping Knowledge

Enhancing the reasoning capabilities of Large Language Models (LLMs) via post-training is often constrained by the trade-off between efficiency and catastrophic forgetting. While prior research emp...

Wenye Lin, Kai Han

2603.01683 2026-03-02
AI LLM

HeRo: Adaptive Orchestration of Agentic RAG on Heterogeneous Mobile SoC

With the increasing computational capability of mobile devices, deploying agentic retrieval-augmented generation (RAG) locally on heterogeneous System-on-Chips (SoCs) has become a promising way to ...

Maoliang Li, Jiayu Chen, Zihao Zheng, Ziqian Li, Xinhao Sun, Guojie Luo, Chenchen Liu, Xiang Chen

2603.01661 2026-03-02
AI LLM

CeProAgents: A Hierarchical Agents System for Automated Chemical Process Development

The development of chemical processes, a cornerstone of chemical engineering, presents formidable challenges due to its multi-faceted nature, integrating specialized knowledge, conceptual design, a...

Yuhang Yang, Ruikang Li, Jifei Ma, Kai Zhang, Qi Liu, Jianyu Han, Yonggan Bu, Jibin Zhou, Defu Li...

2603.01654 2026-03-02
AI LLM

LexChronos: An Agentic Framework for Structured Event Timeline Extraction in Indian Jurisprudence

Understanding and predicting judicial outcomes demands nuanced analysis of legal documents. Traditional approaches treat judgments and proceedings as unstructured text, limiting the effectiveness o...

Anka Chandrahas Tummepalli, Preethu Rose Anish

2603.01651 2026-03-02
AI LLM

PromptStereo: Zero-Shot Stereo Matching via Structure and Motion Prompts

Modern stereo matching methods have leveraged monocular depth foundation models to achieve superior zero-shot generalization performance. However, most existing methods primarily focus on extractin...

Xianqi Wang, Hao Yang, Hangtian Wang, Junda Cheng, Gangwei Xu, Min Lin, Xin Yang

2603.01650 2026-03-02
AI LLM

QCAgent: An agentic framework for quality-controllable pathology report generation from whole slide image

Recent methods for pathology report generation from whole-slide image (WSI) are capable of producing slide-level diagnostic descriptions but fail to ground fine-grained statements in localized visu...

Rundong Wang, Wei Ba, Ying Zhou, Yingtai Li, Bowen Liu, Baizhi Wang, Yuhao Wang, Zhidong Yang, Ku...

2603.01647 2026-03-02
AI LLM

Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to generate candidate tokens for a larger target model to verify. The efficacy of this technique h...

Jiebin Zhang, Zhenghan Yu, Liang Wang, Nan Yang, Eugene J. Yu, Zheng Li, Yifan Song, Dawei Zhu, X...

2603.01639 2026-03-02
AI LLM

Who Explains Privacy Policies to Me? Embodied and Textual LLM-Powered Privacy Assistants in Virtual Reality

Virtual Reality (VR) systems collect fine-grained behavioral and biometric data, yet privacy policies are rarely read or understood due to their complex language, length, and poor integration into ...

Vincent Freiberger, Moritz Dresch, Florian Alt, Arthur Fleig, Viktorija Paneva

2603.01638 2026-03-02
AI LLM

DeLo: Dual Decomposed Low-Rank Experts Collaboration for Continual Missing Modality Learning

Adapting Large Multimodal Models (LMMs) to real-world scenarios poses the dual challenges of learning from sequential data streams while handling frequent modality incompleteness, a task known as C...

Xiwei Liu, Yulong Li, Feilong Tang, Imran Razzak

2603.01632 2026-03-02
AI LLM

Assessing Crime Disclosure Patterns in a Large-Scale Cybercrime Forum

Cybercrime forums play a central role in the cybercrime ecosystem, serving as hubs for the exchange of illicit goods, services, and knowledge. Previous studies have explored the market and social s...

Raphael Hoheisel, Tom Meurs, Jai Wientjes, Marianne Junger, Abhishta Abhishta, Masarah Paquet-Clo...

2603.01624 2026-03-02
AI LLM

The Invisibility Hypothesis: Promises of AGI and the Future of the Global South

Discussions surrounding Artificial General Intelligence have largely focused on technical feasibility, timelines, and existential risk, often treating its social impact as being the same across dif...

L. Julian Lechuga Lopez, Luis Lara

2603.01616 2026-03-02
AI LLM

Closing the Gap Between Float and Posit Hardware Efficiency

The b-posit, or bounded posit, is a variation of the posit format designed for high performance computing (HPC) and AI applications. Unlike traditional floating-point formats (floats), posits use v...

Aditya Anirudh Jonnalagadda, Rishi Thotli, John L. Gustafson

2603.01615 2026-03-02
AI LLM

Evaluating and Understanding Scheming Propensity in LLM Agents

As frontier language models are increasingly deployed as autonomous agents pursuing complex, long-term objectives, there is increased risk of scheming: agents covertly pursuing misaligned goals. Pr...

Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner

2603.01608 2026-03-02
AI LLM

CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

Large visual language models (VLMs) have shown strong multi-modal medical reasoning ability, but most operate as end-to-end black boxes, diverging from clinicians' evidence-based, staged workflows ...

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu

2603.01607 2026-03-02
AI LLM

MigMate: A VS Code Extension for LLM-based Library Migration of Python Projects

Modern software relies heavily on third-party software libraries to streamline the development process. The act of switching one library for a similar counterpart, called library migration, natural...

Matthias Kebede, May Mahmoud, Mohayeminul Islam, Sarah Nadi

2603.01596 2026-03-02
AI LLM

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

The fast-growing demands in using Large Language Models (LLMs) to tackle complex multi-step data science tasks create an emergent need for accurate benchmarking. There are two major gaps in existin...

Fan Shu, Yite Wang, Ruofan Wu, Boyi Liu, Zhewei Yao, Yuxiong He, Feng Yan

2602.24288 2026-02-27
AI LLM

Do LLMs Benefit From Their Own Words?

Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether lar...

Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas

2602.24287 2026-02-27
AI LLM

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large lang...

Weinan Dai, Hanlin Wu, Qiying Yu, Huan-ang Gao, Jiahao Li, Chengquan Jiang, Weiqiang Lou, Yufan S...

2602.24286 2026-02-27
AI LLM

A Minimal Agent for Automated Theorem Proving

We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. This design implements the core features shared among state-of-the-a...

Borja Requena Pozo, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, Leopoldo Sarra

2602.24273 2026-02-27
AI LLM

From Efficiency to Meaning: Adolescents' Envisioned Role of AI in Health Management

While prior research has focused on providers, caregivers, and adult patients, little is known about adolescents' perceptions of AI in health learning and management. Utilizing design fiction and c...

Jamie Lee, Kyuha Jung, Cecilia Lee, Lauren MacDonnell, Jessica Kim, Daniel Otterson, Erin Newman,...

2602.24249 2026-02-27