Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression

Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving. However, existing approaches often result i...

Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu

2603.17435 2026-03-18
AI LLM

Argument Reconstruction as Supervision for Critical Thinking in LLMs

To think critically about arguments, human learners are trained to identify, reconstruct, and evaluate arguments. Argument reconstruction is especially important because it makes an argument's unde...

Hyun Ryu, Gyouk Chu, Gregor Betz, Eunho Yang, Carolyn Rose, Sean Welleck

2603.17432 2026-03-18
AI LLM

From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence

The rapid evolution toward 6G and beyond communication systems is accelerating the convergence of digital twins and world models at the network edge. Traditional digital twins provide high-fidelity...

Jie Zheng, Dusit Niyato, Changyuan Zhao, Jiawen Kang, Jiacheng Wang

2603.17420 2026-03-18
AI LLM

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communicati...

Saikat Maiti

2603.17419 2026-03-18
AI LLM

PowerDAG: Reliable Agentic AI System for Automating Distribution Grid Analysis

This paper introduces PowerDAG, an agentic AI system for automating complex distribution-grid analysis. We address the reliability challenges of state-of-the-art agentic systems in automating compl...

Emmanuel O. Badmus, Amritanshu Pandey

2603.17418 2026-03-18
AI LLM

Is Your LLM-as-a-Recommender Agent Trustable? LLMs' Recommendation is Easily Hacked by Biases (Preferences)

Current Large Language Models (LLMs) are gradually exploited in practically valuable agentic workflows such as Deep Research, E-commerce recommendation, and job recruitment. In these applications, ...

Zichen Tang, Zirui Zhang, Qian Wang, Zhenheng Tang, Bo Li, Xiaowen Chu

2603.17417 2026-03-18
AI LLM

Bootstrapping Coding Agents: The Specification Is the Program

A coding agent can bootstrap itself. Starting from a 926-word specification and a first implementation produced by an existing agent (Claude Code), a newly generated agent re-implements the same sp...

Martin Monperrus

2603.17399 2026-03-18
AI LLM

Agentic Cognitive Profiling: Realigning Automated Alzheimer's Disease Detection with Clinical Construct Validity

Automated Alzheimer's Disease (AD) screening has predominantly followed the inductive paradigm of pattern recognition, which directly maps the input signal to the outcome label. This paradigm sacri...

Jiawen Kang, Kun Li, Dongrui Han, Jinchao Li, Junan Li, Lingwei Meng, Xixin Wu, Helen Meng

2603.17392 2026-03-18
AI LLM

Efficient Reasoning on the Edge

Large language models (LLMs) with chain-of-thought reasoning achieve state-of-the-art performance across complex problem-solving tasks, but their verbose reasoning traces and large context requirem...

Yelysei Bondarenko, Thomas Hehn, Rob Hesselink, Romain Lepert, Fabio Valerio Massoli, Evgeny Miro...

2603.16867 2026-03-17
AI LLM

Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory

Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems strugg...

Sahil Sen, Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah

2603.16862 2026-03-17
AI LLM

Mediocrity is the key for LLM as a Judge Anchor Selection

The "LLM-as-a-judge" paradigm has become a standard method for evaluating open-ended generation. To address the quadratic scalability costs of pairwise comparisons, popular benchmarks like Arena-...

Shachar Don-Yehiya, Asaf Yehudai, Leshem Choshen, Omri Abend

2603.16848 2026-03-17
AI LLM

Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

Automated presentation generation remains a challenging task requiring coherent content creation, visual design, and audience-aware communication. This work proposes an OpenEnv-compatible reinforce...

Karthik Ragunath Ananda Kumar, Subrahmanyam Arunachalam

2603.16839 2026-03-17
AI LLM

Prompt Programming for Cultural Bias and Alignment of Large Language Models

Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are in...

Maksim Eren, Eric Michalak, Brian Cook, Johnny Seales

2603.16827 2026-03-17
AI LLM

SurgΣ: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

Surgical intelligence has the potential to improve the safety and consistency of surgical care, yet most existing surgical AI frameworks remain task-specific and struggle to generalize across proce...

Zhitao Zeng, Mengya Xu, Jian Jiang, Pengfei Guo, Yunqiu Xu, Zhu Zhuo, Chang Han Low, Yufan He, Do...

2603.16822 2026-03-17
AI LLM

Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper)

Incident management is essential to maintain the reliability and availability of cloud computing services. Cloud vendors typically disclose incident reports to the public, summarizing the failures ...

Xiaoyu Chu, Shashikant Ilager, Yizhen Zang, Sacheendra Talluri, Alexandru Iosup

2603.16818 2026-03-17
AI LLM

Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights

Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as po...

Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar, Caitlyn Heqi Yin, Ramya Korlakai Vinayak

2603.16817 2026-03-17
AI LLM

ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation

Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. As systems evolve towa...

Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern...

2603.16812 2026-03-17
AI LLM

Improving Code Comprehension through Cognitive-Load Aware Automated Refactoring for Novice Programmers

Novice programmers often struggle to comprehend code due to vague naming, deep nesting, and poor structural organization. While explanations may offer partial support, they typically do not restruc...

Subarna Saha, Alif Al Hasan, Fariha Tanjim Shifat, Mia Mohammad Imran

2603.16791 2026-03-17
AI LLM

IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

3D intraoral scans (IOS) are increasingly adopted in routine dentistry due to abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communicat...

Huimin Xiong, Zijie Meng, Tianxiang Hu, Chenyi Zhou, Yang Feng, Zuozhu Liu

2603.16781 2026-03-17
AI LLM

Anticipatory Planning for Multimodal AI Agents

Recent advances in multimodal agents have improved computer-use interaction and tool-usage, yet most existing systems remain reactive, optimizing actions in isolation without reasoning about future...

Yongyuan Liang, Shijie Zhou, Yu Gu, Hao Tan, Gang Wu, Franck Dernoncourt, Jihyung Kil, Ryan A. Ro...

2603.16777 2026-03-17