Papers
Research papers from arXiv and related sources
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving. However, existing approaches often result i...
Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu
Argument Reconstruction as Supervision for Critical Thinking in LLMs
To think critically about arguments, human learners are trained to identify, reconstruct, and evaluate arguments. Argument reconstruction is especially important because it makes an argument's unde...
Hyun Ryu, Gyouk Chu, Gregor Betz, Eunho Yang, Carolyn Rose, Sean Welleck
From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence
The rapid evolution toward 6G and beyond communication systems is accelerating the convergence of digital twins and world models at the network edge. Traditional digital twins provide high-fidelity...
Jie Zheng, Dusit Niyato, Changyuan Zhao, Jiawen Kang, Jiacheng Wang
Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare
Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communicati...
Saikat Maiti
PowerDAG: Reliable Agentic AI System for Automating Distribution Grid Analysis
This paper introduces PowerDAG, an agentic AI system for automating complex distribution-grid analysis. We address the reliability challenges of state-of-the-art agentic systems in automating compl...
Emmanuel O. Badmus, Amritanshu Pandey
Is Your LLM-as-a-Recommender Agent Trustable? LLMs' Recommendation is Easily Hacked by Biases (Preferences)
Current Large Language Models (LLMs) are gradually exploited in practically valuable agentic workflows such as Deep Research, E-commerce recommendation, and job recruitment. In these applications, ...
Zichen Tang, Zirui Zhang, Qian Wang, Zhenheng Tang, Bo Li, Xiaowen Chu
Bootstrapping Coding Agents: The Specification Is the Program
A coding agent can bootstrap itself. Starting from a 926-word specification and a first implementation produced by an existing agent (Claude Code), a newly generated agent re-implements the same sp...
Martin Monperrus
Agentic Cognitive Profiling: Realigning Automated Alzheimer's Disease Detection with Clinical Construct Validity
Automated Alzheimer's Disease (AD) screening has predominantly followed the inductive paradigm of pattern recognition, which directly maps the input signal to the outcome label. This paradigm sacri...
Jiawen Kang, Kun Li, Dongrui Han, Jinchao Li, Junan Li, Lingwei Meng, Xixin Wu, Helen Meng
Efficient Reasoning on the Edge
Large language models (LLMs) with chain-of-thought reasoning achieve state-of-the-art performance across complex problem-solving tasks, but their verbose reasoning traces and large context requirem...
Yelysei Bondarenko, Thomas Hehn, Rob Hesselink, Romain Lepert, Fabio Valerio Massoli, Evgeny Miro...
Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory
Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems strugg...
Sahil Sen, Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah
Mediocrity is the key for LLM as a Judge Anchor Selection
The "LLM-as-a-judge" paradigm has become a standard method for evaluating open-ended generation. To address the quadratic scalability costs of pairwise comparisons, popular benchmarks like Arena-...
Shachar Don-Yehiya, Asaf Yehudai, Leshem Choshen, Omri Abend
Learning to Present: Inverse Specification Rewards for Agentic Slide Generation
Automated presentation generation remains a challenging task requiring coherent content creation, visual design, and audience-aware communication. This work proposes an OpenEnv-compatible reinforce...
Karthik Ragunath Ananda Kumar, Subrahmanyam Arunachalam
Prompt Programming for Cultural Bias and Alignment of Large Language Models
Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are in...
Maksim Eren, Eric Michalak, Brian Cook, Johnny Seales
SurgΣ: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence
Surgical intelligence has the potential to improve the safety and consistency of surgical care, yet most existing surgical AI frameworks remain task-specific and struggle to generalize across proce...
Zhitao Zeng, Mengya Xu, Jian Jiang, Pengfei Guo, Yunqiu Xu, Zhu Zhuo, Chang Han Low, Yufan He, Do...
Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper)
Incident management is essential to maintain the reliability and availability of cloud computing services. Cloud vendors typically disclose incident reports to the public, summarizing the failures ...
Xiaoyu Chu, Shashikant Ilager, Yizhen Zang, Sacheendra Talluri, Alexandru Iosup
Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
Large language models (LLMs) frequently hallucinate, limiting their reliability in knowledge-intensive applications. Retrieval-augmented generation (RAG) and conformal factuality have emerged as po...
Yi Chen, Daiwei Chen, Sukrut Madhav Chikodikar, Caitlyn Heqi Yin, Ramya Korlakai Vinayak
ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation
Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. As systems evolve towa...
Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern...
Improving Code Comprehension through Cognitive-Load Aware Automated Refactoring for Novice Programmers
Novice programmers often struggle to comprehend code due to vague naming, deep nesting, and poor structural organization. While explanations may offer partial support, they typically do not restruc...
Subarna Saha, Alif Al Hasan, Fariha Tanjim Shifat, Mia Mohammad Imran
IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans
3D intraoral scans (IOS) are increasingly adopted in routine dentistry due to abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communicat...
Huimin Xiong, Zijie Meng, Tianxiang Hu, Chenyi Zhou, Yang Feng, Zuozhu Liu
Anticipatory Planning for Multimodal AI Agents
Recent advances in multimodal agents have improved computer-use interaction and tool-usage, yet most existing systems remain reactive, optimizing actions in isolation without reasoning about future...
Yongyuan Liang, Shijie Zhou, Yu Gu, Hao Tan, Gang Wu, Franck Dernoncourt, Jihyung Kil, Ryan A. Ro...