Paper
HeRo: Adaptive Orchestration of Agentic RAG on Heterogeneous Mobile SoC
Authors
Maoliang Li, Jiayu Chen, Zihao Zheng, Ziqian Li, Xinhao Sun, Guojie Luo, Chenchen Liu, Xiang Chen
Abstract
With the increasing computational capability of mobile devices, deploying agentic retrieval-augmented generation (RAG) locally on heterogeneous System-on-Chips (SoCs) has become a promising way to enhance LLM-based applications. However, agentic RAG induces multi-stage workflows with heterogeneous models and dynamic execution flow, while mobile SoCs exhibit strong accelerator affinity, shape sensitivity, and shared-memory bandwidth contention, making naive scheduling ineffective. We present HeRo, a heterogeneous-aware framework for low-latency agentic RAG on mobile SoCs. HeRo builds profiling-based performance models for each sub-stage and model-to-processing-unit (PU) configuration, capturing latency, workload shape, and contention-induced slowdown, and leverages them in a lightweight online scheduler that combines shape-aware sub-stage partitioning, criticality-based accelerator mapping, and bandwidth-aware concurrency control. Experiments on commercial mobile devices show that HeRo reduces end-to-end latency by up to $10.94\times$ over existing deployment strategies, enabling practical on-device agentic RAG.
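To make the abstract's scheduling idea concrete, the sketch below combines a profiling-based latency model (a fixed cost plus a per-token cost for each sub-stage/PU pair, inflated by a contention factor when another PU is busy) with a greedy, criticality-first mapping of sub-stages to PUs. This is not HeRo's published algorithm: the `PROFILE` and `CONTENTION` numbers, the linear-in-tokens latency model, and the names `SubStage`, `predict_ms`, and `schedule` are all hypothetical, chosen only for illustration. Shape-aware partitioning and bandwidth-aware concurrency control are not modeled.

```python
from dataclasses import dataclass

# Hypothetical profiling table: (sub_stage, pu) -> (fixed_ms, per_token_ms).
# A latency model linear in workload shape (token count) is an assumption.
PROFILE = {
    ("embed", "npu"): (2.0, 0.05), ("embed", "gpu"): (4.0, 0.08), ("embed", "cpu"): (1.0, 0.40),
    ("rerank", "npu"): (3.0, 0.10), ("rerank", "gpu"): (5.0, 0.12), ("rerank", "cpu"): (1.0, 0.60),
    ("generate", "npu"): (20.0, 0.50), ("generate", "gpu"): (15.0, 0.70), ("generate", "cpu"): (5.0, 2.00),
}

# Hypothetical slowdown applied when another PU is busy and competes
# for shared-memory bandwidth.
CONTENTION = {"npu": 1.2, "gpu": 1.3, "cpu": 1.1}


@dataclass
class SubStage:
    name: str
    stage: str      # key into PROFILE
    tokens: int     # workload shape
    critical: bool  # lies on the workflow's critical path


def predict_ms(stage: str, pu: str, tokens: int, contended: bool) -> float:
    """Profiled latency for this shape, inflated if bandwidth is contended."""
    fixed, per_token = PROFILE[(stage, pu)]
    base = fixed + per_token * tokens
    return base * CONTENTION[pu] if contended else base


def predicted_finish(s: SubStage, pu: str, free_at: dict) -> float:
    start = free_at[pu]
    # Simplification: contention is checked only at start time; tasks that
    # are already running are not retroactively slowed down.
    contended = any(t > start for p, t in free_at.items() if p != pu)
    return start + predict_ms(s.stage, pu, s.tokens, contended)


def schedule(subs, pus=("npu", "gpu", "cpu")):
    """Greedy list scheduling: critical sub-stages first (then larger ones),
    each mapped to the PU with the earliest predicted finish time."""
    free_at = {pu: 0.0 for pu in pus}
    plan = {}
    for s in sorted(subs, key=lambda s: (not s.critical, -s.tokens)):
        best = min(pus, key=lambda pu: predicted_finish(s, pu, free_at))
        start, end = free_at[best], predicted_finish(s, best, free_at)
        free_at[best] = end
        plan[s.name] = (best, start, end)
    return plan


if __name__ == "__main__":
    plan = schedule([
        SubStage("gen", "generate", 100, critical=True),
        SubStage("emb", "embed", 200, critical=False),
        SubStage("rr", "rerank", 50, critical=False),
    ])
    for name, (pu, start, end) in plan.items():
        print(f"{name}: {pu} {start:.1f}-{end:.1f} ms")
```

With these made-up numbers, the critical generation step is placed on the NPU, while the embedding and reranking steps run concurrently on the GPU and CPU and pay the contention penalty. Ordering by criticality first means that the sub-stage that bounds end-to-end latency gets first pick of the fastest accelerator.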
Metadata
arXiv ID: 2603.01661v1
Published: 2026-03-02T09:51:01Z
Updated: 2026-03-02T09:51:01Z
Primary category: cs.DC
Comment: Will appear in DAC'2026
arXiv page: https://arxiv.org/abs/2603.01661v1
PDF: https://arxiv.org/pdf/2603.01661v1
Related papers
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongy... • 2026-03-25
Comparing Developer and LLM Biases in Code Evaluation
Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donah... • 2026-03-25
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
Biplab Pal, Santanu Bhattacharya • 2026-03-25
Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA
Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, ... • 2026-03-25
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie... • 2026-03-25