Paper
DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming
Authors
Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao
Abstract
We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent window architecture for independent rendering of danmaku (scrolling text), gift and like particle effects, and VIP entrance animations, built around an event-driven WebView2 capture pipeline and a thread-safe event bus. On top of this foundation we contribute an LLM broadcast automation framework comprising: (1) a per-song four-segment prompt scheduling system (T1 opening/transition, T2 empathy, T3 era story/production notes, T4 closing) that generates emotionally coherent radio-style commentary from lyric metadata; (2) a JSON-serializable RadioPersonaConfig schema supporting hot-swap multi-persona broadcasting; (3) a real-time danmaku quick-reaction engine with keyword routing to static urgent speech or LLM-generated empathetic responses; and (4) the Suwan Li AI singer-songwriter persona case study -- over 100 AI-generated songs produced with Suno. A 36-hour stress test demonstrates: zero danmaku overlap, zero deadlock crashes, gift effect P95 latency <= 180 ms, LLM-to-TTS segment P95 latency <= 2.1 s, and TTS integrated loudness gain of 9.5 LUFS. live streaming; danmaku; large language model; prompt engineering; virtual persona; WebView2; WINMM; TTS; Suno; loudness normalization; real-time scheduling
Metadata
Related papers
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jian... • 2026-03-30
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or • 2026-03-30
Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books
Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khan... • 2026-03-30
ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining
Anuj Diwan, Eunsol Choi, David Harwath • 2026-03-30
RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems
Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30
Raw Data (Debug)
{
"raw_xml": "<entry>\n <id>http://arxiv.org/abs/2603.03060v1</id>\n <title>DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming</title>\n <updated>2026-03-03T14:56:23Z</updated>\n <link href='https://arxiv.org/abs/2603.03060v1' rel='alternate' type='text/html'/>\n <link href='https://arxiv.org/pdf/2603.03060v1' rel='related' title='pdf' type='application/pdf'/>\n <summary>We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent window architecture for independent rendering of danmaku (scrolling text), gift and like particle effects, and VIP entrance animations, built around an event-driven WebView2 capture pipeline and a thread-safe event bus. On top of this foundation we contribute an LLM broadcast automation framework comprising: (1) a per-song four-segment prompt scheduling system (T1 opening/transition, T2 empathy, T3 era story/production notes, T4 closing) that generates emotionally coherent radio-style commentary from lyric metadata; (2) a JSON-serializable RadioPersonaConfig schema supporting hot-swap multi-persona broadcasting; (3) a real-time danmaku quick-reaction engine with keyword routing to static urgent speech or LLM-generated empathetic responses; and (4) the Suwan Li AI singer-songwriter persona case study -- over 100 AI-generated songs produced with Suno. A 36-hour stress test demonstrates: zero danmaku overlap, zero deadlock crashes, gift effect P95 latency <= 180 ms, LLM-to-TTS segment P95 latency <= 2.1 s, and TTS integrated loudness gain of 9.5 LUFS. live streaming; danmaku; large language model; prompt engineering; virtual persona; WebView2; WINMM; TTS; Suno; loudness normalization; real-time scheduling</summary>\n <category scheme='http://arxiv.org/schemas/atom' term='eess.IV'/>\n <category scheme='http://arxiv.org/schemas/atom' term='eess.AS'/>\n <published>2026-03-03T14:56:23Z</published>\n <arxiv:comment>14 pages, 13 figures, 6 tables, 7 algorithms, 16 references, submitted to ACM/IEEE International Conference on Systems and Software Engineering</arxiv:comment>\n <arxiv:primary_category term='eess.IV'/>\n <author>\n <name>Shuide Wen</name>\n </author>\n <author>\n <name>Sungil Seok</name>\n </author>\n <author>\n <name>Beier Ku</name>\n </author>\n <author>\n <name>Richee Li</name>\n </author>\n <author>\n <name>Yubin He</name>\n </author>\n <author>\n <name>Bowen Qu</name>\n </author>\n <author>\n <name>Yang Yang</name>\n </author>\n <author>\n <name>Ping Su</name>\n </author>\n <author>\n <name>Can Jiao</name>\n </author>\n </entry>"
}