Paper
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents
Authors
Minjia Wang, Yunfeng Wang, Xiao Ma, Dexin Lv, Qifan Guo, Lynn Zheng, Benliang Wang, Lei Wang, Jiannan Li, Yongwei Xing, David Xu, Zheng Sun
Abstract
Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events and then produces the corresponding digital artifacts, such as emails, messages, calendar entries, and reminders. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
Metadata
arXiv ID: 2603.11955v1
Published: 2026-03-12
Updated: 2026-03-12
Primary category: cs.CL
Comment: EACL 2026 Industry Track
Abstract page: https://arxiv.org/abs/2603.11955v1
PDF: https://arxiv.org/pdf/2603.11955v1