Research

Paper

AI LLM March 05, 2026

HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Authors

Yilin Jiang, Fei Tan, Xuanyu Yin, Jing Leng, Aimin Zhou

Abstract

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI

Metadata

arXiv ID: 2603.04855
Provider: ARXIV
Primary Category: cs.CL
Published: 2026-03-05
Fetched: 2026-03-06 14:20

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.04855v1</id>\n    <title>HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents</title>\n    <updated>2026-03-05T06:20:41Z</updated>\n    <link href='https://arxiv.org/abs/2603.04855v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.04855v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CL'/>\n    <published>2026-03-05T06:20:41Z</published>\n    <arxiv:comment>46 pages, 7 figures, submitted to ACL2026</arxiv:comment>\n    <arxiv:primary_category term='cs.CL'/>\n    <author>\n      <name>Yilin Jiang</name>\n    </author>\n    <author>\n      <name>Fei Tan</name>\n    </author>\n    <author>\n      <name>Xuanyu Yin</name>\n    </author>\n    <author>\n      <name>Jing Leng</name>\n    </author>\n    <author>\n      <name>Aimin Zhou</name>\n    </author>\n  </entry>"
}