
An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation

Authors

Zheyuan Kuang, Weiwei Jiang, Nicholas Koemel, Matthew Ahmadi, Emmanuel Stamatakis, Benjamin Tag, Anusha Withana, Zhanna Sarsenbayeva

Abstract

Multimodal Emotion Recognition (MER) increasingly depends on fine-grained, evidence-grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligned across modalities. We present an LLM-assisted toolkit that supports multimodal emotion data annotation through an inspectable, event-centered workflow. The toolkit preprocesses and aligns heterogeneous recordings, visualizes all modalities on an interactive shared timeline, and renders structured signals as video tracks for cross-modal consistency checks. It then detects candidate events and packages synchronized keyframes and time windows as event packets with traceable pointers to the source data. Finally, the toolkit integrates an LLM with modality-specific tools and prompt templates to draft structured annotations for analyst verification and editing. We demonstrate the workflow on multimodal VR emotion recordings with representative examples.
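As an illustration only: the paper does not publish a schema for its event packets, so the short Python sketch below merely mirrors what the abstract describes (a candidate event bundled with a time window, synchronized keyframes, and traceable pointers back to the source recordings, handed to an LLM to draft a structured annotation for analyst verification). Every name here, including SourcePointer, EventPacket, and draft_annotation, is a hypothetical placeholder rather than the toolkit's actual API.

# Hypothetical sketch (not the toolkit's actual API) of an "event packet":
# a candidate event bundled with a time window on the shared timeline,
# synchronized keyframes, and traceable pointers back to the source data.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class SourcePointer:
    modality: str      # e.g. "video", "eye_tracking", "physiology" (assumed names)
    file_path: str     # original recording the evidence came from
    start_s: float     # window start on the shared timeline, in seconds
    end_s: float       # window end, in seconds

@dataclass
class EventPacket:
    event_id: str
    time_window: Tuple[float, float]   # (start_s, end_s) on the shared timeline
    keyframe_paths: List[str]          # synchronized keyframes rendered from the tracks
    pointers: List[SourcePointer]      # traceable links back to the raw recordings

def draft_annotation(packet: EventPacket, llm_call: Callable[[str], str]) -> Dict:
    """Ask an injected LLM client to draft a structured annotation for one
    event packet; an analyst then verifies and edits the draft."""
    prompt = (
        f"Event {packet.event_id}, window {packet.time_window} s.\n"
        f"Evidence modalities: {[p.modality for p in packet.pointers]}.\n"
        "Return an emotion label, an intensity in [0, 1], and the cues you relied on."
    )
    return {"event_id": packet.event_id, "draft": llm_call(prompt), "verified": False}

Keeping the pointers alongside the draft is what makes such an annotation inspectable: a reviewer can jump from any drafted label straight back to the keyframes and signal windows that motivated it.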

Metadata

arXiv ID: 2603.02569
Provider: ARXIV
Primary Category: cs.HC
Published: 2026-03-03
Fetched: 2026-03-04 03:41
Comments: 5 pages, 1 figure
Abstract page: https://arxiv.org/abs/2603.02569v1
PDF: https://arxiv.org/pdf/2603.02569v1
