Paper
An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation
Authors
Zheyuan Kuang, Weiwei Jiang, Nicholas Koemel, Matthew Ahmadi, Emmanuel Stamatakis, Benjamin Tag, Anusha Withana, Zhanna Sarsenbayeva
Abstract
Multimodal Emotion Recognition (MER) increasingly depends on fine-grained, evidence-grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligned across modalities. We present an LLM-assisted toolkit that supports multimodal emotion data annotation through an inspectable, event-centered workflow. The toolkit preprocesses and aligns heterogeneous recordings, visualizes all modalities on an interactive shared timeline, and renders structured signals as video tracks for cross-modal consistency checks. It then detects candidate events and packages synchronized keyframes and time windows as event packets with traceable pointers to the source data. Finally, the toolkit integrates an LLM with modality-specific tools and prompt templates to draft structured annotations for analyst verification and editing. We demonstrate the workflow on multimodal VR emotion recordings with representative examples.
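To make the event-packet idea concrete, below is a minimal Python sketch of how such a packet could be structured; every class and field name here (SourcePointer, EventPacket, llm_draft, and so on) is an illustrative assumption, not the toolkit's actual schema as described in the paper.

# Hypothetical sketch of an "event packet": a candidate event bundled with
# synchronized keyframes, per-modality time windows, and traceable pointers
# back to the source recordings, plus a slot for the LLM-drafted annotation
# awaiting analyst verification. Field names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourcePointer:
    """Traceable reference into an original recording."""
    modality: str      # e.g. "video", "eye_tracking", "physiology"
    file_path: str     # path to the source file
    start_s: float     # window start, in seconds on the shared timeline
    end_s: float       # window end, in seconds on the shared timeline


@dataclass
class EventPacket:
    """Candidate event packaged for LLM drafting and analyst review."""
    event_id: str
    keyframe_paths: List[str] = field(default_factory=list)     # synchronized keyframes
    windows: List[SourcePointer] = field(default_factory=list)  # per-modality time windows
    llm_draft: Dict[str, str] = field(default_factory=dict)     # structured annotation draft
    analyst_verified: bool = False                               # set after human review


# Example: a packet covering a 3-second window across two modalities.
packet = EventPacket(
    event_id="evt_0042",
    keyframe_paths=["frames/evt_0042_t12.4.png"],
    windows=[
        SourcePointer("video", "session01/headset.mp4", 12.0, 15.0),
        SourcePointer("physiology", "session01/eda.csv", 12.0, 15.0),
    ],
)

Keeping the time windows and file paths inside the packet is what makes each drafted annotation traceable back to the evidence it was generated from, in the spirit of the inspectable workflow the abstract describes.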
Metadata
arXiv: 2603.02569v1 (cs.HC)
Published: 2026-03-03
Comments: 5 pages, 1 figure
Links: https://arxiv.org/abs/2603.02569v1 (abstract), https://arxiv.org/pdf/2603.02569v1 (PDF)