Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge

Authors

Dhanya E, Ankita Meena, Manas Nanivadekar, Noumida A, Victor Azad, Ashwini Nagaraj Shenoy, Pratik Roy Chowdhuri, Shobhit Banga, Vanshika Chhabra, Chitralekha Bhat, Shareef babu Kalluri, Srikanth Raj Chetupalli, Deepu Vijayasenan, Sriram Ganapathy

Abstract

The DIarization and Speech Processing for LAnguage understanding in Conversational Environments - Medical (DISPLACE-M) challenge introduces a conversational AI benchmark focused on understanding goal-oriented, real-world medical dialogues collected in the field. The challenge addresses multi-speaker interactions between healthcare workers and healthcare seekers, characterized by spontaneous, noisy and overlapping speech across Indian languages and dialects. As part of the challenge, a medical conversational dataset comprising 25 hours of development data and 10 hours of blind evaluation recordings was released. We provided baseline systems within a unified end-to-end pipeline across 4 tasks - speaker diarization, automatic speech recognition, topic identification and dialogue summarization - to enable consistent benchmarking. System performance is evaluated using established metrics such as diarization error rate (DER), time-constrained minimum-permutation word error rate (tcpWER), and ROUGE-L. During this evaluation (Phase-I), 12 teams from across the globe actively participated, pushing beyond the baseline systems on these metrics. However, even after 6-8 weeks of dedicated effort from the participants, the task remains substantially challenging, and existing systems fall significantly short of healthcare deployment readiness.
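Two of the metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook definitions: DER as the sum of missed speech, false alarm and speaker confusion time divided by total reference speech time, and ROUGE-L as an F1 score over the longest common subsequence of whitespace tokens. The function names and example inputs are hypothetical, and the challenge's actual scoring configuration (collar, overlap handling, tokenization) is not stated in the abstract. tcpWER is omitted because it additionally requires word timings and a speaker-permutation search.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on whitespace tokens (no stemming; an assumption)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER = (missed + false alarm + confusion) / total reference speech.

    All arguments are durations in seconds; the error durations are
    assumed to come from an upstream alignment step not shown here.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech
```

For instance, `rouge_l_f1("the patient has a fever", "patient has fever")` scores a three-token summary against a five-token reference, and `diarization_error_rate(3.0, 2.0, 5.0, 100.0)` turns made-up error durations into a rate. Note that DER can exceed 1.0 when false alarms are large.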

Metadata

arXiv ID: 2603.02813

Provider: ARXIV

Primary Category: eess.AS

Published: 2026-03-03

Fetched: 2026-03-04 03:41

arXiv comment (2603.02813v1): Submitted for review to Interspeech 2026