March 25, 2026

What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification

Authors

Massa Baali, Sarthak Bisht, Rita Singh, Bhiksha Raj

Abstract

Speaker verification at large scale remains an open challenge because fixed-margin losses treat all samples equally, regardless of quality. We hypothesize that mislabeled or degraded samples introduce noisy gradients that disrupt compact speaker manifolds. We propose Curry (CURriculum Ranking), an adaptive loss that estimates sample difficulty online via Sub-center ArcFace: confidence scores, taken from each sample's cosine similarity to its dominant sub-center, rank samples into easy, medium, and hard tiers using running batch statistics, without auxiliary annotations. Learnable weights guide the model from stable identity foundations through manifold refinement to boundary sharpening. To our knowledge, this is the largest-scale speaker verification system trained to date. Evaluated on VoxCeleb1-O and SITW, Curry reduces EER by 86.8% and 60.0%, respectively, over the Sub-center ArcFace baseline, establishing a new paradigm for robust speaker verification on imperfect large-scale data.
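
The tiering idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the formulas, so the exponential moving average, the `momentum` and `k` parameters, the mean ± k·std thresholds, and the softmax over per-tier logits are all assumptions made for illustration, and the names `confidence`, `RunningTiers`, and `tier_weights` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def confidence(embedding, sub_centers):
    """Confidence = cosine similarity to the dominant (closest) sub-center
    of the sample's labeled speaker."""
    return max(cosine(embedding, c) for c in sub_centers)

class RunningTiers:
    """Tracks running batch statistics of confidence and assigns tiers.
    ASSUMPTION: EMA statistics and mean +/- k*std thresholds; the paper's
    actual rule may differ."""

    def __init__(self, momentum=0.9, k=0.5):
        self.momentum = momentum
        self.k = k
        self.mean = None
        self.var = None

    def update(self, confidences):
        n = len(confidences)
        batch_mean = sum(confidences) / n
        batch_var = sum((c - batch_mean) ** 2 for c in confidences) / n
        if self.mean is None:
            # First batch initializes the running statistics.
            self.mean, self.var = batch_mean, batch_var
        else:
            m = self.momentum
            self.mean = m * self.mean + (1 - m) * batch_mean
            self.var = m * self.var + (1 - m) * batch_var

    def tier(self, conf):
        std = math.sqrt(self.var)
        if conf >= self.mean + self.k * std:
            return "easy"
        if conf <= self.mean - self.k * std:
            return "hard"
        return "medium"

def tier_weights(logits):
    """Softmax over per-tier logits. In training these logits would be
    learnable parameters (ASSUMPTION: softmax parameterization)."""
    top = max(logits.values())
    exps = {t: math.exp(v - top) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Toy usage: one speaker with two sub-centers and a batch of three embeddings.
sub_centers = [[1.0, 0.0], [0.0, 1.0]]
batch = [[1.0, 0.05], [1.0, 1.0], [-1.0, -0.2]]
confs = [confidence(x, sub_centers) for x in batch]

tiers = RunningTiers()
tiers.update(confs)
labels = [tiers.tier(c) for c in confs]

weights = tier_weights({"easy": 0.0, "medium": 0.0, "hard": 0.0})
per_sample_weight = [weights[t] for t in labels]
print(list(zip(labels, per_sample_weight)))
```

In a real training loop, each sample's weight would multiply its Sub-center ArcFace loss term, and the tier logits would be updated by the optimizer so that the emphasis can shift between tiers as training progresses.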

Metadata

arXiv ID: 2603.24432

Provider: ARXIV

Primary Category: cs.SD

Published: 2026-03-25

Fetched: 2026-03-26 06:02
