Paper
Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?
Authors
Xin Wang, Ge Wanying, Junichi Yamagishi
Abstract
Building speech deepfake detection models that are generalizable to unseen attacks remains a challenging problem. Although the field has shifted toward a pre-training and fine-tuning paradigm using speech foundation models, most approaches rely solely on supervised fine-tuning (SFT). Inspired by the field of large language models, wherein reinforcement learning (RL) is used for model fine-tuning, we investigate the impact of RL, specifically Group Relative Policy Optimization (GRPO). The results from experiments using multiple detectors and test sets indicate that pure GRPO-based fine-tuning improves performance on out-of-domain test sets while maintaining performance on target-domain test data. This approach outperforms both SFT-only and hybrid setups. Our ablation studies further suggest that the negative reward in GRPO may be a key factor in this improvement.
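A minimal, illustrative sketch in PyTorch of the GRPO-style group-relative advantage computation referred to in the abstract, assuming a hypothetical +1/-1 reward for correct/incorrect binary labels; the function names and reward scheme are our assumptions for illustration, not the authors' implementation:

import torch

def detection_reward(pred_labels: torch.Tensor, true_labels: torch.Tensor) -> torch.Tensor:
    # Hypothetical reward: +1 if the sampled label matches the ground truth
    # (bona fide vs. spoof), -1 otherwise.
    return (pred_labels == true_labels).float() * 2.0 - 1.0

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_utterances, group_size), one scalar reward per sampled output.
    # GRPO normalizes rewards within each group: advantage = (r - mean) / std.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True).clamp_min(1e-6)
    return (rewards - mean) / std

# Example: 2 utterances, 4 sampled binary predictions each.
preds = torch.tensor([[1, 1, 0, 1],
                      [0, 0, 1, 0]])
truth = torch.tensor([[1], [0]])
advantages = group_relative_advantages(detection_reward(preds, truth))

Under this illustrative scheme, incorrect samples receive a negative reward and therefore fall below the group mean in advantage, which is the component the abstract's ablation finding points to as a likely driver of the improvement.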
Metadata
arXiv ID: 2603.02914v1 (primary category: eess.AS)
Published: 2026-03-03
Links: https://arxiv.org/abs/2603.02914v1 (abstract), https://arxiv.org/pdf/2603.02914v1 (PDF)
Comment: Submitted to Interspeech 2026; posted to arXiv in line with the paper open-access rule. Quote from Interspeech: "Interspeech no longer enforces an anonymity period for submissions. While uploading a version online is permitted, your official submission to Interspeech must not contain any author-identifying information."