Research

Paper

TESTING March 12, 2026

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Authors

Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu, Gongshen Liu

Abstract

Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every untrustworthy client may leak the received functional model instance. Current watermarking schemes for FedLM often require white-box access and client-side cooperation, providing only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting unique identity-specific watermarks into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100\%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2\%).

Metadata

arXiv ID: 2603.12089

Provider: ARXIV

Primary Category: cs.CR

Published: 2026-03-12

Fetched: 2026-03-13 06:02

Related papers

Fractal universe and quantum gravity made simple

Fabio Briscese, Gianluca Calcagni • 2026-03-25

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25

Orientation Reconstruction of Proteins using Coulomb Explosions

Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.12089v1</id>\n    <title>EmbTracker: Traceable Black-box Watermarking for Federated Language Models</title>\n    <updated>2026-03-12T15:57:32Z</updated>\n    <link href='https://arxiv.org/abs/2603.12089v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.12089v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every untrustworthy client may leak the received functional model instance. Current watermarking schemes for FedLM often require white-box access and client-side cooperation, providing only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting unique identity-specific watermarks into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100\\%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2\\%).</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CR'/>\n    <published>2026-03-12T15:57:32Z</published>\n    <arxiv:comment>Work in progress</arxiv:comment>\n    <arxiv:primary_category term='cs.CR'/>\n    <author>\n      <name>Haodong Zhao</name>\n    </author>\n    <author>\n      <name>Jinming Hu</name>\n    </author>\n    <author>\n      <name>Yijie Bai</name>\n    </author>\n    <author>\n      <name>Tian Dong</name>\n    </author>\n    <author>\n      <name>Wei Du</name>\n    </author>\n    <author>\n      <name>Zhuosheng Zhang</name>\n    </author>\n    <author>\n      <name>Yanjiao Chen</name>\n    </author>\n    <author>\n      <name>Haojin Zhu</name>\n    </author>\n    <author>\n      <name>Gongshen Liu</name>\n    </author>\n  </entry>"
}