March 18, 2026

ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws

Authors

Xuyang Cao, Qianying Liu, Chuan Xiao, Yusuke Oda, Pontus Stenetorp, Daisuke Kawahara, Makoto Onizuka, Sadao Kurohashi, Shuyuan Zheng

Abstract

In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the language mixture ratios. Multilingual scaling laws can predict the test loss under different language mixture ratios and can therefore be used to estimate the optimal ratios. However, the current approaches to multilingual scaling laws do not measure the cross-lingual transfer effect, resulting in suboptimal mixture ratios. In this paper, we consider multilingual pretraining as a cooperative game in which each language acts as a player that jointly contributes to pretraining, gaining the resulting reduction in test loss as the payoff. Consequently, from the perspective of cooperative game theory, we quantify the cross-lingual transfer from each language by its contribution in the game, and propose a game-theoretic multilingual scaling law called ShapleyLaw. Our experiments show that ShapleyLaw outperforms baseline methods in model performance prediction and language mixture optimization.
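The abstract does not spell out how each language's contribution in the game is computed, but the name suggests the standard Shapley value from cooperative game theory. As a rough sketch under that assumption (the characteristic function below is illustrative, not taken from the paper): let N be the set of pretraining languages and let v(S) denote the payoff, i.e. the reduction in test loss obtained when only the coalition S ⊆ N participates in pretraining. The contribution of language i would then be its Shapley value

\[
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!} \,\Bigl[ v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr],
\]

that is, language i's marginal effect on the loss reduction averaged over all orders in which languages could join the pretraining mixture. How ShapleyLaw estimates or approximates v(S) across mixture ratios is not specified in the abstract.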

Metadata

arXiv ID: 2603.17945
Provider: ARXIV
Primary Category: cs.CL
Published: 2026-03-18
Fetched: 2026-03-19 06:01
