Research

Paper

March 25, 2026

Exploring How Fair Model Representations Relate to Fair Recommendations

Authors

Bjørnar Vassøy, Benjamin Kille, Helge Langseth

Abstract

One of the many fairness definitions pursued in recent recommender system research targets mitigating demographic information encoded in model representations. Models optimized for this definition are typically evaluated on how well demographic attributes can be classified given model representations, with the (implicit) assumption that this measure accurately reflects "recommendation parity", i.e., how similar recommendations given to different users are. We challenge this assumption by comparing the amount of demographic information encoded in representations with various measures of how the recommendations differ. We propose two new approaches for measuring how well demographic information can be classified given ranked recommendations. Our results from extensive testing of multiple models on one real and multiple synthetically generated datasets indicate that optimizing for fair representations positively affects recommendation parity, but also that evaluation at the representation level is not a good proxy for measuring this effect when comparing models. We also provide extensive insight into how recommendation-level fairness metrics behave for various models by evaluating their performances on numerous generated datasets with different properties.

Metadata

arXiv ID: 2603.24396
Provider: ARXIV
Primary Category: cs.IR
Published: 2026-03-25
Fetched: 2026-03-26 06:02
