
Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health

Authors

Trung Hieu Ngo, Adrien Bazoge, Solen Quiniou, Pierre-Antoine Gourraud, Emmanuel Morin

Abstract

Large Language Models (LLMs) excel at Natural Language Processing (NLP) tasks, but they often propagate biases embedded in their training data, a risk that is especially consequential in sensitive domains such as healthcare. While existing benchmarks evaluate biases tied to individual social determinants of health (SDoH) such as gender or ethnicity, they often overlook interactions between these factors and lack context-specific assessments. This study investigates bias in LLMs by probing the relationships between gender and other SDoH in French patient records. Across a series of experiments, we find that embedded stereotypes can be elicited through SDoH inputs and that LLMs rely on these stereotypes when making gendered decisions, suggesting that evaluating interactions among SDoH factors could usefully complement existing approaches to assessing LLM performance and bias.
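The abstract does not detail the probing protocol. As a purely illustrative sketch of the general idea (the prompt template, the SDoH attribute strings, and the stubbed model call below are all assumptions, not the authors' method), one could vary a single SDoH cue in an otherwise fixed clinical prompt and count which gendered pronoun the model produces:

```python
from collections import Counter

# Hypothetical SDoH cues for illustration only; not the paper's actual variables.
SDOH_ATTRIBUTES = ["is unemployed", "lives alone", "smokes daily", "works as a nurse"]

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a gendered pronoun.

    Swap in an actual model client to run a real probe. This stub just
    mimics a stereotyped association for demonstration purposes.
    """
    return "she" if "nurse" in prompt else "he"

def probe_gender_association(attributes: list[str]) -> Counter:
    """Count which pronoun the model picks when only the SDoH cue varies."""
    counts = Counter()
    for attr in attributes:
        # Fixed clinical template; only the SDoH attribute changes between calls.
        prompt = f"The patient {attr}. When examined, ___ reported chest pain."
        counts[(attr, query_llm(prompt))] += 1
    return counts

counts = probe_gender_association(SDOH_ATTRIBUTES)
```

A skewed distribution of pronouns across cues that are medically irrelevant to gender would indicate the kind of embedded stereotype the study investigates.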

Metadata

arXiv ID: 2603.09416
Provider: ARXIV
Primary Category: cs.CL
Secondary Category: cs.AI
Comment: Accepted as Findings at EACL 2026
Published: 2026-03-10
Fetched: 2026-03-11 06:02
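The metadata above originates from an arXiv Atom entry, which can be re-fetched at any time through arXiv's public query API. A minimal sketch of building the request URL from the paper's ID (the endpoint and `id_list` parameter are arXiv's documented API; the helper name is ours):

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_query_url(arxiv_id: str) -> str:
    """Build the arXiv Atom API URL that returns the entry for one paper ID."""
    return f"{ARXIV_API}?{urlencode({'id_list': arxiv_id})}"

url = arxiv_query_url("2603.09416")
# Fetching this URL (e.g. with urllib.request.urlopen) returns an Atom feed
# containing the <entry> element with title, authors, abstract, and categories.
```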
