Research

Paper

TESTING February 27, 2026

Hierarchical Concept-based Interpretable Models

Authors

Oscar Hill, Mateo Espinosa Zarlenga, Mateja Jamnik

Abstract

Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM's embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.

Metadata

arXiv ID: 2602.23947
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-02-27
Fetched: 2026-03-02 06:04

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2602.23947v1</id>\n    <title>Hierarchical Concept-based Interpretable Models</title>\n    <updated>2026-02-27T11:49:56Z</updated>\n    <link href='https://arxiv.org/abs/2602.23947v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2602.23947v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM's embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.LG'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <published>2026-02-27T11:49:56Z</published>\n    <arxiv:comment>Published as a conference paper at ICLR 2026</arxiv:comment>\n    <arxiv:primary_category term='cs.LG'/>\n    <author>\n      <name>Oscar Hill</name>\n    </author>\n    <author>\n      <name>Mateo Espinosa Zarlenga</name>\n    </author>\n    <author>\n      <name>Mateja Jamnik</name>\n    </author>\n  </entry>"
}