February 26, 2026

A Comparative Study of Structural Representations for 2D Materials: Insights from Dynamic Collision Fingerprint and Matminer

Authors

Raphael M. Tromer, Isaac M. Felix, Rafael Besse, Marcelo L. Pereira Junior, Marcos G. E. da Luz

Abstract

In materials science, the choice of structural descriptors for machine learning protocols strongly influences both predictive performance and the degree of physical interpretability that can be extracted from the resulting models. Although more complex descriptors may improve numerical accuracy, they often add computational cost and reduce transparency about the underlying structural information. The Dynamic Collision Fingerprint (DCF), a recently proposed framework, aims to produce concise, physically meaningful representations by generating descriptors through dynamical probing of atomic structures. In this work, we benchmark DCF on a dataset of 120 two-dimensional carbon allotropes and compare its performance with that of the widely used Matminer library. The analysis employs three regression models (linear regression, decision tree, and XGBoost), evaluated over train/test partitions ranging from 10% to 90% and repeated over multiple random seeds to characterize statistical variability. The results show that DCF matches Matminer in predictive accuracy across all three learning algorithms, while using descriptors of significantly lower dimensionality, which keeps computational costs manageable. Moreover, DCF offers considerably clearer physical interpretability than Matminer's more technical descriptors. These findings suggest that DCF is a viable alternative to high-dimensional descriptor libraries for structural representation, being both computationally flexible and physically grounded.
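The evaluation protocol in the abstract (sweep the split ratio from 10% to 90%, repeat each split over several random seeds, report the mean and spread of test error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the percentages are training fractions, uses mean absolute error as the metric (the abstract does not name one), runs on hypothetical synthetic data rather than the 120-allotrope dataset, and implements only the linear-regression case in plain Python. The names `fit_linear`, `mae`, and `split_sweep` are hypothetical.

```python
import random
import statistics


def fit_linear(X, y, ridge=1e-8):
    """Hypothetical helper: least squares with an intercept.

    Solves (A^T A + ridge*I) w = A^T y by Gauss-Jordan elimination, where A is X
    with a leading column of ones. The tiny ridge term only guards against
    singular systems; for simplicity it is also applied to the intercept.
    """
    A = [[1.0] + list(row) for row in X]
    d = len(A[0])
    # Augmented normal-equation matrix [A^T A + ridge*I | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) + (ridge if i == j else 0.0) for j in range(d)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(d)
    ]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:  # eliminate column c from every other row
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    w = [M[i][d] / M[i][i] for i in range(d)]  # system is now diagonal
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def mae(predict, X, y):
    """Mean absolute error of a predictor on (X, y); assumed metric."""
    return statistics.fmean(abs(predict(row) - t) for row, t in zip(X, y))


def split_sweep(X, y, fit, fractions, seeds):
    """For each training fraction, repeat a random train/test split over the
    given seeds and return {fraction: (mean test MAE, stdev of test MAE)}.
    Needs at least two seeds so that a standard deviation is defined."""
    results = {}
    n = len(X)
    for frac in fractions:
        scores = []
        for seed in seeds:
            idx = list(range(n))
            random.Random(seed).shuffle(idx)  # seed controls the partition
            k = max(1, min(n - 1, round(frac * n)))  # keep both sides non-empty
            train, test = idx[:k], idx[k:]
            model = fit([X[i] for i in train], [y[i] for i in train])
            scores.append(mae(model, [X[i] for i in test], [y[i] for i in test]))
        results[frac] = (statistics.fmean(scores), statistics.stdev(scores))
    return results


if __name__ == "__main__":
    # Hypothetical toy data: 120 samples, 3 descriptors, noisy linear target.
    rng = random.Random(0)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(120)]
    y = [2.0 * a - 1.0 * b + 0.5 * c + rng.gauss(0, 0.1) for a, b, c in X]
    fractions = [f / 10 for f in range(1, 10)]  # 10% ... 90% training data
    for frac, (mean, std) in split_sweep(X, y, fit_linear, fractions, range(5)).items():
        print(f"train={frac:.0%}  test MAE={mean:.3f} +/- {std:.3f}")
```

Decision-tree or XGBoost models would plug into the same `fit(X, y)` slot through a small wrapper that returns a per-sample predictor; running the sweep once per descriptor set (DCF features versus Matminer features) yields the kind of side-by-side comparison the abstract describes.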

Metadata

arXiv ID: 2602.22950
Provider: ARXIV
Primary Category: cond-mat.mtrl-sci
Published: 2026-02-26
Comments: 19 pages, 4 figures, 1 table
Fetched: 2026-02-27 04:35