Paper
A Comparative Study of Structural Representations for 2D Materials: Insights from Dynamic Collision Fingerprint and Matminer
Authors
Raphael M. Tromer, Isaac M. Felix, Rafael Besse, Marcelo L. Pereira Junior, Marcos G. E. da Luz
Abstract
In materials science, the choice of structural descriptors for machine learning protocols strongly influences both predictive performance and the degree of physical interpretability of the resulting models. Although more complex descriptors may improve numerical accuracy, they often add computational load and reduce transparency into the underlying structural information. The recently proposed Dynamic Collision Fingerprint (DCF) framework aims to produce concise, physically meaningful representations by generating descriptors through dynamical probing of atomic structures. In this work, we benchmark DCF on a dataset of 120 two-dimensional carbon allotropes and compare its performance with the widely used Matminer library. The analysis employs three regression models (linear regression, decision tree, and XGBoost), evaluated over train/test partitions ranging from 10% to 90% and repeated over multiple random seeds to characterize statistical variability. The results show that DCF matches Matminer in predictive accuracy across all learning algorithms while using descriptors of significantly lower dimensionality, which points to manageable computational costs. Moreover, compared with the rather technical Matminer descriptors, DCF offers considerably clearer physical interpretability. These findings suggest that DCF is a viable alternative to high-dimensional descriptor libraries for structural representation, since it is both computationally flexible and physically grounded.
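The evaluation protocol described in the abstract (several split sizes, each repeated over multiple random seeds) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the data are synthetic, a single-feature least-squares fit stands in for the three regressors, the 10% to 90% range is interpreted as the training fraction, and mean absolute error is used as the metric. The function names `fit_line`, `mae`, and `repeated_split_eval` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx

def mae(model, xs, ys):
    """Mean absolute error of the fitted line on (xs, ys)."""
    a, b = model
    return statistics.fmean(abs(a * x + b - y) for x, y in zip(xs, ys))

def repeated_split_eval(xs, ys, train_fractions, seeds):
    """Return {train_fraction: (mean test MAE, std of test MAE)} over seeds."""
    n = len(xs)
    summary = {}
    for frac in train_fractions:
        # Keep at least 2 training points and at least 1 test point.
        n_train = max(2, min(n - 1, round(frac * n)))
        errors = []
        for seed in seeds:
            idx = list(range(n))
            random.Random(seed).shuffle(idx)  # reproducible split per seed
            tr, te = idx[:n_train], idx[n_train:]
            model = fit_line([xs[i] for i in tr], [ys[i] for i in tr])
            errors.append(mae(model, [xs[i] for i in te], [ys[i] for i in te]))
        summary[frac] = (statistics.fmean(errors), statistics.pstdev(errors))
    return summary

# Synthetic stand-in data (assumption): 120 samples, one descriptor value each.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(120)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

fractions = [k / 10 for k in range(1, 10)]  # training fractions 0.1 ... 0.9
results = repeated_split_eval(xs, ys, fractions, seeds=range(10))
for frac, (mean_err, std_err) in results.items():
    print(f"train={frac:.0%}  MAE={mean_err:.4f} +/- {std_err:.4f}")
```

Reporting the spread across seeds alongside the mean is what lets one judge whether a gap between two descriptor sets exceeds the split-to-split variability. In a real comparison, the same loop would be run once per descriptor set and per regressor.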
Metadata
arXiv ID: 2602.22950v1
Abstract page: https://arxiv.org/abs/2602.22950v1
PDF: https://arxiv.org/pdf/2602.22950v1
Published: 2026-02-26T12:42:56Z
Updated: 2026-02-26T12:42:56Z
Primary category: cond-mat.mtrl-sci
Comment: 19 pages, 04 figures, 01 table