Leveraging Open-Access Libraries for Feature Engineering in Material Discovery

Mohammad Khatamirad1,*, Tiago J. Goncalves2, Edvin Fako2, Sandip De, Raoul Naumann d’Alnoncourt1, Michael Geske1, Stephan A. Schunk2,3,4, Sonja Schimmler5, Frank Rosowski1,2

1BasCat – UniCat BASF JointLab, Technical University of Berlin, Berlin, Germany

2BASF SE, Group Research, Ludwigshafen, Germany

3hte GmbH, Heidelberg, Germany

4Institute of Chemical Technology, Universität Leipzig, Leipzig, Germany

5Fraunhofer FOKUS, Institute for Open Communication Systems, Berlin, Germany

*Corresponding Author: khatamirad@tu-berlin.de

Advances in machine learning (ML) and artificial intelligence (AI) are transforming material discovery. These methods significantly accelerate the exploration of large feature spaces but often struggle with small datasets and researcher bias. In addition, model development for multi-promoter catalyst systems is complicated by complex interactions between catalyst components, which often require expensive ab initio calculations to capture. This, in turn, hampers the development of new descriptors for the design of novel materials.

In this work, we employ a data-driven approach tailored to small datasets that does not rely on prior knowledge of the studied system. First, an extended set of descriptors is generated by applying commutative operations to open-access atomistic properties. Interactions between catalyst components are then accounted for by introducing intrinsic promoter properties such as the energies of alloy and metal oxide formation. As an example, the method is applied to the complex RhMn+promoter/SiO2 catalyst system [1,2], which is tested in high-throughput experimentation for the syngas-to-ethanol (StE) reaction.

Cross-validation across multiple ML algorithms yields a model with high accuracy. By leveraging only open-access material libraries, new descriptors are obtained that go beyond mere correlation and provide insight into the causes of the observed performance trends. More importantly, the resulting model is capable of making predictions for new materials that were not used in the training step. Experimental studies show very good agreement with the model predictions, confirming an efficient workflow for accelerated material discovery [3]. In addition, the respective data and metadata are curated according to the standards developed for digital catalysis research, as outlined by the NFDI4Cat consortium [4].

Keywords: Machine Learning, Predictive Modeling, Materials Science, Ethanol
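The descriptor-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of operations, the pairing scheme, the function name `build_descriptors`, the use of Fe as an example promoter, and the two elemental properties (atomic number and Pauling electronegativity) are all assumptions made for illustration. The point it shows is that commutative operations give the same feature value regardless of the order in which two components are listed.

```python
from itertools import combinations

# Illustrative elemental properties: atomic number (Z) and Pauling
# electronegativity (chi). Assumption: a real workflow would pull many more
# properties from an open-access library rather than hard-coding them.
PROPERTIES = {
    "Rh": {"Z": 45, "chi": 2.28},
    "Mn": {"Z": 25, "chi": 1.55},
    "Fe": {"Z": 26, "chi": 1.83},  # Fe is a hypothetical promoter choice
}

# Commutative operations: op(a, b) == op(b, a) for all inputs.
# The specific set of operations is an assumption, not taken from the source.
COMMUTATIVE_OPS = {
    "sum": lambda a, b: a + b,
    "prod": lambda a, b: a * b,
    "absdiff": lambda a, b: abs(a - b),
    "max": max,
    "min": min,
}


def build_descriptors(elements, properties=PROPERTIES, ops=COMMUTATIVE_OPS):
    """Return a dict of pairwise descriptors for the given elements.

    Elements are sorted first so that feature names are stable no matter
    how the catalyst composition is written.
    """
    features = {}
    for el_a, el_b in combinations(sorted(elements), 2):
        for prop in sorted(properties[el_a]):
            a = properties[el_a][prop]
            b = properties[el_b][prop]
            for op_name, op in ops.items():
                name = f"{op_name}({prop}_{el_a},{prop}_{el_b})"
                features[name] = op(a, b)
    return features


descriptors = build_descriptors(["Rh", "Mn", "Fe"])
for name, value in descriptors.items():
    print(name, value)
```

With three elements, two properties, and five operations, this sketch produces 30 candidate descriptors, which shows how quickly the feature set grows and why feature selection matters for small datasets.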

References

1. Luk, H., Mondelli, C., Ferre, D. C., et al., Chem. Soc. Rev. (2017), 46.

2. Khatamirad, M., Konrad, M., Gentzen, M., et al., MDPI Catal. (2022), 12, 1321.

3. Khatamirad, M., Fako, E., Boscagli, C., et al., RSC Catal. Sci. Technol. (2023), 13.

4. NFDI4Cat, https://nfdi4cat.org/en/about-us/
