Language Data-Driven Machine Learning Design of New Materials
Lei Zhang1*
1Department of Materials Physics, School of Chemistry and Materials Science, Nanjing University of Information Science and Technology, Nanjing, China
*Corresponding Author: 002699@nuist.edu.cn
Data-driven methods based on language models (LMs) and machine learning are attracting growing attention from materials scientists as tools for designing and analyzing new materials within a highly complex virtual design space. This presentation will discuss how language models and data-driven methods can be applied to explore new functional materials. I will highlight recent progress in our group on data-driven materials design and prediction (e.g., for photovoltaic and halide perovskite materials), with an emphasis on different data types and sources, particularly textual data processed with natural language processing (NLP) techniques and language models. The presentation will cover a data-driven materials design workflow involving high-throughput computation and experimentation, data mining, traditional machine learning, genetic algorithms, first-principles calculations, and molecular dynamics. A multimodal data-driven approach that combines language models, density functional theory, and genetic algorithms will be emphasized. Additionally, I will report on the development and applications of NJmat, our data-driven and artificial intelligence software for materials science, which is particularly user-friendly for experimentalists.
Keywords: Data-Driven, Language Model, Machine Learning, First-Principles Calculation
References
1. Zhang, L.; Huang, Y.; Yan, L.; Ge, J.; Ma, X.; Liu, Z.; You, J.; Jen, A. K. Y.; Liu, S. Fast Exploring Literature by Language Machine Learning for Perovskite Solar Cell Materials Design. Adv. Intell. Syst. 6, 6 (2024)
2. Wang, S.; Huang, Y.; Hu, W.; Zhang, L.* Data-Driven Optimization and Machine Learning Analysis of Compatible Molecules for Halide Perovskite Material. npj Comput. Mater. 10 (1), 114 (2024)
3. Zhang, L.; He, M.; Huang, E.; Ma, X.; You, J.; Jen, A. K. Y.; Liu, S. Overcoming Language Barrier for Scientific Studies via Unsupervised Literature Learning: Case Study on Solar Cell Materials Prediction. Sol. RRL 8, 10 (2024)
4. Zhang, L.*; He, M.; Shao, S. Machine Learning for Halide Perovskite Materials. Nano Energy 78, 105380 (2020)