Extraction of data from publications empowered by Kolmogorov-Arnold Networks
Wenkai Ning1*, Musen Li1,2, Rika Kobayashi3, Jeffrey R. Reimers1,2
1International Centre for Quantum and Molecular Structures and Department of Physics, Shanghai University, Shanghai, China.
2School of Mathematical and Physical Sciences, University of Technology Sydney, Sydney, Australia.
3Supercomputer Facility, Australian National University, Canberra, Australia.
*Corresponding Author: ningwenkai@shu.edu.cn
Large Language Models (LLMs) are used for large-scale extraction and organization of unstructured data owing to their exceptional natural-language processing capabilities. Extensive experimental and simulation data that could empower materials design are scattered across numerous scientific publications, yet high-quality experimental databases are lacking. We present an LLM approach that searches the literature to create structured materials-property databases. By incorporating Kolmogorov-Arnold Networks (KANs), it overcomes previous limitations in integrating long-context data and in discerning complex inter-entity relationships. Our application organizes materials-bandgap data, using learnable activation functions and spline-parametrized functions for dynamic categorization. The system learns from diverse sources by combining experimental results with simulation data, ensuring accuracy and efficiency. This KAN-based LLM demonstrates superior accuracy in organizing materials-bandgap data and is potentially adaptable to other applications in materials science and to other fields requiring structured data extraction. This integration could significantly enhance scientific research by improving data-driven discovery, thereby contributing to technological and scientific progress.
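The abstract does not give the network architecture, so the following is only a hedged illustration of what "learnable activation functions" and "spline-parametrized functions" mean in a KAN in the sense of Liu et al. (2024): every input-output edge carries its own univariate function phi_{j,i}, stored as spline coefficients, and each output is the sum y_j = sum_i phi_{j,i}(x_i). It is a minimal pure-Python sketch assuming degree-1 (piecewise-linear, "hat") B-splines on a fixed uniform grid. The names `hat_basis`, `KANLayer` and `sgd_step` are hypothetical; the sketch omits the SiLU base term, higher-order splines and grid refinement of the original KAN, and it is not the authors' implementation or their coupling of KANs to the LLM.

```python
import random


def hat_basis(x, grid):
    """Degree-1 B-spline ("hat") basis values of x on a uniform grid.

    Returns one value per grid point; at most two are non-zero and they
    sum to 1 (up to rounding). Inputs outside [grid[0], grid[-1]] are
    clamped to the end points.
    """
    h = grid[1] - grid[0]
    x = min(max(x, grid[0]), grid[-1])
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]


class KANLayer:
    """Hypothetical KAN layer: y_j = sum_i phi_{j,i}(x_i), each phi a learnable spline."""

    def __init__(self, n_in, n_out, n_grid=11, lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        self.grid = [lo + k * (hi - lo) / (n_grid - 1) for k in range(n_grid)]
        # coef[j][i][k]: k-th spline coefficient of the edge function input i -> output j
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_grid)]
                      for _ in range(n_in)]
                     for _ in range(n_out)]

    def forward(self, x):
        """Return the outputs y and the per-input basis values used to compute them."""
        basis = [hat_basis(xi, self.grid) for xi in x]
        y = [sum(c * b
                 for coefs, bas in zip(row, basis)
                 for c, b in zip(coefs, bas))
             for row in self.coef]
        return y, basis

    def sgd_step(self, x, target, lr):
        """One squared-error gradient step on a single sample; returns the pre-step loss.

        Each output is linear in its coefficients, so d y_j / d coef[j][i][k]
        is just the basis value basis[i][k].
        """
        y, basis = self.forward(x)
        loss = 0.0
        for j, (yj, tj) in enumerate(zip(y, target)):
            err = yj - tj
            loss += err * err
            for i, bas in enumerate(basis):
                for k, b in enumerate(bas):
                    self.coef[j][i][k] -= lr * 2.0 * err * b
        return loss


if __name__ == "__main__":
    # Toy check: learn the activation phi(x) = x**2 on [-1, 1] with a single edge.
    layer = KANLayer(n_in=1, n_out=1)
    data = [(-1.0 + 0.05 * i, (-1.0 + 0.05 * i) ** 2) for i in range(41)]
    for _ in range(300):
        total = sum(layer.sgd_step([x], [t], lr=0.5) for x, t in data)
    print(f"mean squared error in last epoch: {total / len(data):.5f}")
    print(f"phi(0.5) = {layer.forward([0.5])[0][0]:.3f} (exact value 0.25)")
```

Running the script fits one learnable activation to x^2 on [-1, 1]. The hat basis keeps each output linear in its coefficients, so the gradient of the squared error with respect to a coefficient is simply 2 * error * basis value, which keeps the sketch dependency-free; a practical model would use higher-order splines and an automatic-differentiation framework instead.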
Keywords: Machine Learning, Large Language Models, Data Mining, Materials Science
References
1. Gupta, Tanishq, et al. "MatSciBERT: A materials domain language model for text mining and information extraction." npj Computational Materials 8.1 (2022): 102.
2. Liu, Ziming, et al. "KAN: Kolmogorov-Arnold Networks." arXiv preprint arXiv:2404.19756 (2024).
Bio: Wenkai Ning is a graduate student in the Department of Physics at Shanghai University, working with the group of Professor Jeffrey R. Reimers. His primary interests are artificial intelligence and machine learning applications in materials science. He focuses on leveraging these technologies to accelerate materials design, aiming to bridge the gap between theoretical physics and practical data science. He also works on organizing and predicting material properties from various data sources.