Extraction of data from publications empowered by Kolmogorov-Arnold Networks


Wenkai Ning¹*, Musen Li¹,², Rika Kobayashi³, Jeffrey R. Reimers¹,²

¹International Centre for Quantum and Molecular Structures and Department of Physics, Shanghai University, Shanghai, China.
²School of Mathematical and Physical Sciences, University of Technology Sydney, Sydney, Australia.
³Supercomputer Facility, Australian National University, Canberra, Australia.
*Corresponding Author: ningwenkai@shu.edu.cn
Large Language Models (LLMs) are used for large-scale extraction and organization of unstructured data owing to their exceptional natural-language processing capabilities. Extensive experimental and simulation data that could empower materials design are scattered across numerous scientific publications, yet high-quality experimental databases are lacking. We present an LLM-based approach that searches the literature to create structured materials-property databases. By incorporating Kolmogorov-Arnold Networks (KANs), it overcomes previous limitations in integrating long-context data and in discerning complex inter-entity relationships. Our application organizes materials-bandgap data using learnable activation functions and spline-parametrized functions for dynamic categorization. The system learns from diverse sources, combining experimental results with simulation data to achieve both accuracy and efficiency. This KAN-based LLM demonstrates superior accuracy in organizing materials-bandgap data and could be adapted to other applications in materials science and to other fields that require structured data extraction. By improving data-driven discovery, this integration has the potential to significantly enhance scientific research and contribute to technological and scientific progress.
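The abstract attributes the categorization step to the learnable, spline-parametrized activation functions of a KAN. The sketch below illustrates that building block in pure Python, following the general formulation of Liu et al. (reference 2): each edge of a KAN layer carries its own univariate function φ(x) = w_b·silu(x) + w_s·Σ c_i B_i(x), where B_i are B-spline basis functions, and each output is the sum of the edge functions applied to the inputs. It is a minimal, forward-pass-only illustration and not the authors' implementation. The abstract does not say how the KAN is coupled to the LLM, and the names KANEdge and KANLayer, the grid range, the grid size, and the spline degree used here are hypothetical choices. Training of c_i, w_b, and w_s (for example by gradient descent) is omitted.

```python
import math
from typing import List


def bspline_basis(x: float, knots: List[float], i: int, k: int) -> float:
    """Value at x of the i-th B-spline basis function of degree k (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    denom_left = knots[i + k] - knots[i]
    if denom_left > 0:
        left = (x - knots[i]) / denom_left * bspline_basis(x, knots, i, k - 1)
    right = 0.0
    denom_right = knots[i + k + 1] - knots[i + 1]
    if denom_right > 0:
        right = (knots[i + k + 1] - x) / denom_right * bspline_basis(x, knots, i + 1, k - 1)
    return left + right


def silu(x: float) -> float:
    """SiLU, x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


class KANEdge:
    """Hypothetical learnable edge activation: phi(x) = w_b * silu(x) + w_s * sum_i c_i * B_i(x)."""

    def __init__(self, grid_min: float = -1.0, grid_max: float = 1.0,
                 grid_size: int = 5, degree: int = 3):
        h = (grid_max - grid_min) / grid_size
        # Uniform grid extended by `degree` knots on each side, so the basis
        # functions sum to one everywhere inside [grid_min, grid_max).
        self.knots = [grid_min + (j - degree) * h for j in range(grid_size + 2 * degree + 1)]
        self.degree = degree
        self.n_basis = grid_size + degree
        self.coeffs = [0.0] * self.n_basis  # learnable spline coefficients c_i
        self.w_b = 1.0                      # learnable weight on the SiLU term
        self.w_s = 1.0                      # learnable weight on the spline term

    def __call__(self, x: float) -> float:
        spline = sum(c * bspline_basis(x, self.knots, i, self.degree)
                     for i, c in enumerate(self.coeffs))
        return self.w_b * silu(x) + self.w_s * spline


class KANLayer:
    """Hypothetical layer mapping n_in inputs to n_out outputs: y_j = sum_i phi_{j,i}(x_i)."""

    def __init__(self, n_in: int, n_out: int, **edge_kwargs):
        # One independent learnable function per (output, input) edge.
        self.edges = [[KANEdge(**edge_kwargs) for _ in range(n_in)] for _ in range(n_out)]

    def __call__(self, xs: List[float]) -> List[float]:
        return [sum(phi(x) for phi, x in zip(row, xs)) for row in self.edges]


if __name__ == "__main__":
    layer = KANLayer(n_in=2, n_out=3)
    print(layer([0.5, -0.5]))  # three outputs; with zero spline coefficients each edge is just silu(x)
```

Unlike a multilayer perceptron, which applies a fixed activation at each node, a KAN places the nonlinearity on the edges and learns its shape. Because the basis functions sum to one inside the grid, setting every c_i to 1 with w_b = 0 makes φ(x) = 1 on [-1, 1), which is a quick sanity check on the spline construction.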
Keywords: Machine Learning, Large Language Models, Data Mining, Materials Science
References
1. Gupta, Tanishq, et al. "MatSciBERT: A materials domain language model for text mining and information extraction." npj Computational Materials 8.1 (2022): 102.
2. Liu, Ziming, et al. "KAN: Kolmogorov-Arnold Networks." arXiv preprint arXiv:2404.19756 (2024).
Bio: Wenkai Ning is a graduate student in the Department of Physics at Shanghai University, working with the group of Professor Jeffrey Reimers. His primary interests are applications of artificial intelligence and machine learning in materials science. He focuses on using these technologies to accelerate materials design, aiming to bridge the gap between theoretical physics and practical data science. He also works on organizing and predicting material properties from diverse data sources.

Important Dates

Online registration starts & first-round announcement
March 28, 2024
Abstract submission starts
May 1, 2024
Early bird registration closes & second-round announcement
July 1, 2024
Abstract submission closes
September 25, 2024
Workshop
October 9-13, 2024

Contact

Dr. Runhai Ouyang (DCTMD2024@163.com)
