A Large Multi-Modality Model for Chemistry and Materials Science


Zihan Zhao1, Bo Chen1,2, Jinbiao Li1,2, Da Ma1, Lu Chen1,2*, Kai Yu1,2*, Xin Chen2*

1Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China 

2Suzhou Laboratory, Suzhou, China

*E-mail: chenlusz@sjtu.edu.cn; kai.yu@sjtu.edu.cn; mail.xinchen@gmail.com 

The rapid development of AI tools is expected to offer unprecedented assistance to research in chemistry and materials science. However, neither existing task-specific models nor emerging general large language models (LLMs) can cover the wide range of data modalities and task categories in these fields. The specialized language and knowledge of the field, including various forms of molecular representations and spectroscopic methods, hinder the performance of general-domain LLMs in these disciplines.

   We first developed a 13B LLM trained on 34B tokens from chemical literature, textbooks, and instructions. The resulting model, ChemDFM1, can store, understand, and reason over chemical knowledge while retaining generic language comprehension capabilities. In our quantitative evaluation, ChemDFM surpasses GPT-4 on most chemical tasks despite the significant difference in model size. In an extensive third-party benchmark2, ChemDFM significantly outperforms most representative open-source LLMs.
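   As an illustration of how such a released domain LLM might be queried, the following minimal Python sketch loads a checkpoint with the Hugging Face transformers library. The checkpoint name is an assumption based on the public release described in ref. 1, not taken from this abstract, and should be replaced with the actual repository identifier.

    # Minimal sketch of prompting a domain LLM; checkpoint name is assumed, not confirmed here.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "OpenDFM/ChemDFM-13B-v1.0"  # assumed identifier
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    prompt = "What is the IUPAC name of the molecule with SMILES CC(=O)Oc1ccccc1C(=O)O?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))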

   We further developed a multimodal LLM for chemistry and materials science, ChemDFM-X. Diverse multimodal data, including SMILES strings, molecular graphs for GNN encoders, mass spectra, and infrared (IR) spectra, were collected to build a large domain-specific training corpus of 7.6M instances. ChemDFM-X was evaluated through extensive experiments on a variety of cross-modality tasks, and the results demonstrate its great potential for inter-modal knowledge comprehension.
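   To make the molecular-graph modality concrete, the sketch below converts a SMILES string into a simple node/edge representation with RDKit, the kind of graph input a GNN-based molecular encoder consumes. This is an illustrative example only, not the ChemDFM-X data pipeline.

    # Illustrative only: SMILES -> atom list and bond list using RDKit.
    from rdkit import Chem

    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
    mol = Chem.MolFromSmiles(smiles)

    atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]            # node features
    bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondTypeAsDouble())
             for b in mol.GetBonds()]                                   # edge list
    print(atoms)
    print(bonds)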

This study illustrates the potential of LLMs as co-scientists across a broad range of chemistry and materials science tasks. A few examples of using ChemDFM-X to assist materials research will be demonstrated.

Keywords: Multi-modality, Large Language Model, Spectroscopy, Materials Science

References

1. Zhao, Z. H. et al. "ChemDFM: Dialogue Foundation Model for Chemistry." https://arxiv.org/abs/2401.14818

2. Feng, K. H. et al. "SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models." https://arxiv.org/pdf/2406.09098


Important Dates

Online registration starts & first-round announcement: March 28, 2024
Abstract submission starts: May 1, 2024
Early bird registration closes & second-round announcement: July 1, 2024
Abstract submission closes: September 25, 2024
Workshop: October 9-13, 2024

Contact

Dr. Runhai Ouyang (DCTMD2024@163.com)
