By Levi McGarry, College of Arts and Sciences
As artificial intelligence (AI) and machine learning capabilities continue to expand, geochemistry researchers from Washington State University are exploring how to employ these technologies for soil rehabilitation and geochemical exploration.
Juejing Liu, a postdoctoral researcher in chemistry, and his mentor, Associate Professor Xiaofeng Guo, recently received a Microsoft “AI for Good” Award, which was given to twenty innovative AI-based projects across the state of Washington. Alongside other collaborators, the duo is working to develop a text data mining (TDM) system to build a publicly accessible dataset that will aid in developing effective soil decontamination methods and provide insights into the domestic production of critical and rare earth elements.
Traditionally, datasets summarizing the conditions and properties for certain geochemical compounds were developed manually through grueling literature reviews. Not only were these compilations time-consuming, but the datasets were also prone to human errors and incorrect calculations. Emerging generative AI tools present a new opportunity in chemical research and materials sciences, introducing the ability to predict compound stabilities and thermodynamic properties among various inorganic elements.
“We want to build a dataset used for predicting the thermodynamics of various ligand-containment metal complexes,” said Liu. “The goal for this machine learning model is that once we’ve trained it, it can rapidly screen potential ligands which can bind heavy metal contaminants more strongly and for a longer duration, aiding in containment immobilization.”
This chemical binding process, called complexation, can affect how heavy metals interact with soil, water, and living organisms. The ability to predict complexation reactions between heavy metals and other compounds is used in both environmental rehabilitation and critical element extraction. Complexation-based strategies are used to selectively extract and purify metals from ores, and specific chemical agents can remove heavy metals from contaminated sites during environmental restoration efforts.
Once the dataset is complete, Liu and Guo plan to use the machine learning model to help identify soil-contaminating compounds at the Hanford Nuclear Site and along the Spokane River. While the Hanford Nuclear Site is infamous for producing plutonium during the Manhattan Project, pollution in the Spokane River system can be traced back to the dominance of silver mining in Idaho’s Silver Valley during the nineteenth and twentieth centuries, which deposited lead, arsenic, and other toxic metals into the watershed. Pollutants at both locations can be remediated through applied geochemistry, by matching the right complexation strategy to the pollutants found in each area.
While the concepts are commonly used in geochemistry, the emergent computing ability of AI poses both opportunities and questions for the researchers.
“The greatest challenge so far is that, to be honest, we are chemists but not data scientists,” said Liu. “While we know the foundations of machine learning and the potential of AI for chemistry, the great thing is that we have a lot of very talented undergraduates majoring in computer science and engineering.”

Liu and Guo worked to recruit undergraduates majoring in computer science to help develop the systems for the text data mining of chemical publications and to calibrate the learning algorithms to recognize older PDF files. “The students have produced very nice results and have contributed to peer-reviewed manuscripts we’re preparing for submission about the data mining model,” said Liu.
The research team has also worked closely with the WSU Office of Commercialization to register their Language Model Extractor, or LMExt, as intellectual property owned by WSU.
“Once we’ve secured approval, we hope to publish the text data mining tool online for use by academia or nonprofit entities,” said Liu.
The geochemistry team in the Guo Lab is looking for additional collaborators to help test their geochemistry datasets and applications, and the Guo Lab website hosts links to both the critical element thermodynamic database and the critical element solubility database.
To learn more about this ongoing research effort, contact Juejing Liu and Xiaofeng Guo in the Department of Chemistry.