Additional Funding Sources
This material is based upon work partially supported by the National Science Foundation under Grant No. CHE-1904166 (co-funded by CDS&E and the Office of Investigative and Forensic Sciences in the National Institute of Justice).
Presentation Date
7-2021
Abstract
There are many scenarios in which it is necessary or desirable to quantitatively analyze a substance for its constituents: carbon content of soil, sugar content of orange juice, protein or fat percentage in meat products, and more. In all of these situations, samples currently must be sent “off to the lab” for expensive analysis by experts using advanced equipment. However, inexpensive and near-instantaneous measurement techniques (near-infrared spectroscopy) combined with machine learning allow for on-demand chemical analysis without costly and time-consuming laboratory methods. At present, these techniques, termed “multivariate calibration”, are often impractical because sample and measurement conditions cannot be standardized in many data situations. For example, models trained using soil samples from across the entire US are ineffective at predicting soil samples from Eastern Idaho alone. Presented is a novel process termed Local Adaptive Fusion Regression (LAFR), which identifies a subset of the training library that is highly similar to each target sample in order to improve multivariate calibration predictions. Results on the datasets described above (soil carbon content, etc.) show substantial improvements over traditional multivariate calibration processes without increased cost to the user. Furthermore, results indicate that LAFR can search complicated datasets of almost 100,000 samples and identify just 30 samples that closely match the target sample in all sample and measurement conditions. These improvements to multivariate calibration suggest that we may be moving away from a world of sending samples “off to the lab”, toward one in which substances can be analyzed quickly and inexpensively using only a smartphone and cloud computing.
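The central idea described for LAFR, predicting each target sample from a small subset of the training library that closely resembles it, can be illustrated with a generic local-calibration sketch. This is not LAFR: it swaps in plain k-nearest-neighbour selection by Euclidean distance followed by ridge regression. The helper name `local_ridge_predict`, the defaults `k=30` (echoing the 30 samples mentioned above) and `alpha`, and the two-region synthetic data with three-channel toy "spectra" are all hypothetical assumptions for illustration.

```python
import numpy as np


def local_ridge_predict(X_lib, y_lib, x_target, k=30, alpha=1e-3):
    """Predict a property for one target spectrum from its k most similar
    library spectra.

    Hypothetical helper: generic local calibration (k nearest neighbours by
    Euclidean distance + ridge regression). It is NOT the LAFR algorithm.
    """
    X_lib = np.asarray(X_lib, dtype=float)
    y_lib = np.asarray(y_lib, dtype=float)
    x_target = np.asarray(x_target, dtype=float)

    # Distance from the target spectrum to every spectrum in the library.
    dists = np.linalg.norm(X_lib - x_target, axis=1)
    # Keep only the k closest samples as the local training set.
    idx = np.argsort(dists)[:k]
    X_loc, y_loc = X_lib[idx], y_lib[idx]

    # Mean-center the local set so the intercept is not penalized.
    x_mean, y_mean = X_loc.mean(axis=0), y_loc.mean()
    Xc, yc = X_loc - x_mean, y_loc - y_mean

    # Ridge regression in dual form, w = Xc^T (Xc Xc^T + alpha*I)^-1 yc,
    # so only a k-by-k system is solved even when there are far more
    # wavelengths than local samples.
    gram = Xc @ Xc.T + alpha * np.eye(len(idx))
    w = Xc.T @ np.linalg.solve(gram, yc)
    return float((x_target - x_mean) @ w + y_mean)


# Synthetic library: two hypothetical "regions" whose spectrum-to-property
# relationship differs, loosely mimicking a heterogeneous national library.
rng = np.random.default_rng(0)
X_a = rng.normal(loc=5.0, size=(500, 3))
X_b = rng.normal(loc=-5.0, size=(500, 3))
y_a = 2.0 * X_a[:, 0] + X_a[:, 1]
y_b = -3.0 * X_b[:, 0] + 0.5 * X_b[:, 2]
X_lib = np.vstack([X_a, X_b])
y_lib = np.concatenate([y_a, y_b])

# A target sample from region A; its true value is 2*5.2 + 4.8 = 15.2.
x_new = np.array([5.2, 4.8, 5.1])
print(round(local_ridge_predict(X_lib, y_lib, x_new), 2))
```

Because the two synthetic regions follow different linear relationships, a single global linear model cannot fit both, whereas each local model is trained only on samples from the target's own region. Practical multivariate calibration more commonly uses partial least squares (PLS) than ridge regression, and the similarity measure and subset size would need tuning; both are left as assumptions here.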
Stop Sending Samples "Off to the Lab" for Analysis: A Machine Learning Solution