Additional Funding Sources

This material is based upon work partially supported by the National Science Foundation under Grant Nos. CHE-1506417 (co-funded by CDS&E) and CHE-1904166 (co-funded by CDS&E and the Office of Investigative and Forensic Sciences in the National Institute of Justice). The authors gratefully acknowledge this support.

Abstract

Multivariate calibration models the relationship between a substance's chemical profile and its spectrum (here, near-infrared) so that concentrations can be predicted for new samples from their measured spectra. However, these new samples are often measured under conditions that differ from the primary conditions: different instruments, instrument drift, and temperature all affect the measurement. Domain adaptation (DA) methods force the model to ignore these differences in order to generate an accurate model for the new domain (secondary conditions). Individual DA methods fall under one of two fundamental processes. One augments the primary data with a few secondary-domain samples that have chemical reference values (labels); the other augments the primary data with secondary spectra only (unlabeled data). In this work, we compare two existing labeled DA methods and two existing unlabeled DA methods with two novel labeled methods and a novel unlabeled approach. Since DA methods require selection of hyperparameters, a model selection framework based on model diversity and prediction similarity (MDPS) is applied to the DA methods. Regardless of the DA method, the MDPS process is shown to select models more accurate than the first quartile of all models generated by the DA process in three near-infrared datasets.
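As a rough illustration of the labeled process described above, the sketch below stacks a few weighted secondary samples onto the primary data and fits a ridge-regularized linear calibration. It is a generic example, not one of the methods compared in this work: the function name `augmented_ridge`, the weight `lam`, the ridge parameter `eta`, and the synthetic data are all assumptions made for illustration. The two tuning values are the kind of hyperparameters that a model selection framework would need to choose.

```python
import numpy as np

def augmented_ridge(X_primary, y_primary, X_secondary, y_secondary,
                    lam=1.0, eta=1e-3):
    """Fit a linear calibration on primary data augmented with
    weighted, labeled secondary samples (hypothetical helper)."""
    # Stack the secondary samples under the primary ones, scaled by lam
    # so a handful of secondary samples can still influence the fit.
    X_aug = np.vstack([X_primary, lam * X_secondary])
    y_aug = np.concatenate([y_primary, lam * y_secondary])
    n_features = X_aug.shape[1]
    # Ridge solution of (X'X + eta * I) b = X'y
    A = X_aug.T @ X_aug + eta * np.eye(n_features)
    return np.linalg.solve(A, X_aug.T @ y_aug)

# Synthetic stand-in data (assumed, not from the paper's datasets).
rng = np.random.default_rng(0)
b_true = rng.normal(size=20)
X_p = rng.normal(size=(50, 20))          # primary "spectra"
y_p = X_p @ b_true                       # primary reference values
X_s = rng.normal(size=(5, 20)) + 0.5     # secondary spectra with an offset
y_s = (X_s - 0.5) @ b_true               # secondary reference values

b = augmented_ridge(X_p, y_p, X_s, y_s, lam=10.0, eta=1e-3)
y_s_pred = X_s @ b                       # predictions under secondary conditions
```

Setting `lam=0.0` removes the influence of the secondary samples and reduces the fit to ordinary ridge regression on the primary data alone, which gives a simple baseline for judging whether the augmentation helps.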

Multivariate Calibration Domain Adaptation with Unlabeled Data


 
