Additional Funding Sources
This project was made possible by the NSF Idaho EPSCoR Program and by the National Science Foundation under Award No. OIA-1757324.
Abstract
Big sagebrush, Artemisia tridentata, is a keystone species in the western United States, as it covers a wide geographic range and more than 350 species depend on it. Currently, three ecologically distinct subspecies are recognized: A. t. wyomingensis, A. t. tridentata, and A. t. vaseyana. Where these subspecies co-occur, however, they can be expected to hybridize. Volatile organic compounds (VOCs) have been found to separate the subspecies well, but the efficacy of chemical subspecies identification in a hybrid zone has not yet been tested.
Here, we applied a random forest algorithm to a dataset collected from the common gardens and classified in 2016 to select the chemical compounds with the highest power to discriminate among big sagebrush subspecies. We then used a machine-learning algorithm (MLA), trained on the 2016 dataset, to classify samples collected from a hybrid zone in 2020, based on the presence and abundance of the delineating compounds. The results were visualized using multidimensional scaling. The MLA successfully classified the subspecies in the 2016 dataset, as well as A. t. wyomingensis in the 2020 dataset. However, the classification of A. t. vaseyana and A. t. tridentata did not match our expectations, which might be due to a hybrid origin of the individuals or to the absence of two main delineating compounds in the 2020 samples. Our results show that machine learning with VOCs is a promising approach for classifying samples from a hybrid zone. To refine our analysis, repeating the random forest analysis would provide additional compounds for retraining the MLA so that it can classify the subspecies accurately.
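The workflow described above (rank compounds with a random forest, retrain on the top-ranked compounds, then classify new samples) can be illustrated with a minimal sketch. This is not the authors' code: it is a small random forest written with the Python standard library only, run on synthetic data. The compound names (voc_a to voc_e), the abundances, the number of trees, and the choice to keep two compounds are all assumptions made for illustration. In practice a library implementation such as scikit-learn's RandomForestClassifier would likely be used, and the multidimensional scaling step is omitted here.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (gain, feature, threshold) over the candidate features, or None."""
    parent, n, best = gini(labels), len(rows), None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(rows, labels, rng, n_try, importance):
    """Grow one unpruned tree; leaves are labels, inner nodes are tuples."""
    if len(set(labels)) == 1:
        return labels[0]
    candidates = rng.sample(range(len(rows[0])), n_try)  # random feature subset
    split = best_split(rows, labels, candidates)
    if split is None or split[0] <= 1e-12:
        return Counter(labels).most_common(1)[0][0]      # majority-vote leaf
    gain, f, t = split
    importance[f] += gain * len(rows)  # impurity decrease, weighted by node size
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow([rows[i] for i in left], [labels[i] for i in left],
                 rng, n_try, importance),
            grow([rows[i] for i in right], [labels[i] for i in right],
                 rng, n_try, importance))

def fit_forest(rows, labels, n_trees=50, seed=0):
    """Return (trees, normalized per-feature importances)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_try = max(1, int(n_features ** 0.5))  # features tried at each split
    importance, trees = [0.0] * n_features, []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        trees.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, n_try, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def classify(trees, row):
    """Return (majority label, fraction of trees voting for it)."""
    votes = Counter()
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes[node] += 1
    label, count = votes.most_common(1)[0]
    return label, count / len(trees)

# Synthetic stand-in for a classified reference set (all values invented):
# voc_a and voc_b differ between subspecies, voc_c to voc_e are noise.
compounds = ["voc_a", "voc_b", "voc_c", "voc_d", "voc_e"]
centers = {"wyomingensis": (10.0, 0.0), "tridentata": (0.0, 5.0),
           "vaseyana": (5.0, 10.0)}
data_rng = random.Random(1)
rows, labels = [], []
for subspecies, (a, b) in centers.items():
    for _ in range(10):
        informative = [data_rng.gauss(a, 0.5), data_rng.gauss(b, 0.5)]
        noise = [data_rng.gauss(5.0, 2.0) for _ in range(3)]
        rows.append(informative + noise)
        labels.append(subspecies)

# Step 1: rank compounds by importance and keep the top two (assumed cutoff).
_, importance = fit_forest(rows, labels)
top = sorted(range(len(compounds)), key=lambda i: importance[i], reverse=True)[:2]
selected = [compounds[i] for i in top]

# Step 2: retrain on the selected compounds only.
model, _ = fit_forest([[r[i] for i in top] for r in rows], labels)

# Step 3: classify new samples; the second lies between two subspecies.
new_rows = [[10.0, 0.0, 5.0, 5.0, 5.0], [2.5, 7.5, 5.0, 5.0, 5.0]]
predictions = [classify(model, [r[i] for i in top]) for r in new_rows]
print(selected, predictions)
```

The vote fraction returned by classify is one possible way to flag samples that the forest cannot assign cleanly, for example samples that fall between reference subspecies. Whether a low vote fraction indicates hybrid origin would have to be checked against independent evidence.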
Can a Computer “Smell” Big Sagebrush Subspecies?