Volvox_scan: A Clustering Algorithm for Predicting Related Mutants in a Sequenced Population
Additional Funding Sources
This project is supported by a 2021-2022 STEM Undergraduate Research Grant from the Higher Education Research Council.
Abstract
In forward genetics studies, the accurate detection of bona fide induced DNA mutations can be compromised by contaminants and artifacts introduced through DNA library and sample preparation errors, DNA sequencing and alignment errors, sample mislabeling, and pollen contamination (in plants). These problems reduce the accuracy of variant-calling algorithms that predict DNA mutations in next-generation sequencing (NGS) datasets, leading to false-positive detections. For large-scale mutant population studies that use independently mutagenized individuals, filtering out common (or shared) variants is an effective way to mitigate false positives, because independently induced mutations are unlikely to recur at the same position in many individuals. Although filtering of common variants is widely used, it can unintentionally remove genuine mutations, producing false negatives, if the sequenced population includes mutants that are genetically related and therefore share true variants. Determining which mutants are genetically related is thus beneficial for downstream variant-call filtering. We implemented an efficient mutation clustering algorithm (volvox_scan) for detecting subpopulations of mutants in a sequenced population that are likely genetically related. We demonstrate the efficiency of the volvox_scan algorithm in uncovering clusters of likely related mutants from datasets of several large-scale mutant population studies.
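The interaction between relatedness and common-variant filtering can be illustrated with a minimal Python sketch. It is not the volvox_scan implementation, whose internals are not described here. It assumes, purely for illustration, that each sample's calls are available as a set of (chromosome, position, ref, alt) tuples, that two samples are "likely related" when they share at least `min_shared` variants, and that related samples are merged by single linkage. The function names, the threshold, and the toy data are all hypothetical.

```python
from itertools import combinations


def cluster_by_shared_variants(calls, min_shared=2):
    """Group samples sharing >= min_shared variants (single linkage, union-find).

    Illustrative assumption only; not the published volvox_scan method.
    """
    parent = {sample: sample for sample in calls}

    def find(sample):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[sample] != sample:
            parent[sample] = parent[parent[sample]]
            sample = parent[sample]
        return sample

    for a, b in combinations(calls, 2):
        if len(calls[a] & calls[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for sample in calls:
        clusters.setdefault(find(sample), set()).add(sample)
    # Report only groups with more than one member.
    return [members for members in clusters.values() if len(members) > 1]


def filter_common_variants(calls, clusters, max_groups=1):
    """Drop variants seen in more than max_groups independent groups.

    A cluster of likely related samples counts as a single group, so variants
    shared only within a cluster are retained.
    """
    group_of = {sample: sample for sample in calls}
    for index, members in enumerate(clusters):
        for sample in members:
            group_of[sample] = ("cluster", index)

    groups_with = {}
    for sample, variants in calls.items():
        for variant in variants:
            groups_with.setdefault(variant, set()).add(group_of[sample])

    return {
        sample: {v for v in variants if len(groups_with[v]) <= max_groups}
        for sample, variants in calls.items()
    }


# Hypothetical toy data: m1 and m2 share three variants; chr1:999 is in every sample.
calls = {
    "m1": {("chr1", 100, "G", "A"), ("chr2", 200, "C", "T"),
           ("chr3", 50, "G", "A"), ("chr1", 999, "A", "G")},
    "m2": {("chr1", 100, "G", "A"), ("chr2", 200, "C", "T"),
           ("chr4", 75, "C", "T"), ("chr1", 999, "A", "G")},
    "m3": {("chr5", 10, "G", "A"), ("chr1", 999, "A", "G")},
    "m4": {("chr6", 20, "C", "T"), ("chr1", 999, "A", "G")},
}

clusters = cluster_by_shared_variants(calls, min_shared=2)
filtered = filter_common_variants(calls, clusters)
```

In this toy example, m1 and m2 form one cluster, so the variants they share only with each other survive filtering, while the variant present in every sample is removed as a likely artifact. A naive filter that ignored the cluster would have discarded the m1/m2 shared variants as well.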