Non-Coding RNA Covariance Model Combination Using Mixed Primary-Secondary Structure Alignment

Document Type

Conference Proceeding

Publication Date




Covariance models (CMs) are very effective for finding new members of non-coding RNA (ncRNA) sequence families in genomic data. However, the computational burden of applying CM-based search algorithms can be prohibitive. When annotating the genome of a newly sequenced organism, it is usually desirable to search the sequence data with a large number of ncRNA families. The computational burden can be reduced if the families are clustered into statistically similar models and a single cluster-average representative model is produced for each cluster. The database is then searched with the representative model of each cluster at a relatively low detection threshold, and the sequences that pass this pre-filter are then searched with the individual family models of that cluster. A base-pair conflict metric has previously been proposed for use in model clustering. In this work, an alternative metric is proposed that uses standard alignment algorithms and a special mixed primary-secondary structure scoring matrix.
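
The two-stage search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `two_stage_search`, the scorer `toy_score`, the consensus strings, and both thresholds are hypothetical stand-ins chosen only to show the control flow. A real pipeline would score candidates with actual covariance models rather than the position-matching toy used here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "models": one consensus string per model, where "N"
# matches any base. These stand in for real covariance models.
MODELS: Dict[str, str] = {
    "cluster1_rep": "GGNNNNNCC",  # assumed cluster-average representative
    "famA": "GGAAAAACC",
    "famB": "GGCUUCGCC",
}


def toy_score(model: str, window: str) -> float:
    """Fraction of positions where the window agrees with the consensus."""
    consensus = MODELS[model]
    matches = sum(c == "N" or c == w for c, w in zip(consensus, window))
    return matches / len(consensus)


def two_stage_search(
    windows: List[str],
    clusters: Dict[str, List[str]],
    score: Callable[[str, str], float],
    low_threshold: float,
    family_threshold: float,
) -> List[Tuple[str, str]]:
    """Pre-filter with each cluster representative, then rescan survivors."""
    hits: List[Tuple[str, str]] = []
    for representative, members in clusters.items():
        # Stage 1: one pass with the representative at a permissive threshold.
        survivors = [w for w in windows if score(representative, w) >= low_threshold]
        # Stage 2: each member family scores only the surviving windows.
        for family in members:
            for w in survivors:
                if score(family, w) >= family_threshold:
                    hits.append((family, w))
    return hits


WINDOWS = ["GGAAAAACC", "GGCUUCGCC", "AUAUAUAUA", "GGAUAUACC"]
CLUSTERS = {"cluster1_rep": ["famA", "famB"]}

hits = two_stage_search(WINDOWS, CLUSTERS, toy_score, low_threshold=0.8, family_threshold=0.9)
print(hits)
```

In this toy setting, the third window fails the pre-filter and is never scored by the individual family models, which is where the saving comes from when a cluster contains many families.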


This title is within the series Lecture Notes in Computer Science.