Covariance models are a powerful description of non-coding RNA (ncRNA) families that can be used to search nucleotide databases for new members of these families. Currently, estimation of the parameters of a covariance model (state transition and emission scores) is based only on the observed frequencies of mutations, insertions, and deletions in known ncRNA sequences. For families with very few known members, this can result in rather uninformative models in which the consensus sequence has a good score and most deviations from consensus have a fairly uniform, poor score. It is proposed here to combine the traditional observed-frequency information with known information about free-energy changes in RNA helix formation and loop length changes. More thermodynamically probable deviations from the consensus sequence will then be favored in database search. The thermodynamic information may be incorporated into the models as informative priors that depend on neighboring consensus nucleotides and on loop lengths.
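As a rough illustration of the idea of an informative prior, the following sketch combines observed base-pair counts with a prior derived from free energies. It is not the authors' procedure: the Boltzmann weighting, the Dirichlet-style pseudocount update, the function names, the `strength` parameter, and the free-energy values are all assumptions chosen for illustration, and the energies are placeholders rather than measured nearest-neighbor parameters.

```python
import math

# Approximate value of R*T in kcal/mol at 37 degrees C (R = 1.987e-3 kcal/(mol K)).
RT = 0.616


def boltzmann_prior(delta_g, rt=RT):
    """Convert free energies (kcal/mol) into a normalized prior distribution.

    Lower (more negative) free energy gives a higher prior probability.
    """
    weights = {pair: math.exp(-g / rt) for pair, g in delta_g.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def posterior_mean(counts, prior, strength):
    """Posterior mean under a Dirichlet prior with pseudocounts strength * prior."""
    n = sum(counts.get(pair, 0) for pair in prior)
    return {
        pair: (counts.get(pair, 0) + strength * p) / (n + strength)
        for pair, p in prior.items()
    }


# Placeholder free energies for a few pair types at one consensus position.
# These numbers are invented for the example and are NOT real parameters.
example_delta_g = {"GC": -3.0, "AU": -2.0, "GU": -1.0, "AA": 0.5}

# A family with only three known members, all showing G-C at this position.
example_counts = {"GC": 3}

example_prior = boltzmann_prior(example_delta_g)
example_posterior = posterior_mean(example_counts, example_prior, strength=2.0)

for pair, prob in sorted(example_posterior.items(), key=lambda kv: -kv[1]):
    print(f"{pair}: {prob:.4f}")
```

With frequency-only estimation, every pair other than G-C would receive the same zero count in this example. Under the assumed prior, the unobserved pairs are instead ranked by their placeholder stabilities, which is the qualitative effect the abstract describes.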
This document was originally published by IEEE in the Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007. Copyright restrictions may apply. DOI: 10.1109/BIMNICS.2007.4610108
Smith, Jennifer A. and Wiese, Kay C. (2007). "Improved Covariance Model Parameter Estimation Using RNA Thermodynamic Properties". Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007, 185-191. http://dx.doi.org/10.1109/BIMNICS.2007.4610108