Fuzzy Conservation-Based Algorithm for Protein Family Classification
Type of Culminating Activity
Master of Science in Engineering, Computer Engineering
Electrical and Computer Engineering
Scott F. Smith
The development of advanced computational techniques to classify protein sequences by evolutionary relationship is an important problem in bioinformatics. Most protein classification methods rely on patterns of residue conservation within sequences to identify evolutionary relationships; however, the sequences of related proteins can vary dramatically while their structures remain conserved. These remotely homologous sequences are difficult to classify using traditional sequence-only methods, but using both residue and structure information for classification may reveal the relationship.
This thesis presents a protein classification method that uses standard fuzzy logic methods to combine the residue, secondary structure, and solvent accessibility conservation patterns found in the multiple sequence alignments (MSAs) of protein families. The combined conservation of each alignment position is used to weight a position-specific scoring matrix (PSSM) for the protein family.
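The idea of combining three per-position conservation measures and using the result to weight a PSSM can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the entropy-based conservation measure, the four fuzzy rules and their output levels (1.0, 0.6, 0.1), the weighted-average defuzzification, and the function names `conservation`, `fuzzy_weight`, and `weight_pssm`. Alignment columns are assumed to contain no gaps.

```python
import math
from collections import Counter


def conservation(column, alphabet_size):
    """Conservation in [0, 1]: 1 minus the normalized Shannon entropy of a column.

    Assumed measure for illustration; the column must be non-empty and gap-free.
    """
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def fuzzy_weight(res, ss, sa):
    """Combine residue, secondary structure, and solvent accessibility conservation.

    Hypothetical rule base: 'high' membership is the value itself, 'low' is
    1 - value, AND is min, OR is max, and the crisp weight is the
    strength-weighted average of the rule outputs.
    """
    struct = max(ss, sa)  # fuzzy OR of the two structural measures
    rules = [
        (min(res, struct), 1.0),          # residue high AND structure high
        (min(1 - res, struct), 0.6),      # residue low AND structure high
        (min(res, 1 - struct), 0.6),      # residue high AND structure low
        (min(1 - res, 1 - struct), 0.1),  # residue low AND structure low
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total


def weight_pssm(pssm, weights):
    """Scale every score at each position by that position's combined weight."""
    return [
        {residue: score * w for residue, score in column.items()}
        for column, w in zip(pssm, weights)
    ]


# Toy alignment with two positions: residues, secondary structure (H/E/C),
# and solvent accessibility (b = buried, e = exposed) per sequence.
residue_cols = ["AAAA", "AGTC"]
ss_cols = ["HHHH", "HHHE"]
sa_cols = ["bbbb", "bbee"]

weights = [
    fuzzy_weight(conservation(r, 20), conservation(s, 3), conservation(a, 2))
    for r, s, a in zip(residue_cols, ss_cols, sa_cols)
]
toy_pssm = [{"A": 2.0, "G": -1.0}, {"A": 0.5, "G": 0.5}]
weighted = weight_pssm(toy_pssm, weights)
```

In this sketch a position whose residues vary but whose structural state is conserved still receives a moderate weight, which is the kind of behavior the abstract motivates for remote homologs.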
Statistical randomization methods were used to test reliability. The results were excellent, with 99.84% of the PSSMs able to differentiate between family and non-family members. Several potential remote homologs were identified, and the conservation patterns of the three families that performed poorly may help researchers identify alternative classifications for the sequences in these families.
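The abstract does not say which randomization procedure was used. One common approach, shown here only as an assumed illustration, is a shuffle test: score the real sequence against the PSSM, score many shuffled copies of it to build a null distribution, and report an empirical p-value. The names `score_sequence` and `shuffle_p_value`, the number of shuffles, and the toy PSSM are all hypothetical.

```python
import random


def score_sequence(pssm, sequence):
    """Sum the PSSM score of each residue at its position (0.0 if absent)."""
    return sum(column.get(res, 0.0) for column, res in zip(pssm, sequence))


def shuffle_p_value(pssm, sequence, n_shuffles=200, seed=0):
    """Empirical p-value of the observed score against shuffled sequences.

    Shuffling preserves residue composition but destroys positional order,
    so a small p-value suggests the match depends on the positions themselves.
    """
    rng = random.Random(seed)
    observed = score_sequence(pssm, sequence)
    residues = list(sequence)
    at_least_as_high = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if score_sequence(pssm, residues) >= observed:
            at_least_as_high += 1
    # Add-one correction keeps the estimate strictly positive.
    return (1 + at_least_as_high) / (1 + n_shuffles)


# Toy PSSM that rewards one specific residue at each position.
family_seq = "ACDEFGHIKL"
toy_pssm = [
    {res: (5.0 if res == target else -1.0) for res in family_seq}
    for target in family_seq
]
p_value = shuffle_p_value(toy_pssm, family_seq)
```

A sequence made of a single repeated residue would give a p-value of 1 under this test, because every shuffle reproduces it; that is a known limitation of composition-preserving shuffles rather than a property of the thesis method.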
Hatcher, Valerie Storrs, "Fuzzy Conservation-Based Algorithm for Protein Family Classification" (2006). Boise State University Theses and Dissertations. 491.