Publication Date

5-2013

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Timothy Andersen, Ph.D.

Abstract

Bioinformatics is a broad realm of research in which Computer Science has much to offer. Collecting, sorting, and analyzing statistical information for DNA and protein sequences is difficult due to the sheer amount of available data. Tools have been created to do this, but they have generally been limited by speed or robustness.

In addition to analyzing the statistical properties of biological sequences, it is also important to model and understand their chemical and physical properties. A number of valuable Computational Chemistry software tools are available for modeling and predicting these properties, including tools for molecular docking and homology modeling. These tools are typically command-line driven, have a steep learning curve, and generally must be used in conjunction with other programs to extract the desired information. The average chemist or biologist is well-versed in neither Computer Science principles nor command-line tools and involved scripting, and therefore finds it difficult to realize the full potential of the tools available.

The work completed here assists the field of Bioinformatics with two software packages for biological sequence data analysis. The first is CseqStat, which processes and gathers statistical data from large repositories of genome sequence data. The algorithms I have developed in CseqStat speed up the processing of large amounts of nucleotide and protein sequence data from the NCBI database. The application is robust: it finds the frequency of all sequences of a given length in a reasonable time frame (determined by the sequence length and the amount of input data), stores the results efficiently, and provides a mechanism for quick and easy retrieval of the data.
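To make "the frequency of all sequences of a given length" concrete, the following is a minimal Python sketch of naive subsequence counting. It is not CseqStat's algorithm, which this abstract does not describe; the function name `count_kmers` and the sample input are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring across an iterable of sequences.

    Illustrative only: a naive in-memory count, not the CseqStat method.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence, one position at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical input: two short nucleotide strings.
counts = count_kmers(["ACGTAC", "GTACGT"], 2)
```

A plain in-memory counter like this grows with the number of distinct subsequences, which is one reason the sequence length and the amount of input data drive the running time and storage needs mentioned above.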

The second utility is DockoMatic, a multi-faceted software package that supports high-throughput experiments, structure creation, and molecular docking. I collaborated with members of the Department of Chemistry and Biochemistry at Boise State University to gain valuable insight for improving existing methods.

DockoMatic was developed to allow a user to invoke and manage large numbers of molecular binding calculations, linear and cyclic peptide analog structure creation, and molecular modeling experiments on a single computer or cluster. Specifically, DockoMatic was created to

• automate peptide-based ligand creation based on single-letter residue codes,

• automate AutoDock job creation, submission, and management for high-throughput docking experiments,

• automate competitive binding experiments,

• track multiple jobs in real-time on a cluster,

• organize results in a useful manner,

• provide an intuitive GUI,

• automate cyclic peptide analog creation, and

• provide a simple interface for creating molecular models.
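As an illustration of the first item in the list above, a peptide written in single-letter residue codes can be expanded to three-letter residue names before any structure is built. The sketch below is not DockoMatic's implementation; the function name `expand_residues` is hypothetical, and only the 20 standard amino acids are assumed.

```python
# Standard one-letter to three-letter amino acid codes (20 standard residues).
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def expand_residues(peptide):
    """Return the three-letter residue names for a single-letter peptide string.

    Hypothetical helper for illustration; raises ValueError on unknown codes.
    """
    peptide = peptide.strip().upper()
    unknown = sorted(set(peptide) - set(THREE_LETTER))
    if unknown:
        raise ValueError("unknown residue code(s): " + ", ".join(unknown))
    return [THREE_LETTER[code] for code in peptide]


# Hypothetical input: a three-residue peptide.
residues = expand_residues("GCC")
```

Rejecting unknown codes up front, rather than skipping them, keeps a mistyped sequence from silently producing a shorter peptide.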

Recommended Citation

Bullock, William Casey, "Novel Algorithms and Software for Biological Sequence Analysis" (2013). Boise State University Theses and Dissertations. 374.
https://scholarworks.boisestate.edu/td/374

Included in

Computer Sciences Commons
