Publication Date

12-2018

Date of Final Oral Examination (Defense)

9-7-2018

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Tim Andersen, Ph.D.

Supervisory Committee Member

Reza Zadegan, Ph.D.

Supervisory Committee Member

Catherine Olschanowsky, Ph.D.

Abstract

The recent explosion of digital data has created a growing need for storage architectures that can hold large amounts of data over long periods of time. DNA shows promise as a data storage medium, offering a thousandfold increase in information density and information retention times ranging from hundreds to thousands of years. This thesis explores the challenges and potential approaches in developing an encoding and decoding algorithm for use in a DNA data storage architecture. When encoding binary data into sequences representing DNA strands, the algorithms should account for biological constraints that reflect the idiosyncrasies of working with a molecular substance. We present REDNAM (Robust Encoding and Decoding of Nucleic Acid Memory). REDNAM includes a novel mapping scheme and translation stage that converts hexadecimal data to codons while accounting for four constraints: removing start codons, avoiding repeating nucleotides, excluding longer repeating sequences, and maintaining close to 50% GC content. We have integrated this mapping scheme into the Fountain Codes algorithm in an implementation that balances information density with error correction and parity data. Preliminary results show that our implementation can successfully recover the original dataset after artificial insertion, deletion, and mutation errors have randomly perturbed the encoded information. We also achieved a twofold speedup for encoding and a 435-fold speedup for decoding compared to another Fountain Codes implementation.
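As an illustration of the four constraints named above, the following is a minimal sketch of a sequence checker. It is not the REDNAM mapping scheme itself, which the abstract does not detail. The function names, the choice of ATG as the start codon to exclude, and all threshold values (maximum run length, repeat length, GC tolerance) are assumptions made for this example.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_run(seq: str) -> int:
    """Return the length of the longest stretch of one repeated nucleotide."""
    best = current = 0
    previous = ""
    for base in seq:
        current = current + 1 if base == previous else 1
        best = max(best, current)
        previous = base
    return best


def has_repeated_subsequence(seq: str, length: int) -> bool:
    """Return True if any substring of the given length occurs more than once."""
    seen = set()
    for i in range(len(seq) - length + 1):
        chunk = seq[i:i + length]
        if chunk in seen:
            return True
        seen.add(chunk)
    return False


def passes_constraints(
    seq: str,
    max_run: int = 3,           # assumed limit on repeated nucleotides
    repeat_len: int = 8,        # assumed length of a "longer repeating sequence"
    gc_tolerance: float = 0.1,  # assumed allowed deviation from 50% GC
    start_codon: str = "ATG",   # assumed start codon to exclude
) -> bool:
    """Check a candidate strand against the four constraints (hypothetical thresholds)."""
    return (
        start_codon not in seq
        and longest_run(seq) <= max_run
        and not has_repeated_subsequence(seq, repeat_len)
        and abs(gc_content(seq) - 0.5) <= gc_tolerance
    )
```

A check of this kind could be used to screen candidate strands produced by an encoder, discarding any strand that violates a constraint; whether REDNAM screens strands this way or avoids violations by construction is not stated in the abstract.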

DOI

10.18122/td/1500/boisestate
