Master of Science in Computer Science
Tim Andersen, Ph.D.
Reza Zadegan, Ph.D.
Catherine Olschanowsky, Ph.D.
The recent explosion of digital data has created a growing need for storage architectures that can hold large amounts of data for long periods of time. DNA shows promise as a data storage medium, offering a thousandfold increase in information density and information retention times ranging from hundreds to thousands of years. This thesis explores the challenges and potential approaches in developing an encoding and decoding algorithm for use in a DNA data storage architecture. When encoding binary data into sequences that represent DNA strands, the algorithm should account for biological constraints that arise from working with a molecular medium. We present REDNAM (Robust Encoding and Decoding of Nucleic Acid Memory). REDNAM includes a novel mapping scheme and translation stage that converts hexadecimal data to codons while satisfying four constraints: removing start codons, avoiding repeated nucleotides, excluding longer repeated sequences, and maintaining GC content close to 50%. We have integrated this mapping scheme into the Fountain Codes algorithm in an implementation that balances information density against error correction and parity data. Preliminary results show that our implementation successfully recovers the original dataset after artificial insertion, deletion, and mutation errors have randomly perturbed the encoded information. We also achieved a speedup of 2x for encoding and 435x for decoding compared to another Fountain Codes implementation.
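As a rough illustration of the four constraints named above, the following Python sketch checks a candidate strand against each of them. It is not the REDNAM mapping scheme itself, which is not reproduced here. The function names, the choice of ATG as the only screened start codon, and the thresholds (`max_run`, `repeat_len`, `gc_tolerance`) are all assumptions made for illustration.

```python
# Hypothetical constraint checker; names and thresholds are illustrative
# assumptions, not values taken from the thesis.

START_CODONS = ("ATG",)  # assumption: only the canonical start codon is screened


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated nucleotide."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def has_repeated_subsequence(seq, k):
    """True if any length-k subsequence occurs more than once in seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def check_strand(seq, max_run=1, repeat_len=4, gc_tolerance=0.1):
    """Report which of the four constraints a strand satisfies."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        # Start codons are searched for in every reading frame.
        "no_start_codon": not any(codon in seq for codon in START_CODONS),
        "no_long_run": longest_homopolymer(seq) <= max_run,
        "no_repeated_subsequence": not has_repeated_subsequence(seq, repeat_len),
        "gc_balanced": abs(gc_content(seq) - 0.5) <= gc_tolerance,
    }


if __name__ == "__main__":
    print(check_strand("ACGTCAGCTA"))
    print(check_strand("ATGAAAA"))
```

A checker of this kind could serve to screen candidate strands and reject those that violate a constraint. An encoder that satisfies the constraints by construction, as the mapping scheme described above is intended to, would not need such a rejection step.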
Suyehira, Kelsey, "Using DNA For Data Storage: Encoding and Decoding Algorithm Development" (2018). Boise State University Theses and Dissertations. 1500.