Publication Date

8-2022

Date of Final Oral Examination (Defense)

4-1-2022

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Tim Andersen, PhD

Supervisory Committee Member

Edoardo Serra, PhD

Supervisory Committee Member

William L. Hughes, PhD

Supervisory Committee Member

Reza Zadegan, PhD

Abstract

Our increasingly information-driven world is increasing the demand for new storage technologies. Current estimates project that total storage demand will exceed the supply of usable silicon by 2040 [1]. DNA is an attractive storage medium due to its incredible density, almost negligible energy requirements, and data retention measured in centuries [1]. DNA does, however, come with new challenges. It is an organic compound with complex internal interactions that complicate the design and synthesis of DNA sequences for the purpose of data storage. In this work we demonstrate a new encoding-decoding process that accounts for some of the challenges in encoding and decoding, including issues arising from the secondary structure of the sequence, repeated nucleotides, unwanted subsequences, and GC content, which is vital for ensuring stable sequences. This is accomplished by using a graph representation of the possible encoding space that captures the relevant constraints, combined with a search algorithm that identifies the optimal encoding for the given input data while accounting for these constraints. A benefit of our approach is that, by leveraging the constraints on the encoding process, the decoding algorithm is able to correct single-point errors without the aid of error correction codes; this is something no current competing solution can accomplish.
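
As an illustration only, the sequence-level constraints named in the abstract (repeated nucleotides, unwanted subsequences, and GC content) could be checked along the following lines. This is a minimal sketch and not the thesis's method: the function names, the run-length limit, the GC window, and the forbidden motif are all hypothetical values chosen for the example, and secondary structure is not modeled.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    longest = current = 0
    prev = None
    for base in seq:
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
        prev = base
    return longest


def satisfies_constraints(
    seq: str,
    max_run: int = 3,                # hypothetical limit on repeated nucleotides
    gc_range: tuple = (0.4, 0.6),    # hypothetical acceptable GC window
    forbidden: tuple = ("GAATTC",),  # hypothetical unwanted subsequence
) -> bool:
    """True if seq passes all three illustrative checks."""
    low, high = gc_range
    return (
        longest_run(seq) <= max_run
        and low <= gc_content(seq) <= high
        and not any(motif in seq for motif in forbidden)
    )
```

In a graph-based encoder of the kind the abstract describes, a check like this would presumably decide which candidate sequences remain in the encoding space; how the thesis actually builds and searches that graph is not specified here.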

DOI

https://doi.org/10.18122/td.2008.boisestate
