MO4.R6.3

A Combinatorial Perspective on Random Access Efficiency for DNA Storage

Anina Gruica, Eindhoven University of Technology, Netherlands; Daniella Bar-Lev, Technion- Israel Institute of Technology, Israel; Alberto Ravagnani, Eindhoven University of Technology, Netherlands; Eitan Yaakobi, Technion - Israel Institute of Technology, Israel

Session:
DNA storage and coding

Track:
17: Information and Coding in Biology

Location:
Sigma/Delta

Presentation Time:
Mon, 8 Jul, 17:05 - 17:25

Session Chair:
Eitan Yaakobi, Technion -- Israel Institute of Technology
Abstract
We investigate the fundamental limits of the recently proposed \emph{random access coverage depth problem} for DNA data storage. Under this paradigm, it is assumed that the user information consists of $k$ information strands, which are encoded into $n$ strands via some generator matrix $G$. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large number of copies. In this context, the random access coverage depth problem refers to the expected number of reads (i.e., sequenced strands) until it is possible to decode a specific information strand, which is requested by the user. The goal is to minimize the maximum expectation over all possible requested information strands, and this value is denoted by $\tmax(G)$. This paper introduces new techniques to investigate the random access coverage depth problem, which capture its combinatorial nature. We establish two general formulas to find $\tmax(G)$ for arbitrary matrices. We introduce the concept of \emph{recovery balanced codes} and combine all these results and notions to compute $\tmax(G)$ for MDS, simplex, and Hamming codes. We also study the performance of modified systematic MDS matrices and our results show that the best results for $\tmax(G)$ are achieved with a specific mix of encoded strands and replication of the information strands.
Resources