IEEE ISIT 2024 || Athens, Greece || 7-12 July 2024

WE2.R6.2

Capacity of Runlength-Limited and GC-Content Constrained Codes for DNA Data Storage

Filip Paluncic, Daniella Paluncic, B. T. Maharaj, University of Pretoria, South Africa

Session:

Information Theory in Biology

Track:

17: Information and Coding in Biology

Location:

Sigma/Delta

Presentation Time:

Wed, 10 Jul, 11:50 - 12:10

Session Chair:

Mary Wootters, Stanford University

Abstract

Numerous code constructions which limit the maximum homopolymer run length and constrain GC-content have been proposed in literature for DNA data storage. These codes help to reduce insertion, deletion and substitution errors in DNA strands, since strands with large homopolymer runs and a large unbalance in GC-content, i.e., GC-content is too high or too low, are more prone to the previously mentioned errors during synthesis and sequencing. While the capacity of non-binary runlength-limited codes is known, the capacity of non-binary balanced runlength-limited codes is unknown. Here, we generalize the capacity result of binary balanced runlength-limited codes to the non-binary alphabet and demonstrate that the capacity of non-binary balanced (equal number of occurrences of each non-binary symbol) runlength-limited codes is equal to the capacity of the corresponding runlength-limited codes without the balancing constraint. An immediate consequence of the above result when applied to the quaternary alphabet of DNA data storage is that the capacity of runlength-limited and GC-content constrained codes for any arbitrary range of permitted GC-content unbalance which includes the balanced case is equal to the capacity of the corresponding runlength-limited codes without the GC-content constraint, i.e., the imposition of any GC-content constraint to runlength-limited codes does not induce a capacity penalty.

Session WE2.R6

WE2.R6.1: Mutual Information of a class of Poisson-type Channels using Markov Renewal Theory

Maximilian Gehri, Nicolai Engelmann, Heinz Koeppl, Technische Universität Darmstadt, Germany

WE2.R6.2: Capacity of Runlength-Limited and GC-Content Constrained Codes for DNA Data Storage

Filip Paluncic, Daniella Paluncic, B. T. Maharaj, University of Pretoria, South Africa

WE2.R6.3: Representing Information on DNA using Patterns Induced by Enzymatic Labeling

Daniella Bar-Lev, Tuvi Etzion, Technion – Israel Institute of Technology, Haifa, Israel, Israel; Eitan Yaakobi, Technion – Israel Institute of Technology, Israel; Zohar Yakhini, Reichman University, Israel

WE2.R6.4: Optimizing the Decoding Probability and Coverage Ratio of Composite DNA

Tomer Cohen, Eitan Yaakobi, Technion, Israel

Resources

View Manuscript