MO3.R6.2

Coding Scheme for Noisy Nanopore Sequencing with Backtracking and Skipping Errors

Yeow Meng Chee, National University of Singapore, Singapore; Kees A. Schouhamer Immink, Turing Machines Inc., Netherlands; Van Khu Vu, National University of Singapore, Singapore

Session:
Coding in Biology 2

Track:
17: Information and Coding in Biology

Location:
Sigma/Delta

Presentation Time:
Mon, 8 Jul, 14:55 - 15:15

Session Chair:
Emanuele Viterbo, Monash University
Abstract
In DNA-based data storage, sequencing the stored DNA is essential to reading the stored data. Nanopore sequencing, an emerging sequencing technology, has recently attracted considerable attention owing to its advantages: it is portable, scalable, automated, and rapid. However, several kinds of errors, including inter-symbol interference, noisy measurement, backtracking, and skipping, reduce the accuracy of the technology. Several coding schemes have been proposed recently to deal with some of these errors, especially inter-symbol interference and noisy measurement. In this work, we focus on backtracking and skipping errors and aim to design a good coding scheme to combat them. We first note that backtracking and skipping errors can be modelled as synchronization errors, namely duplication and deletion errors. Next, we propose new families of codes that locate and correct all synchronization errors caused by backtracking and skipping. The proposed codes are constrained codes that avoid a certain set of patterns. We then study these constrained codes and, in particular, present a method to compute their maximal asymptotic rates. For illustration, we use experimental data available online to compute numerical results for the maximal asymptotic rates of these codes.
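As a rough illustration of what "maximal asymptotic rate" means for a pattern-avoiding constrained code, the sketch below uses the standard transfer-matrix approach: build a graph whose states are the most recent symbols, drop every edge that would complete a forbidden pattern, and take the base-2 logarithm of the largest eigenvalue of the adjacency matrix. This is a textbook technique and is not claimed to be the method of the paper. The function name `max_asymptotic_rate`, the alphabet, and the forbidden patterns in the usage lines are all assumptions chosen for illustration; the abstract does not state the actual pattern sets.

```python
import math
from itertools import product


def max_asymptotic_rate(forbidden, alphabet="ACGT", iters=2000):
    """Estimate log2 of the growth rate of words avoiding all `forbidden` patterns.

    Hypothetical helper for illustration. Assumes every pattern is non-empty
    and that arbitrarily long admissible words exist.
    """
    m = max((len(p) for p in forbidden), default=1)
    # States: all strings of length m-1 that contain no forbidden pattern.
    states = [
        "".join(t)
        for t in product(alphabet, repeat=m - 1)
        if not any(p in "".join(t) for p in forbidden)
    ]
    index = {s: i for i, s in enumerate(states)}
    # succ[i]: indices of states reachable by appending one allowed symbol.
    succ = []
    for s in states:
        nxt = []
        for a in alphabet:
            w = s + a
            # s is clean, so a new occurrence can only be a suffix of w.
            if any(w.endswith(p) for p in forbidden):
                continue
            nxt.append(index[w[1:]])
        succ.append(nxt)
    # Power iteration on A + I; the shift avoids oscillation on periodic
    # graphs, and the spectral radius of A is recovered as lam - 1.
    n = len(states)
    v = [1.0 / n] * n
    lam = 1.0
    for _ in range(iters):
        new = [v[i] + sum(v[j] for j in succ[i]) for i in range(n)]
        lam = sum(new)  # v sums to 1, so this is the current growth factor
        v = [x / lam for x in new]
    return math.log2(lam - 1.0)


if __name__ == "__main__":
    # Illustrative pattern set only: forbid repeating the same base twice.
    print(max_asymptotic_rate(["AA", "CC", "GG", "TT"]))
    # With no constraint the rate is log2(4) = 2 bits per nucleotide.
    print(max_asymptotic_rate([]))
```

The rate is expressed in bits per nucleotide, so an unconstrained quaternary code reaches 2, and every forbidden pattern can only lower that value.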