The Role of The Prefix Array in Sequence Analysis: A Survey

Frantisek Franek; W. F. Smyth; Xinfang Wang

doi:10.3934/medsci.2017.3.261

AIMS Medical Science (Jul 2017)

The Role of The Prefix Array in Sequence Analysis: A Survey

Frantisek Franek,
W. F. Smyth,
Xinfang Wang

Affiliations

Frantisek Franek
W. F. Smyth
Xinfang Wang: School of Computational Science and Engineering, McMaster University, Hamilton, Canada

DOI: https://doi.org/10.3934/medsci.2017.3.261
Journal volume & issue: Vol. 4, no. 3
pp. 261 – 273

Abstract

Read online

The prefix array was apparently first computed and used algorithmically in 1984, playing a pivotal role in an optimal algorithm to determine all the tandem repeats in a given (DNA or protein) sequence. However, it is especially since the turn of the 21st century that applications of the prefix array to fundamental sequencing problems have been recognized. An important aspect of this expanding role has been the recognition that the prefix table and the border array are “equivalent” data structures 一 that is, one can be computed from the other in linear time. Since the border array in turn specifies all the periods of every prefix of the sequence, the prefix array thus turns out to be a structure of central importance. In this paper we survey important applications of the prefix array 一 in particular to approximate string matching under Hamming distance, as well as to the computation of covers and enhanced covers 一 and show how, unlike border array algorithms, these are extendible to sequences containing “don’t-care” or indeterminate letters such as {a, c} or {g, t}. This extension leads to a surprising correspondence between prefix arrays and undirected graphs that seems likely to be a fertile source of new insights in future. We conclude with an overview of sequencing problems that the authors believe can be handled using prefix array technology.

Published in AIMS Medical Science

ISSN: 2375-1576 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Medicine: Medicine (General)
Website: http://www.aimspress.com/journal/medicalScience

About the journal

Abstract

Keywords