Современные информационные технологии и IT-образование (May 2020)

The Method of Estimating the Period of a Symbolic Periodic Sequence with Noise, Based on the Sub-Words Positions in the Sequence

  • Galina Zhukova,
  • Alexey Zhukov,
  • Yuri Smetanin,
  • Mikhail Ulyanov

DOI
https://doi.org/10.25559/SITITO.16.202001.23-32
Journal volume & issue
Vol. 16, no. 1
pp. 23 – 32

Abstract

Read online

The method of estimating the period of an almost periodic sequence is proposed. The period of an almost periodic sequence is the length of the smallest periodic pattern that composes the corresponding completely periodic sequence by repetition. The method can be applied to corrupted periodic sequences that originated from a periodic sequence that consists of no less than eight complete periodic patterns. In the corresponding periodic sequence with noise of insertion, deletion or change, some periodic patterns might be corrupted by noise intrusion. The level of noise is supposed to be less than 10%, this assumption allows us to use a shift operator with the window of width 16 so that we can observe at least more than twice every uncorrupted pattern of length 16 that the sequence under consideration contains. The method is based on the counting of the number of symbols in the word w between the two first symbols of the closest equal patterns of length 16. Only patterns that were encountered in word more than twice are used for counting the differences between the left positions of the neighbor equal patterns in the sequence under consideration. All the differences counted are arranged in the ascending order and the 25% quantile and the median of the sequence of the differences are calculated. The computational experiment shows that in most cases the 25% quantile is an appropriate estimation of the length of the periodic pattern when the noise level is no more than 5%. Sometimes the method provides a sufficient result in case of noise level between 5 and 10%. The dependence of the percentage of sufficient estimations of periodic pattern length on the noise level was studied in cases of only one type of noise and the noise of all three types in equal proportions. The computational experiment showed that in all situations 25% quantile provides more sufficient estima-tions than the median. The method is supposed to be improved, in order to restore the whole corresponding periodic sequence using only the sequence with noise.

Keywords