Izvestiya VUZ. Applied Nonlinear Dynamics (Jul 2024)

Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics

  • Zimnyakov, Dmitry Aleksandrovich,
  • Alonova, Marina Vasilevna,
  • Skripal, Anatolij Vladimirovich,
  • Inkin, Maksim Glebovich,
  • Zaytsev, Sergey S,
  • Feodorova, Valentina

DOI
https://doi.org/10.18500/0869-6632-003110
Journal volume & issue
Vol. 32, no. 4
pp. 439 – 459

Abstract

Purpose. This work compares two approaches to synthesizing two-dimensional binary identifiers of nucleotide sequences obtained by DNA sequencing of biological objects.

Methods. The first approach models the polarization-dependent diffraction of a coherent readout beam on a two-dimensional phase-modulating structure (a phase screen) associated with the symbolic sequence produced by DNA sequencing. The second approach uses a two-dimensional chaos game representation (CGR) of the symbolic sequence. To obtain a finite-element CGR map, the representation is partitioned into a given number of cells. This number is chosen so that the synthesized binary identifier remains acceptably sensitive to structural changes in the mapped sequence.

Results. The comparison was carried out on fragments of the symbolic sequences of several strains of the SARS-CoV-2 virus (Wuhan, Delta, Omicron). Correlation coefficients between the binary identifiers of the different strains were computed and compared.

Conclusion. The analysis established that binary identifiers synthesized with the polarization encoding technique are substantially more sensitive to structural changes in the analyzed sequences, and smaller in size, than CGR binary identifiers.
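The CGR branch of the comparison (map the sequence with the chaos game, partition the map into cells, binarize, then correlate identifiers) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following choices are assumptions that the abstract does not specify: the common corner layout (A, C, G, T at the corners of the unit square), a 2^k × 2^k grid as the "given number of cells", a median-count threshold for binarization, and the Pearson coefficient as the correlation measure. The helper names are hypothetical.

```python
import numpy as np

# Assumed corner layout in the unit square (a common CGR convention;
# the authors' own layout is not given in the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre and move halfway toward each nucleotide's corner."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return np.array(pts, dtype=float).reshape(-1, 2)


def cgr_binary_identifier(seq, k=4):
    """Hypothetical helper: bin CGR points into a 2^k x 2^k grid and threshold at the median count.

    The first k-1 points are dropped so that every counted point encodes a full k-mer
    (each grid cell then corresponds to exactly one k-mer).
    """
    n = 2 ** k
    pts = cgr_points(seq)[k - 1:]
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=n, range=[[0.0, 1.0], [0.0, 1.0]])
    return (counts > np.median(counts)).astype(np.uint8)


def identifier_correlation(b1, b2):
    """Pearson correlation between two binary identifiers of equal shape (assumed measure)."""
    return float(np.corrcoef(b1.ravel().astype(float), b2.ravel().astype(float))[0, 1])


if __name__ == "__main__":
    # Toy sequences for illustration only; not the SARS-CoV-2 fragments used in the paper.
    id_a = cgr_binary_identifier("ACGTTGCA" * 40, k=3)
    id_b = cgr_binary_identifier("ACGTTGCC" * 40, k=3)
    print(id_a.shape, identifier_correlation(id_a, id_b))
```

Raising `k` makes each cell represent a longer k-mer, so the identifier becomes more sensitive to local changes in the sequence, but it also grows as 4^k bits. This is the size/sensitivity trade-off that the abstract refers to when it mentions choosing the number of cells.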

Keywords