Spatial constrains and information content of sub-genomic regions of the human genome

Leonidas P. Karakatsanis; Evgenios G. Pavlos; George Tsoulouhas; Georgios L. Stamokostas; Timothy Mosbruger; Jamie L. Duke; George P. Pavlos; Dimitri S. Monos

iScience (Feb 2021)

Affiliations

Leonidas P. Karakatsanis: Department of Environmental Engineering, Complexity Research Team (CRT), Democritus University of Thrace, 67100 Xanthi, Greece; Corresponding author
Evgenios G. Pavlos: Department of Environmental Engineering, Complexity Research Team (CRT), Democritus University of Thrace, 67100 Xanthi, Greece; Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Crete 71003, Greece
George Tsoulouhas: Department of Environmental Engineering, Complexity Research Team (CRT), Democritus University of Thrace, 67100 Xanthi, Greece
Georgios L. Stamokostas: Department of Environmental Engineering, Complexity Research Team (CRT), Democritus University of Thrace, 67100 Xanthi, Greece
Timothy Mosbruger: Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Jamie L. Duke: Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
George P. Pavlos: Department of Environmental Engineering, Complexity Research Team (CRT), Democritus University of Thrace, 67100 Xanthi, Greece
Dimitri S. Monos: Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Corresponding author

Journal volume & issue: Vol. 24, no. 2, p. 102048

Abstract

Summary: Complexity metrics and machine learning (ML) models were used to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique) in order to examine the segmental organization of the human genome as reflected in the size distribution of these sequences. To this end, we developed an integrated methodology based on the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a technical index that integrates the generated information, which we introduce and name the complexity factor (COFA). Our analysis revealed that the size distributions of the genomic regions within chromosomes are not random but follow patterns with characteristic features that become apparent through their complexity character, and that these patterns are part of the dynamics of the whole genome. Finally, this picture of the dynamics in DNA is recognized by ML tools for clustering, classification, and prediction with high accuracy.

Published in iScience

ISSN: 2589-0042 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science
Website: http://www.cell.com/iscience/home
