An SVM-based method for assessment of transcription factor-DNA complex models

Rosario I. Corona; Sanjana Sudarshan; Srinivas Aluru; Jun-tao Guo

doi:10.1186/s12859-018-2538-y

BMC Bioinformatics (Dec 2018)

Affiliations

Rosario I. Corona: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte
Sanjana Sudarshan: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte
Srinivas Aluru: School of Computational Science and Engineering, Georgia Institute of Technology
Jun-tao Guo: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte

Journal volume & issue: Vol. 19, no. S20
pp. 49 – 57

Abstract

Background: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict native or near-native complex models. A docking program typically generates a large number of complex conformations and predicts the complex model(s) based on interaction energies between protein and DNA. However, the prediction accuracy is hampered by current approaches to model assessment, especially when docking simulations fail to produce any near-native models.

Results: We present here a Support Vector Machine (SVM)-based approach for quality assessment of the predicted transcription factor (TF)-DNA complex models. Besides a knowledge-based protein-DNA interaction potential DDNA3, we applied several structural features that have been shown to play important roles in binding specificity between transcription factors and DNA molecules to quality assessment of complex models. To address the issue of unbalanced positive and negative cases in the training dataset, we applied hard-negative mining, an iterative training process that selects an initial training dataset by combining all of the positive cases and a random sample from the negative cases. Results show that the SVM model greatly improves prediction accuracy (84.2%) over two knowledge-based protein-DNA interaction potentials, orientation potential (60.8%) and DDNA3 (68.4%). The improvement is achieved through reducing the number of false positive predictions, especially for the hard docking cases, in which a docking algorithm fails to produce any near-native complex models.

Conclusions: A learning-based SVM scoring model with structural features for specific protein-DNA binding and an atomic-level protein-DNA interaction potential DDNA3 significantly improves prediction accuracy of complex models by successfully identifying cases without near-native structural models.
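
The hard-negative mining step mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that the initial training set combines all positive cases with a random sample of negatives; the later rounds shown here (adding negatives that the current model wrongly scores as positive, then retraining) follow the general hard-negative mining technique and are an assumption, not the authors' exact procedure. The two-dimensional toy features, the small linear SVM trained by subgradient descent, and all function and parameter names are hypothetical placeholders; the paper's actual features (DDNA3 and the structural features) and its SVM settings are not reproduced.

```python
import random

def decision(w, b, x):
    """Signed score of a linear model; > 0 means 'predicted positive'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=100, lr=0.01, lam=0.01, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            # Shrink w (regularization), then step along the hinge subgradient.
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def fit_on(pos, neg, chosen):
    """Train on all positives plus the currently chosen negatives."""
    idx = sorted(chosen)
    X = pos + [neg[i] for i in idx]
    y = [1] * len(pos) + [-1] * len(idx)
    return train_linear_svm(X, y)

def hard_negative_mining(pos, neg, n_initial, max_rounds=5, seed=0):
    """Return (model, sorted indices of the negatives used for training)."""
    rng = random.Random(seed)
    # Initial set: all positives plus a random sample of negatives.
    chosen = set(rng.sample(range(len(neg)), n_initial))
    model = fit_on(pos, neg, chosen)
    for _ in range(max_rounds):
        # Assumed refinement step: unused negatives scored as positive are "hard".
        hard = [i for i in range(len(neg))
                if i not in chosen and decision(*model, neg[i]) > 0]
        if not hard:
            break
        chosen.update(hard)
        model = fit_on(pos, neg, chosen)
    return model, sorted(chosen)

# Toy, imbalanced data: 10 positives and 200 negatives (placeholder features).
data_rng = random.Random(1)
positives = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
negatives = [[data_rng.gauss(-1, 1.5), data_rng.gauss(-1, 1.5)] for _ in range(200)]

model, used = hard_negative_mining(positives, negatives, n_initial=10)
print(len(used), "of", len(negatives), "negatives used for training")
```

In this sketch the training set stays small relative to the full pool of negatives, which is the point of the technique when negatives greatly outnumber positives: only the negatives the model currently gets wrong are added between rounds.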

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
