BMC Bioinformatics (Mar 2019)

Multispecies genome-wide analysis defines the MAP3K gene family in Gossypium hirsutum and reveals conserved family expansions

  • Norbert Bokros,
  • Sorina C. Popescu,
  • George V. Popescu

DOI
https://doi.org/10.1186/s12859-019-2624-9
Journal volume & issue
Vol. 20, no. S2
pp. 73 – 85

Abstract

Background: Gene families are sets of structurally and evolutionarily related genes, in one or multiple species, that typically share a conserved biological function. As such, the identification and subsequent analysis of entire gene families are widely employed in the evolutionary and functional genomics of both well-established and newly sequenced plant genomes. Currently, plant gene families are typically identified in one of two ways: 1) HMM-profile-based searches using models built on Arabidopsis thaliana genes, or 2) coding sequence homology searches using curated databases. Integrated databases containing functionally annotated genes and gene families have been developed for model organisms and several important crops; however, a comprehensive methodology for gene family annotation is currently lacking, preventing automated annotation of newly sequenced genomes.

Results: This paper proposes a combined measure of homology identification, motif conservation, phylogenomic, and integrated gene expression analyses to define gene family structures in multiple plant species. The MAP3K gene families in seven plant species, including two previously unexamined species, Gossypium hirsutum and Zostera marina, were characterized to reveal new insights into their collective function and evolution and to demonstrate the effectiveness of our novel methodology.

Conclusion: Compared with recent reports, this methodology performs significantly better for the identification and analysis of gene family members in several monocot and dicot, diploid as well as polyploid, plant species.

Keywords