Evolutionary Bioinformatics (Feb 2020)

Major Revisions in Arthropod Phylogeny Through Improved Supermatrix, With Support for Two Possible Waves of Land Invasion by Chelicerates

  • Katherine E Noah,
  • Jiasheng Hao,
  • Luyan Li,
  • Xiaoyan Sun,
  • Brian Foley,
  • Qun Yang,
  • Xuhua Xia

DOI
https://doi.org/10.1177/1176934320903735
Journal volume & issue
Vol. 16

Abstract

Deep phylogeny involving arthropod lineages is difficult to recover because the erosion of phylogenetic signals over time leads to unreliable multiple sequence alignment (MSA) and subsequent phylogenetic reconstruction. One way to alleviate the problem is to assemble a large number of gene sequences to compensate for the weakness in each individual gene. Such an approach has led to many robustly supported but contradictory phylogenies. A close examination shows that the supermatrix approach often suffers from two shortcomings. The first is that MSA is rarely checked for reliability and, as will be illustrated, can be poor. The second is that, to alleviate the problem of homoplasy at the third codon position of protein-coding genes due to convergent evolution of nucleotide frequencies, phylogeneticists may remove or degenerate the third codon position but may do so improperly and introduce new biases. We performed an extensive reanalysis of one such “big data” set to highlight these two problems, and demonstrated the power and benefits of correcting or alleviating them. Our results support a new group with Xiphosura and Arachnopulmonata (Tetrapulmonata + Scorpiones) as sister taxa. This favors a new hypothesis in which the ancestor of Xiphosura and the extinct Eurypterida (sea scorpions, many later forms of which lived in brackish or fresh water) returned to the sea after the initial chelicerate invasion of land. Our phylogeny is supported even by the original data when processed with a new “principled” codon degeneration. We also show that removing the 1673 codon sites with both AGN and UCN codons (encoding serine) in our alignment can partially reconcile discrepancies between nucleotide-based and amino acid-based trees, partly because two sequences, one with AGN and the other with UCN, would be identical at the amino acid level but quite different at the nucleotide level.
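
The serine-codon filtering described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the toy alignment, and the restriction of the AGN family to AGU/AGC (the serine codons of the standard genetic code, labeled "AGY" here) are assumptions made for illustration. The sketch flags codon columns that contain both serine families and removes them from an in-frame codon alignment.

```python
def split_codons(seq):
    """Split an in-frame sequence into codons, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def serine_family(codon):
    """Return 'UCN', 'AGY', or None for a codon written in DNA or RNA letters.

    Assumption: standard genetic code, where only AGU and AGC of the
    AGN box encode serine.
    """
    c = codon.upper().replace("T", "U")
    if len(c) != 3:
        return None
    if c[:2] == "UC" and c[2] in "ACGU":
        return "UCN"
    if c[:2] == "AG" and c[2] in "CU":
        return "AGY"
    return None


def mixed_serine_sites(alignment):
    """Indices of codon columns that hold both serine codon families."""
    columns = zip(*(split_codons(s) for s in alignment.values()))
    mixed = []
    for index, column in enumerate(columns):
        families = {serine_family(codon) for codon in column}
        if {"AGY", "UCN"} <= families:
            mixed.append(index)
    return mixed


def drop_sites(alignment, sites):
    """Return a copy of the alignment without the given codon columns."""
    drop = set(sites)
    return {
        name: "".join(c for i, c in enumerate(split_codons(seq)) if i not in drop)
        for name, seq in alignment.items()
    }


# Hypothetical two-taxon alignment: Met, Ser, Gly in both sequences,
# but serine is AGC in one taxon and UCC in the other.
toy = {"taxonA": "AUGAGCGGA", "taxonB": "AUGUCCGGA"}
sites = mixed_serine_sites(toy)
filtered = drop_sites(toy, sites)
```

In the toy alignment the two sequences encode the same three amino acids, yet the serine codons AGC and UCC differ at two of three nucleotide positions. Removing that column leaves identical nucleotide sequences, which is the kind of nucleotide-level versus amino acid-level disagreement the abstract refers to.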