Methods in Ecology and Evolution (Mar 2024)
How to get the most out of phylogenetic imputation without abusing it
Abstract
Phylogenies are viewed as potentially powerful resources for predicting missing values in trait datasets, but they are often misused. Critically, many imputed values that rely completely or partially on phylogenetic information are trusted without convincing evidence that the data meet the requirements for the predictions to be at least minimally valuable. I argue that phylogenetic signal, the mainstay of phylogenetic imputation, is often interpreted as ‘strong’ because the outcome of randomization tests, rather than the actual strength of the signal, has been used to decide whether it is strong or not. As a result, many researchers have drawn conclusions from ‘strong’ signals that are actually far more labile than a phylogenetic random walk (i.e. Brownian motion). Although trait evolutionary trajectories that nearly fit Brownian motion are typically considered strongly conserved, the Brownian process is subject to high levels of stochasticity that may yield spurious predictions under some circumstances. To my knowledge, very few studies (if any) that rely on phylogenetically imputed information have rigorously evaluated the expected accuracy of individual predictions, although among‐lineage variability in prediction accuracy can be dramatic, even for strongly conserved traits. Here, I advocate a Monte‐Carlo approach based on trait simulations to assess the prediction accuracy expected for each missing value in the traits of interest, which can be continuous or discrete. The framework is presented in a detailed step‐by‐step R tutorial designed to let non‐specialist researchers identify highly likely spurious predictions without advanced technical and statistical skills.
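The Monte‐Carlo idea can be illustrated with a minimal Python sketch. It is not the paper's R tutorial, and everything in it is a hypothetical assumption chosen for illustration: a four‐tip ultrametric tree ((A:0.1,B:0.1):0.9,(C:0.9,D:0.9):0.1), a Brownian‐motion model with rate 1 and a known root state of 0 (real analyses would estimate the root), and prediction of a missing tip by its Brownian conditional mean given the other tips. Simulating the trait many times and scoring the prediction for each tip gives an expected error for every value that could be missing, even though all tips evolve under the same conserved process.

```python
import math
import random

# Hypothetical ultrametric tree ((A:0.1,B:0.1):0.9,(C:0.9,D:0.9):0.1), depth 1.
TIPS = ["A", "B", "C", "D"]
# Brownian-motion tip covariance: COV[i][j] = branch length shared from the root.
COV = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]


def simulate_bm(rng, sigma=1.0):
    """Simulate one Brownian-motion trait along the hypothetical tree (root state = 0)."""
    ab = rng.gauss(0.0, sigma * math.sqrt(0.9))  # ancestor of A and B
    cd = rng.gauss(0.0, sigma * math.sqrt(0.1))  # ancestor of C and D
    return [
        ab + rng.gauss(0.0, sigma * math.sqrt(0.1)),
        ab + rng.gauss(0.0, sigma * math.sqrt(0.1)),
        cd + rng.gauss(0.0, sigma * math.sqrt(0.9)),
        cd + rng.gauss(0.0, sigma * math.sqrt(0.9)),
    ]


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def predict_tip(values, missing):
    """Brownian conditional mean of the missing tip given the observed tips (root known = 0)."""
    obs = [i for i in range(len(values)) if i != missing]
    c_oo = [[COV[i][j] for j in obs] for i in obs]
    c_mo = [COV[missing][j] for j in obs]
    w = solve(c_oo, c_mo)  # C_oo^-1 C_om; C_oo is symmetric, so w . x_o = C_mo C_oo^-1 x_o
    return sum(wi * values[j] for wi, j in zip(w, obs))


def expected_rmse(missing, n_sim=20000, seed=1):
    """Monte-Carlo estimate of the prediction RMSE for one tip treated as missing."""
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(n_sim):
        x = simulate_bm(rng)
        sq += (predict_tip(x, missing) - x[missing]) ** 2
    return math.sqrt(sq / n_sim)


if __name__ == "__main__":
    for i, name in enumerate(TIPS):
        print(f"{name}: expected RMSE = {expected_rmse(i):.2f} (no-information RMSE = 1.00)")
```

Under these assumptions the theoretical RMSE is about 0.44 for tips A and B but about 0.99 for tips C and D, so a prediction for C is barely better than ignoring the phylogeny, although the trait follows the same Brownian process everywhere on the tree.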
Although phylogenetic imputation has important limitations, I suggest that leveraging advances in our understanding of these limitations, and using the technique with caution and restraint, will allow trait‐based research to progress further while sampling efforts continue to replace imputed data.
Keywords