Ecology and Evolution (Aug 2022)
How low can you go? Introducing SeXY: sex identification from low‐quantity sequencing data despite lacking assembled sex chromosomes
Abstract
Accurate sex identification is crucial for elucidating the biology of a species. In the absence of directly observable sexual characteristics, sex identification of wild fauna can be challenging, if not impossible. Molecular sexing offers a powerful alternative to morphological sexing approaches. Here, we present SeXY, a novel sex‐identification pipeline for very low‐coverage shotgun sequencing data from a single individual. SeXY was designed to use low‐effort screening data for sex identification and does not require a conspecific sex‐chromosome assembly as a reference. We assess how the accuracy of our pipeline depends on data quantity, by downsampling sequencing data from 100,000 to 1000 mapped reads, and on reference genome selection, by mapping to reference genomes of varying quality and phylogenetic distance. We show that our method is 100% accurate when mapping to a high‐quality (highly contiguous, N50 > 30 Mb) conspecific genome, even down to 1000 mapped reads. For lower‐quality reference assemblies (N50 < 30 Mb), our method is 100% accurate with 50,000 mapped reads, regardless of reference assembly quality or phylogenetic distance. The SeXY pipeline provides several advantages over previously implemented methods: SeXY (i) requires sequencing data from only a single individual, (ii) does not require assembled conspecific sex chromosomes, or even a conspecific reference assembly, (iii) takes into account variation in coverage across the genome, and (iv) is accurate with only 1000 mapped reads in many cases.
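To make the downsampling experiment concrete, the following is a minimal illustrative sketch in Python, not the SeXY implementation. It assumes an XY sex-determination system and that sex is inferred from the length-normalised ratio of X-linked to autosomal read counts (expected near 1.0 for XX and 0.5 for XY). The function names, the 0.75 cutoff, and the toy genome sizes are all hypothetical, and the sketch does not model the variation in coverage across the genome that SeXY accounts for.

```python
import random


def downsample(reads, n, seed=0):
    """Draw n reads at random, without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(reads, n)


def x_to_autosome_ratio(reads, x_len, auto_len):
    """Length-normalised X:autosome read-count ratio.

    `reads` is a list of labels, "X" for X-linked and anything else for
    autosomal. `x_len` and `auto_len` are total sequence lengths (same unit).
    """
    x_reads = sum(1 for label in reads if label == "X")
    a_reads = len(reads) - x_reads
    if a_reads == 0:
        raise ValueError("no autosomal reads; ratio is undefined")
    return (x_reads / x_len) / (a_reads / auto_len)


def call_sex(ratio, threshold=0.75):
    """Hypothetical cutoff halfway between the expected 0.5 (XY) and 1.0 (XX)."""
    return "XX" if ratio > threshold else "XY"


def simulate_reads(n, x_len, auto_len, x_copies, seed=0):
    """Toy read labels for a diploid with `x_copies` X chromosomes (1 or 2)."""
    rng = random.Random(seed)
    x_weight = x_copies * x_len      # X dosage depends on sex
    a_weight = 2 * auto_len          # autosomes are always diploid
    p_x = x_weight / (x_weight + a_weight)
    return rng.choices(["X", "A"], weights=[p_x, 1 - p_x], k=n)


if __name__ == "__main__":
    X_LEN, AUTO_LEN = 150, 2350      # toy sizes in Mb (assumption)
    reads = simulate_reads(100_000, X_LEN, AUTO_LEN, x_copies=1)
    # Same read-count ladder as the abstract: 100,000 down to 1000 reads.
    for n in (100_000, 50_000, 10_000, 1000):
        ratio = x_to_autosome_ratio(downsample(reads, n), X_LEN, AUTO_LEN)
        print(n, round(ratio, 3), call_sex(ratio))
```

In this toy setting the ratio estimate becomes noisier as the number of reads drops, which is the effect the downsampling experiment is designed to quantify.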
Keywords