BMC Genomic Data (Jan 2023)
Uncovering novel MHC alleles from RNA-Seq data: expanding the spectrum of MHC class I alleles in sheep
Abstract
Background: Major histocompatibility complex (MHC) class I glycoproteins present selected peptides or antigens to CD8+ T cells that control the cytotoxic immune response. The MHC class I genes are among the most polymorphic loci in the vertebrate genome, with more than twenty thousand alleles known in humans. In sheep, only a very small number of alleles have been described to date, making the development of genotyping systems or functional studies difficult. A cost-effective way to identify new alleles could be to use already available RNA-Seq data from sheep. Current strategies for aligning RNA-Seq reads against annotated genome sequences or transcriptomes fail to detect the majority of class I alleles. Here, I combine the alignment of RNA-Seq reads against a specific reference database with de novo assembly to identify alleles. The method allows the comprehensive discovery of novel MHC class I alleles from RNA-Seq data (DinoMfRS).
Results: Using DinoMfRS, virtually all expressed MHC class I alleles could be determined. From 18 animals, 75 MHC class I alleles were identified, of which 69 were novel. In addition, it was shown that DinoMfRS can be used to improve the annotation of MHC genes in the sheep genome sequence.
Conclusions: DinoMfRS allows, for the first time, the annotation of unknown, more divergent MHC alleles from RNA-Seq data. Successful application to RNA-Seq data from 16 animals has approximately doubled the number of known alleles in sheep. By using existing data, alleles can now be determined very inexpensively for populations that have not been well studied. In addition, MHC expression studies or evolutionary studies, for example, can be greatly improved in this way, and the method should be applicable to a broader spectrum of other multigene families or highly polymorphic genes.
Keywords