Biodiversity Research Center, Academia Sinica, Taipei, Taiwan; Institute for Comparative Genomics, American Museum of Natural History, New York, United States
Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, United States; Global Health Institute, University of Wisconsin-Madison, Madison, United States
Institute for Comparative Genomics, American Museum of Natural History, New York, United States; Department of Epidemiology and Biostatistics, School of Public Health, SUNY Downstate Health Sciences University, Brooklyn, United States; Institute for Genomic Health, SUNY Downstate Health Sciences University, Brooklyn, United States; Division of Infectious Diseases, Department of Medicine, SUNY Downstate Health Sciences University, Brooklyn, United States
Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, United States; Departments of Computer Science, Human Genetics, and Computational Medicine, University of California, Los Angeles, Los Angeles, United States
Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not other closely related pangolin-CoVs or bat-CoVs. We then document evidence of ORF3d translation, characterize its protein sequence, and conduct an evolutionary analysis at three levels: between taxa (21 members of Severe acute respiratory syndrome-related coronavirus), between human hosts (3978 SARS-CoV-2 consensus sequences), and within human hosts (401 deeply sequenced SARS-CoV-2 samples). ORF3d has been independently identified and shown to elicit a strong antibody response in COVID-19 patients. However, it has been misclassified as the unrelated gene ORF3b, leading to confusion. Our results liken ORF3d to other accessory genes in emerging viruses and highlight the importance of OLGs.