EBioMedicine (Jan 2023)
Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes
- Justin T. Reese,
- Hannah Blau,
- Elena Casiraghi,
- Timothy Bergquist,
- Johanna J. Loomba,
- Tiffany J. Callahan,
- Bryan Laraway,
- Corneliu Antonescu,
- Ben Coleman,
- Michael Gargano,
- Kenneth J. Wilkins,
- Luca Cappelletti,
- Tommaso Fontana,
- Nariman Ammar,
- Blessy Antony,
- T.M. Murali,
- J. Harry Caufield,
- Guy Karlebach,
- Julie A. McMurry,
- Andrew Williams,
- Richard Moffitt,
- Jineta Banerjee,
- Anthony E. Solomonides,
- Hannah Davis,
- Kristin Kostka,
- Giorgio Valentini,
- David Sahner,
- Christopher G. Chute,
- Charisse Madlock-Brown,
- Melissa A. Haendel,
- Peter N. Robinson,
- Heidi Spratt,
- Shyam Visweswaran,
- Joseph Eugene Flack, IV,
- Yun Jae Yoo,
- Davera Gabriel,
- G. Caleb Alexander,
- Hemalkumar B. Mehta,
- Feifan Liu,
- Robert T. Miller,
- Rachel Wong,
- Elaine L. Hill,
- Lorna E. Thorpe,
- Jasmin Divers
Affiliations
- Justin T. Reese
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Hannah Blau
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Elena Casiraghi
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Timothy Bergquist
- Sage Bionetworks, Seattle, WA, USA
- Johanna J. Loomba
- The Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Tiffany J. Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Bryan Laraway
- Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Corneliu Antonescu
- University of Arizona - Banner Health, Phoenix, AZ, USA
- Ben Coleman
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Michael Gargano
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Kenneth J. Wilkins
- Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Luca Cappelletti
- AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Tommaso Fontana
- AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Nariman Ammar
- Health Science Center, University of Tennessee, Memphis, TN, USA
- Blessy Antony
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T.M. Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- J. Harry Caufield
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Guy Karlebach
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Julie A. McMurry
- Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Andrew Williams
- Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA; Tufts University School of Medicine, Institute for Clinical Research and Health Policy Studies, Boston, MA, USA; Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Richard Moffitt
- Department of Biomedical Informatics and Stony Brook Cancer Center, Stony Brook University, Stony Brook, NY, USA
- Jineta Banerjee
- Sage Bionetworks, Seattle, WA, USA
- Anthony E. Solomonides
- HealthSystem Research Institute, NorthShore University, Evanston, IL, USA
- Hannah Davis
- Patient-Led Research Collaborative, NY, USA
- Kristin Kostka
- Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- David Sahner
- Axle Informatics, Rockville, MD, USA
- Christopher G. Chute
- Schools of Medicine, Public Health and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Charisse Madlock-Brown
- Health Science Center, University of Tennessee, Memphis, TN, USA
- Melissa A. Haendel
- Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA; Corresponding author.
- Heidi Spratt
- Shyam Visweswaran
- Joseph Eugene Flack, IV
- Yun Jae Yoo
- Davera Gabriel
- G. Caleb Alexander
- Hemalkumar B. Mehta
- Feifan Liu
- Robert T. Miller
- Rachel Wong
- Elaine L. Hill
- Lorna E. Thorpe
- Jasmin Divers
- Journal volume & issue
Vol. 87, p. 104413
Abstract
Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.

Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning.

Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.

Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.

Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
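
The two computational steps named in the abstract, building a pairwise patient similarity matrix and assigning a new patient to an existing cluster by similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain Jaccard overlap of term sets stands in for the ontology-based semantic similarity, the cluster labels are supplied by hand rather than learned, and assignment uses the highest mean similarity to a cluster's members, which is only one possible reading of "maximum semantic similarity". All function names, phenotype terms, and labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity: overlap of two sets of phenotype terms (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(patients):
    """Symmetric matrix of pairwise patient similarities (diagonal set to 1.0)."""
    n = len(patients)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(patients[i], patients[j])
    return sim


def assign_cluster(new_patient, patients, labels):
    """Return the label of the cluster whose members are, on average,
    most similar to the new patient (assumed assignment rule)."""
    totals, counts = {}, {}
    for terms, label in zip(patients, labels):
        totals[label] = totals.get(label, 0.0) + jaccard(new_patient, terms)
        counts[label] = counts.get(label, 0) + 1
    return max(totals, key=lambda c: totals[c] / counts[c])


# Hypothetical reference patients and hand-assigned cluster labels.
patients = [
    {"dyspnoea", "cough"},
    {"dyspnoea", "cough", "fatigue"},
    {"anxiety", "insomnia"},
    {"anxiety", "insomnia", "fatigue"},
]
labels = ["pulmonary", "pulmonary", "neuropsychiatric", "neuropsychiatric"]

sim = similarity_matrix(patients)
print(round(sim[0][1], 2))  # 0.67
print(assign_cluster({"cough", "dyspnoea", "chest pain"}, patients, labels))  # pulmonary
```

In a fuller pipeline the matrix `sim` would be passed to an unsupervised clustering routine to obtain the labels, and the similarity function would weight related but non-identical phenotype terms rather than requiring exact matches.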