Nature Communications (Nov 2022)
Systematic tissue annotations of genomics samples by modeling unstructured metadata
Abstract
More than one million publicly available human -omics samples remain acutely underused. Here the authors present an approach that combines natural language processing and machine learning to infer the source tissue of public genomics samples from their plain-text descriptions, making these samples easy to discover and reuse.
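
To make the idea of inferring a tissue label from a plain-text sample description concrete, here is a minimal sketch. It is not the authors' method: it swaps in a simple bag-of-words multinomial naive Bayes classifier as a stand-in for the combined natural language processing and machine learning approach the abstract describes. The function names (`tokenize`, `train`, `predict`) and the six training descriptions are hypothetical and invented for illustration; they are not taken from the paper or from any public repository.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a free-text description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Fit a multinomial naive Bayes model from (description, tissue) pairs."""
    word_counts = defaultdict(Counter)  # tissue -> token counts
    doc_counts = Counter()              # tissue -> number of descriptions
    vocab = set()
    for text, tissue in examples:
        tokens = tokenize(text)
        word_counts[tissue].update(tokens)
        doc_counts[tissue] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the tissue with the highest log posterior for a description."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for tissue in doc_counts:
        total_words = sum(word_counts[tissue].values())
        score = math.log(doc_counts[tissue] / total_docs)  # log prior
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore tokens never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[tissue][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = tissue, score
    return best_label


# Hypothetical toy descriptions, invented for this sketch.
TRAIN = [
    ("hepatocytes isolated from adult liver biopsy", "liver"),
    ("liver tissue, hepatocellular carcinoma patient", "liver"),
    ("peripheral blood mononuclear cells from healthy donor", "blood"),
    ("whole blood RNA, PAXgene tube", "blood"),
    ("prefrontal cortex, post-mortem brain", "brain"),
    ("hippocampus neurons from brain section", "brain"),
]

model = train(TRAIN)
print(predict(model, "RNA from liver biopsy of patient"))  # liver
print(predict(model, "blood cells from donor"))            # blood
```

A word-count model like this only illustrates the input and output of the task (free text in, tissue label out). Real sample descriptions are far noisier, and a practical system would need much more training data and a richer text representation than this toy example uses.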