Data in Brief (Dec 2023)
HexAI-TJAtxt: A textual dataset to advance open scientific research in total joint arthroplasty
Abstract
Total joint arthroplasty (TJA) is the most common and fastest inpatient surgical procedure in the elderly, nationwide. Due to the increasing number of TJA patients and advancements in healthcare, there is a growing number of scientific articles being published in a daily basis. These articles offer important insights into TJA, covering aspects like diagnosis, prevention, treatment strategies, and epidemiological factors. However, there has been limited effort to compile a large-scale text dataset from these articles and make it publicly available for open scientific research in TJA. Rapid yet, utilizing computational text analysis on these large columns of scientific literatures holds great potential for uncovering new knowledge to enhance our understanding of joint diseases and improve the quality of TJA care and clinical outcomes. This work aims to build a dataset entitled HexAI-TJAtxt, which includes more than 61,936 scientific abstracts collected from PubMed using MeSH (Medical Subject Headings) terms within “MeSH Subheading” and “MeSH Major Topic,” and Publication Date from 01/01/2000 to 12/31/2022. The current dataset is freely and publicly available at https://github.com/pitthexai/HexAI-TJAtxt, and it will be updated frequently in bi-monthly manner from new abstracts published at PubMed.