Data in Brief (Dec 2023)

HexAI-TJAtxt: A textual dataset to advance open scientific research in total joint arthroplasty

  • Soheyla Amirian,
  • Husam Ghazaleh,
  • Luke A. Carlson,
  • Matthew Gong,
  • Logan Finger,
  • Johannes F. Plate,
  • Ahmad P. Tafti

Journal volume & issue
Vol. 51
p. 109738

Abstract

Total joint arthroplasty (TJA) is the most common and fastest-growing inpatient surgical procedure in the elderly nationwide. Due to the increasing number of TJA patients and advancements in healthcare, a growing number of scientific articles are being published on a daily basis. These articles offer important insights into TJA, covering aspects such as diagnosis, prevention, treatment strategies, and epidemiological factors. However, there has been limited effort to compile a large-scale text dataset from these articles and make it publicly available for open scientific research in TJA. Yet applying computational text analysis to these large volumes of scientific literature holds great potential for uncovering new knowledge to enhance our understanding of joint diseases and improve the quality of TJA care and clinical outcomes. This work aims to build a dataset entitled HexAI-TJAtxt, which includes more than 61,936 scientific abstracts collected from PubMed using MeSH (Medical Subject Headings) terms within “MeSH Subheading” and “MeSH Major Topic,” with a Publication Date from 01/01/2000 to 12/31/2022. The current dataset is freely and publicly available at https://github.com/pitthexai/HexAI-TJAtxt, and it will be updated bi-monthly with new abstracts published on PubMed.
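
The kind of PubMed search described in the abstract (a MeSH field restriction plus a publication-date window) can be sketched against the NCBI E-utilities ESearch endpoint. This is a minimal illustration, not the authors' collection pipeline: the MeSH heading "Arthroplasty, Replacement", the helper name build_esearch_url, and the retmax value are assumptions chosen for the example, since the abstract does not list the exact search terms. The snippet only builds the request URL and does not contact PubMed.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(mesh_term, min_date="2000/01/01",
                      max_date="2022/12/31", retmax=100):
    """Build an ESearch URL limited to a MeSH Major Topic and a date window.

    Hypothetical helper: the MeSH term passed in is an assumption, not the
    query used to build HexAI-TJAtxt.
    """
    params = {
        "db": "pubmed",                               # search the PubMed database
        "term": f'"{mesh_term}"[MeSH Major Topic]',   # PubMed field tag
        "datetype": "pdat",                           # filter on publication date
        "mindate": min_date,                          # 01/01/2000 in the abstract
        "maxdate": max_date,                          # 12/31/2022 in the abstract
        "retmax": retmax,                             # max number of PMIDs returned
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Assumed example heading; swap in the terms actually of interest.
url = build_esearch_url("Arthroplasty, Replacement")
print(url)
```

Fetching the returned PMIDs and their abstracts would be a separate step (for example with EFetch), subject to NCBI's usage guidelines.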

Keywords