Journal of Medical Internet Research (May 2023)

Text Analysis of Trends in Health Equity and Disparities From the Internal Revenue Service Tax Documentation Submitted by US Nonprofit Hospitals Between 2010 and 2019: Exploratory Study

  • Emily Hadley,
  • Laura Haak Marcial,
  • Wes Quattrone,
  • Georgiy Bobashev

DOI
https://doi.org/10.2196/44330
Journal volume & issue
Vol. 25
p. e44330

Abstract


Background: Many US hospitals are classified as nonprofits and receive tax-exempt status partially in exchange for providing benefits to the community. Proof of compliance is collected with the Schedule H form submitted as part of the annual Internal Revenue Service Form 990 (F990H), including a free-response text section that is known for being ambiguous and difficult to audit. This research is among the first to use natural language processing approaches to evaluate this text section with a focus on health equity and disparities.

Objective: This study aims to determine the extent to which the free-response text in F990H reveals how nonprofit hospitals address health equity and disparities, including alignment with public priorities.

Methods: We used free-response text submitted by hospital reporting entities in Parts V and VI of the Internal Revenue Service Form 990 Schedule H between 2010 and 2019. We identified 29 main themes connected to health equity and disparities, and 152 related key phrases. We tallied occurrences of these phrases through term frequency analysis, calculated the Moran I statistic to assess geographic variation in 2018, analyzed Google Trends use for the same terms during the same period, and used semantic search with Sentence-BERT in Python to understand contextual use.

Results: We found increased use from 2010 to 2019 across all 29 phrase themes related to health equity and disparities. More than 90% of hospital reporting entities used terms in 2018 and 2019 related to affordability (2018: 2117/2131, 99.34%; 2019: 1620/1627, 99.57%), government organizations (2018: 2053/2131, 96.33%; 2019: 1577/1627, 96.93%), mental health (2018: 1937/2131, 90.9%; 2019: 1517/1627, 93.24%), and data collection (2018: 1947/2131, 91.37%; 2019: 1502/1627, 92.32%). The themes with the largest relative increase were LGBTQ (lesbian, gay, bisexual, transgender, and queer; 1676%; 2010: 12/2328, 0.51%; 2019: 149/1627, 9.16%) and social determinants of health (958%; 2010: 68/2328, 2.92%; 2019: 503/1627, 30.92%). Terms related to homelessness varied geographically from 2010 to 2018, and terms related to equity, health IT, immigration, LGBTQ, oral health, rural, social determinants of health, and substance use showed statistically significant (P<.05) geographic variation in 2018. The largest percentage point increase was for terms related to substance use (2010: 403/2328, 17.31%; 2019: 1149/1627, 70.62%). However, use in themes such as LGBTQ, disability, oral health, and race and ethnicity ranked lower than public interest in these topics, and some increased mentions of themes were to explicitly say that no action was taken.

Conclusions: Hospital reporting entities demonstrate an increasing awareness of health equity and disparities in community benefit tax documentation, but this awareness does not necessarily correspond with general population interests or additional action. We propose further investigation of alignment with community health needs assessments and make suggestions for improvements to F990H reporting requirements.
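
The term frequency tally and the two kinds of change reported in the results (relative increase and percentage point increase) can be illustrated with a minimal Python sketch. This is not the authors' code: the theme-to-phrase mapping below is a hypothetical two-theme stand-in for the study's 29 themes and 152 key phrases, and the function names (`share_by_year`, `percentage_point_change`, `relative_change_percent`) are assumptions made for illustration. A filing is counted once per theme if any of the theme's phrases appears in its text.

```python
import re
from collections import defaultdict

# Hypothetical theme-to-phrase mapping; the study's actual phrase list
# is not reproduced in the abstract.
THEMES = {
    "substance use": ["substance use", "opioid"],
    "mental health": ["mental health", "behavioral health"],
}


def theme_pattern(phrases):
    """Compile one case-insensitive, whole-word pattern for a theme."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def share_by_year(filings, themes):
    """Count filings per year that mention each theme at least once.

    filings: iterable of (year, free_response_text) pairs.
    Returns {theme: {year: (n_mentioning, n_total)}}.
    """
    patterns = {theme: theme_pattern(p) for theme, p in themes.items()}
    totals = defaultdict(int)
    hits = {theme: defaultdict(int) for theme in themes}
    for year, text in filings:
        totals[year] += 1
        for theme, pattern in patterns.items():
            if pattern.search(text):
                hits[theme][year] += 1
    return {
        theme: {year: (hits[theme][year], totals[year]) for year in totals}
        for theme in themes
    }


def percentage_point_change(start, end):
    """Difference between two shares, in percentage points.

    start and end are (n_mentioning, n_total) pairs.
    """
    (a, n), (b, m) = start, end
    return 100 * (b / m - a / n)


def relative_change_percent(start, end):
    """Relative growth of the share from start to end, in percent."""
    (a, n), (b, m) = start, end
    return 100 * ((b / m) / (a / n) - 1)
```

Applied to the counts reported above, `percentage_point_change((403, 2328), (1149, 1627))` gives about 53.3 percentage points for substance use, and `relative_change_percent((68, 2328), (503, 1627))` gives about 958% for social determinants of health. The distinction matters because a theme starting from a very small share can show a large relative increase while still being mentioned by few reporting entities.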