Scientific Data (Dec 2022)

A Brazilian classified data set for prognosis of tuberculosis, between January 2001 and April 2020

  • Maicon Herverton Lino Ferreira da Silva Barros,
  • Guto Leoni Santos,
  • Maria Gabriela de Almeida Rodrigues,
  • Vanderson Sampaio,
  • Theo Lynn,
  • Patricia Takako Endo

DOI
https://doi.org/10.1038/s41597-022-01892-4
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 8

Abstract

After COVID-19, tuberculosis (TB) is the leading cause of death from an infectious disease worldwide. This work presents a data set based on data collected from the Brazilian Information System for Notifiable Diseases (SINAN) for the period from January 2001 to April 2020, relating to patients diagnosed with tuberculosis in Brazil. The SINAN data were pre-processed to generate a new data set with two distinct treatment outcome classes: CURED and DIED. The data set comprises 37 categorical attributes (including socio-demographic, clinical, and laboratory data) plus the target class. There are 927,909 records of patients classified as CURED and 36,190 classified as DIED, for a total of 964,099 records.
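The class counts above imply a strongly imbalanced target, which matters for anyone training a prognosis model on this data set. The sketch below uses only the counts reported in the abstract. It checks that they sum to the stated total and derives each class's share, the majority-to-minority ratio, and inverse-frequency class weights. The weighting formula `total / (n_classes * count)` is one common convention (the same one scikit-learn uses for `class_weight="balanced"`). It is an illustrative assumption, not something the authors prescribe.

```python
# Class counts as reported in the abstract (CURED vs DIED).
counts = {"CURED": 927_909, "DIED": 36_190}

# The two classes should add up to the reported total of 964,099 records.
total = sum(counts.values())
assert total == 964_099

# Share of each outcome class in the data set.
shares = {label: n / total for label, n in counts.items()}

# Majority-to-minority imbalance ratio (CURED records per DIED record).
imbalance_ratio = counts["CURED"] / counts["DIED"]

# Inverse-frequency class weights: total / (n_classes * count).
# This is an illustrative weighting choice, not part of the data set itself.
n_classes = len(counts)
weights = {label: total / (n_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: {shares[label]:.2%} of records, weight {weights[label]:.3f}")
print(f"imbalance ratio: {imbalance_ratio:.2f}:1")
```

Roughly 3.75% of records belong to the DIED class, about one DIED record for every 25.6 CURED records. Plain accuracy would therefore be a misleading metric on this data set.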