SAHARA-J (Jan 2019)

A weighted bootstrap approach to logistic regression modelling in identifying risk behaviours associated with sexual activity

  • Humphrey Brydon,
  • Rénette Blignaut,
  • Joachim Jacobs

DOI: https://doi.org/10.1080/17290376.2019.1636708
Journal volume & issue: Vol. 16, no. 1, pp. 62–69

Abstract


The latest population estimates released by Statistics South Africa indicate that 25.03% of all deaths in South Africa in 2017 were AIDS-related. The same release reports that 7.06% of the population were living with HIV, and that HIV prevalence among youth (aged 15–24) was 4.64% in 2017 (STATSSA. (2018). Retrieved from Statistics South Africa: http://www.statssa.gov.za/publications/P0302/P03022017.pdf). The data used in this study contained information on the risk-taking behaviours associated with the sexual activity of entering first-year students at the University of the Western Cape. In this study, a logistic regression modelling procedure was applied to the students determined to be sexually active, so that the significant risk behaviours of sexually active first-year students could be identified. Of the 14 variables included in the modelling procedure, six were found to be significantly associated with sexually active students: the student's age and race, whether the student had ever taken an HIV test, the importance of religion in influencing the student's sexual behaviour, whether the student consumed alcohol and whether the student smoked. The study further investigated the impact of introducing sample weighting, bootstrap sampling and variable selection methods into the logistic regression modelling procedure. Incorporating these techniques is shown to produce logistic regression models that are more accurate and have greater predictive capability. In particular, the bootstrapping procedure produces logistic regression models that are more accurate than those fitted without a bootstrap procedure.
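The abstract does not spell out exactly how sample weights and bootstrap sampling were combined with logistic regression. The sketch below shows one plausible arrangement, under stated assumptions: each bootstrap replicate draws n rows with replacement, each drawn row keeps its original sampling weight, a weighted logistic model is fitted by Newton–Raphson, and the replicate coefficients are averaged and summarised with a percentile interval. The function names (`fit_weighted_logit`, `weighted_bootstrap`, `percentile_interval`), the simulated data and the weight values are hypothetical; this is an illustration of the technique, not the authors' code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_weighted_logit(X, y, w, iters=25, tol=1e-8):
    """Weighted logistic regression by Newton-Raphson.

    X: predictor rows (an intercept column is prepended here),
    y: 0/1 outcomes, w: non-negative sampling weights.
    Returns [intercept, slope_1, ..., slope_k].
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # weighted score vector
        info = [[0.0] * p for _ in range(p)]  # weighted information X'WX
        for xi, yi, wi in zip(rows, y, w):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += wi * (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += wi * mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)              # Newton step: info^{-1} * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def weighted_bootstrap(X, y, w, n_boot=200, seed=0):
    """Refit the weighted model on n_boot resamples drawn with replacement.

    Assumption: each resampled row keeps its original sampling weight.
    Returns the replicate-averaged coefficients and all replicates.
    """
    rng = random.Random(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(fit_weighted_logit([X[i] for i in idx],
                                       [y[i] for i in idx],
                                       [w[i] for i in idx]))
    mean = [sum(r[j] for r in reps) / n_boot for j in range(len(reps[0]))]
    return mean, reps


def percentile_interval(values, level=0.95):
    """Simple percentile bootstrap interval for one coefficient."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Hypothetical simulated data: one predictor, three weight levels.
rng = random.Random(1)
X = [[rng.gauss(0, 1)] for _ in range(300)]
y = [1 if rng.random() < sigmoid(-0.5 + 1.2 * x[0]) else 0 for x in X]
w = [rng.choice([0.5, 1.0, 2.0]) for _ in X]

mean_beta, reps = weighted_bootstrap(X, y, w, n_boot=200)
print("bootstrap mean:", [round(b, 3) for b in mean_beta])
print("95% interval for slope:", percentile_interval([r[1] for r in reps]))
```

Keeping each row's weight inside every replicate means the fitted model still targets the weighted population rather than the raw sample; changing `n_boot` to 500 or 1000 is a one-argument change if one wants to compare replicate counts.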
A comparison of 200, 500 and 1000 bootstrap samples is also incorporated into the modelling procedure, and the models produced from 200 bootstrap samples are shown to be just as accurate as those produced from 500 or 1000 bootstrap samples. Of the five variable selection methods used, the Newton–Raphson and Fisher methods are shown to be unreliable in producing logistic regression models, while the forward, backward and stepwise variable selection methods produce very similar results.
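To make the forward selection idea concrete, the sketch below adds candidate predictors one at a time, each time choosing the one with the smallest one-degree-of-freedom likelihood-ratio p-value, and stops when no remaining candidate falls below 0.05. The likelihood-ratio entry test, the 0.05 threshold, the unweighted fit (used here for brevity) and all names (`loglik`, `forward_select`) are assumptions for illustration; the software used in the study may apply a different entry criterion, such as a score test.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def loglik(X, y, cols, iters=25, tol=1e-8):
    """Fit an (unweighted, by assumption) logistic model on columns `cols`
    plus an intercept by Newton-Raphson; return the maximised log-likelihood."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for xi, yi in zip(rows, y):
        mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        mu = min(max(mu, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += math.log(mu) if yi else math.log(1.0 - mu)
    return ll


def forward_select(X, y, alpha=0.05):
    """Forward selection using 1-df likelihood-ratio tests (assumed criterion)."""
    chosen, remaining = [], list(range(len(X[0])))
    current = loglik(X, y, chosen)  # intercept-only model
    while remaining:
        best = None  # (p-value, column, log-likelihood)
        for c in remaining:
            ll = loglik(X, y, chosen + [c])
            stat = max(2.0 * (ll - current), 0.0)
            pval = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
            if best is None or pval < best[0]:
                best = (pval, c, ll)
        if best[0] >= alpha:
            break
        chosen.append(best[1])
        remaining.remove(best[1])
        current = best[2]
    return chosen


# Hypothetical data: columns 0 and 2 affect the outcome, column 1 is noise.
rng = random.Random(2)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [1 if rng.random() < sigmoid(0.3 + 1.5 * x[0] - 1.0 * x[2]) else 0 for x in X]
print("selected columns:", forward_select(X, y))
```

Backward elimination applies the same kind of test in reverse, starting from the full set of candidates and removing the least significant variable at each step; stepwise selection alternates forward entry with a removal check on variables already in the model.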

Keywords