BMC Medical Research Methodology (Oct 2023)

Functional data analysis to characterize disease patterns in frequent longitudinal data: application to bacterial vaginal microbiota patterns using weekly Nugent scores and identification of pattern-specific risk factors

  • Rahul Biswas,
  • Marie Thoma,
  • Xiangrong Kong

DOI
https://doi.org/10.1186/s12874-023-02063-8
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 13

Abstract

Background: Advances in technology have allowed more frequent monitoring of biomarkers. The resulting data involve many more follow-ups than traditional longitudinal studies, where the number of follow-ups is often small. Such data allow exploration of the role of intra-person variability in understanding disease etiology and characterizing disease processes. A specific example is characterizing the pathogenesis of bacterial vaginosis (BV) using weekly vaginal microbiota Nugent assay scores collected over 2 years in post-menarcheal women from Rakai, Uganda, and identifying risk factors for each vaginal microbiota pattern to inform epidemiological and etiological understanding of the pathogenesis of BV.

Methods: We use a fully data-driven approach to characterize the longitudinal patterns of vaginal microbiota by treating the densely sampled Nugent scores as random functions over time and performing dimension reduction by functional principal components. Extending a current functional data clustering method, we use a hierarchical functional clustering framework that considers multiple data features to help identify clinically meaningful patterns of vaginal microbiota fluctuations. Multinomial logistic regression was then used to identify risk factors for each vaginal microbiota pattern.

Results: Using weekly Nugent scores over 2 years from 211 sexually active, post-menarcheal women in Rakai, four patterns of vaginal microbiota variation were identified: persistent BV state (high Nugent scores); persistently normal-range Nugent scores; large fluctuations in Nugent scores, predominantly in the BV state; and large fluctuations in Nugent scores, predominantly in the normal state. A higher Nugent score at the start of an interval, age younger than 20 years, an unprotected source of bathing water, having an uncircumcised partner, and use of injectable/Norplant hormonal contraceptives for family planning were associated with higher odds of persistent BV.

Conclusion: The hierarchical functional data clustering method can be used for fully data-driven, unsupervised clustering of densely sampled longitudinal data to identify clinically informative clusters and the risk factors associated with each cluster.
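
As a rough illustration of the kind of pipeline the abstract describes (functional principal components followed by a staged clustering), here is a minimal Python sketch. It is not the authors' implementation. It assumes complete curves on a regular weekly grid with no smoothing, uses purely synthetic data, and replaces the paper's hierarchical functional clustering with a simple two-stage split: first on the FPC1 score, which here tracks overall level, and then on each subject's temporal standard deviation. The names `simulate_scores`, `fpca_scores`, `two_means`, and `cluster_patterns` are hypothetical.

```python
import numpy as np

def simulate_scores(n_per_group=20, n_weeks=104, seed=0):
    """Synthetic weekly scores on a 0-10 scale (illustrative only, not study data)."""
    rng = np.random.default_rng(seed)
    weeks = np.arange(n_weeks)
    # (mean level, fluctuation amplitude) for four assumed patterns:
    # 0 low/stable, 1 low/fluctuating, 2 high/stable, 3 high/fluctuating
    patterns = [(1.5, 0.0), (3.0, 2.5), (8.0, 0.0), (6.5, 2.5)]
    curves, truth = [], []
    for label, (level, amp) in enumerate(patterns):
        for _ in range(n_per_group):
            phase = rng.uniform(0, 2 * np.pi)
            curve = level + amp * np.sin(2 * np.pi * weeks / 26 + phase)
            curve += rng.normal(0, 0.4, n_weeks)
            curves.append(np.clip(curve, 0, 10))
            truth.append(label)
    return np.array(curves), np.array(truth)

def fpca_scores(curves, n_components=2):
    """FPCA on a common regular grid via SVD of the centered data matrix."""
    centered = curves - curves.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Resolve the SVD sign ambiguity: make each eigenfunction sum non-negative.
    signs = np.where(vt[:n_components].sum(axis=1) < 0, -1.0, 1.0)
    components = vt[:n_components] * signs[:, None]
    scores = u[:, :n_components] * s[:n_components] * signs
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, components, explained

def two_means(x, n_iter=50):
    """Plain 1-D 2-means; label 1 is the group with the higher centre."""
    centres = np.array([x.min(), x.max()], dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        labels = (np.abs(x - centres[1]) < np.abs(x - centres[0])).astype(int)
        for k in (0, 1):
            if np.any(labels == k):
                centres[k] = x[labels == k].mean()
    return labels

def cluster_patterns(curves):
    """Stage 1: split on FPC1 score. Stage 2: split each group on temporal SD."""
    scores, _, _ = fpca_scores(curves, n_components=1)
    level = two_means(scores[:, 0])
    temporal_sd = curves.std(axis=1)
    fluct = np.zeros_like(level)
    for g in (0, 1):
        members = level == g
        if members.any():
            fluct[members] = two_means(temporal_sd[members])
    return 2 * level + fluct  # codes 0..3, matching simulate_scores

curves, truth = simulate_scores()
labels = cluster_patterns(curves)
```

In this sketch the staged split is a deliberate simplification: splitting on level first and variability second mirrors the four pattern descriptions in the Results, but real data with irregular visits or missing weeks would need smoothing or a sparse-FPCA method before any such clustering.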

Keywords