Frontiers in Cellular and Infection Microbiology (May 2024)

Optimization of genetic distance threshold for inferring the CRF01_AE molecular network based on next-generation sequencing

  • Lijuan Hu,
  • Bin Zhao,
  • Mingchen Liu,
  • Yang Gao,
  • Haibo Ding,
  • Qinghai Hu,
  • Minghui An,
  • Hong Shang,
  • Xiaoxu Han

DOI
https://doi.org/10.3389/fcimb.2024.1388059
Journal volume & issue
Vol. 14

Abstract

Introduction: HIV molecular networks based on genetic distance (GD) have been used extensively. However, the GD threshold for non-B subtypes differs from that of subtype B. This study aimed to optimize the GD threshold for inferring the CRF01_AE molecular network.

Methods: Next-generation sequencing data of partial CRF01_AE pol sequences were obtained for 59 samples from 12 transmission pairs enrolled from a high-risk cohort between 2009 and 2014. The paired GD was calculated using the Tamura-Nei 93 model to infer a GD threshold range for HIV molecular networks.

Results: A total of 2,019 CRF01_AE pol sequences and information on recent HIV infection (RHI) from newly diagnosed individuals in Shenyang from 2016 to 2019 were collected to construct molecular networks and assess the ability of the inferred GD thresholds to predict recent transmission events. When HIV transmission occurred within a span of 1 to 4 years, the mean paired GDs between the donor and recipient sequences within the same transmission pair were 0.008, 0.011, 0.013, and 0.023 substitutions/site, respectively. Using these four GD thresholds, 98.9%, 96.0%, 88.2%, and 40.4% of all randomly paired GD values from the 12 transmission pairs were correctly identified as originating from the same transmission pairs. In the real-world data, as the GD threshold increased from 0.001 to 0.02 substitutions/site, the proportion of RHI within the molecular network gradually increased from 16.6% to 92.3%, while the proportion of links with RHI gradually decreased from 87.0% to 48.2%. The two curves intersected at a GD of 0.008 substitutions/site.

Discussion: A suitable range of GD thresholds, 0.008-0.013 substitutions/site, was identified for inferring the CRF01_AE molecular transmission network and identifying HIV transmission events that occurred within the past three years. This finding provides valuable data for selecting appropriate GD thresholds when constructing molecular networks for non-B subtypes.
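
The threshold sweep described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and it rests on several assumptions: pairwise GDs (for example, TN93 distances) are already computed; two sequences are linked when their GD is at or below the threshold; "proportion of RHI within the network" is read as the share of RHI individuals who have at least one link; and "proportion of links with RHI" is read as the share of links with at least one RHI endpoint. The function name `network_metrics` and the toy data are hypothetical.

```python
def network_metrics(pair_gd, rhi_ids, threshold):
    """Return (share of RHI individuals in the network, share of links touching an RHI).

    pair_gd   -- dict mapping (id_a, id_b) to a precomputed genetic distance
    rhi_ids   -- set of IDs classified as recent HIV infection (RHI)
    threshold -- GD cutoff in substitutions/site; links use gd <= threshold (assumed)
    """
    # Keep only the pairs close enough to count as a link.
    links = [pair for pair, gd in pair_gd.items() if gd <= threshold]
    # A sequence belongs to the network if it has at least one link.
    nodes = {seq_id for pair in links for seq_id in pair}

    rhi_in_network = len(nodes & rhi_ids) / len(rhi_ids) if rhi_ids else 0.0
    if links:
        linked_to_rhi = sum(1 for a, b in links if a in rhi_ids or b in rhi_ids)
        links_with_rhi = linked_to_rhi / len(links)
    else:
        links_with_rhi = 0.0
    return rhi_in_network, links_with_rhi


# Hypothetical toy data, not taken from the study.
toy_pair_gd = {
    ("A", "B"): 0.004,
    ("C", "D"): 0.010,
    ("D", "E"): 0.012,
    ("B", "E"): 0.030,
}
toy_rhi = {"A", "C"}

for cutoff in (0.008, 0.013):
    in_net, with_rhi = network_metrics(toy_pair_gd, toy_rhi, cutoff)
    print(f"threshold={cutoff}: RHI in network={in_net:.2f}, links with RHI={with_rhi:.2f}")
```

In this toy example, raising the cutoff pulls more RHI individuals into the network while diluting the share of links that involve an RHI, which mirrors the opposing trends reported above.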

Keywords