Virology Journal (Aug 2011)
A computational approach to identify point mutations associated with occult hepatitis B: significant mutations affect coding regions but not regulative elements of HBV
Abstract
Background: Occult Hepatitis B Infection (OBI) is characterized by the absence of serum HBsAg and the persistence of HBV-DNA in liver tissue, with low to undetectable serum HBV-DNA. The mechanisms underlying OBI remain to be clarified. To evaluate whether specific point mutations of the HBV genome are associated with OBI, we applied an approach based on bioinformatics analysis of complete genome HBV sequences. In addition, we evaluated the feasibility of using bioinformatics prediction models to classify HBV infections into OBI and non-OBI from molecular data.

Methods: 41 OBI and 162 non-OBI complete genome sequences were retrieved from GenBank, aligned and subjected to univariable analysis including statistical evaluation. The S coding region was also analyzed for stop codon mutations; S amino acid variability could be evaluated for genotype D only, because too few complete genome OBI sequences were available for other genotypes. Prediction models were derived by multivariable analysis using Logistic Regression, Rule Induction and Random Forest approaches, with extra-sample error estimation by Multiple ten-fold Cross-Validation (MCV). Each model was compared against the best-performing model by t-test on the distributions of the Area Under the Receiver Operating Characteristic curve (AUC) obtained from the MCV runs.

Results: Variations in seven nucleotide positions were significantly associated with OBI, and occurred in 11 out of 41 OBI sequences (26.8%). Other mutations likely did not reach statistical significance because of the small size of the OBI dataset. All variations affected at least one HBV coding region, but none of them mapped to regulative elements. All viral proteins, with the sole exception of the X protein, were affected. Stop codons in the S coding region, which might account for the absence of serum HBsAg, were not significantly enriched in OBI sequences. In genotype D, amino acid variability in the S protein was higher in OBI than in non-OBI sequences, particularly in the immunodominant region. A Random Forest prediction model showed the best performance, but none of the models was satisfactory in terms of specificity, owing to the small sample size of OBIs; the results are nevertheless promising in view of a broader dataset of complete genome OBI sequences.

Conclusions: The data suggest that point mutations rarely, if ever, occur in regulative elements of HBV, and that they contribute to OBI by affecting different viral proteins. This points to heterogeneous mechanisms being responsible for OBI, including, at least in genotype D, an escape mutation mechanism due to imperfect immune control. It appears possible to derive prediction models based on molecular data once a larger set of complete genome OBI sequences becomes available.
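The model comparison described in the Methods (a t-test on AUC values collected across cross-validation runs) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `welch_t` are hypothetical, the abstract does not state which form of t-test was used (Welch's unequal-variance statistic is assumed here), and the AUC values in the demo are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def welch_t(a, b):
    """Welch's t statistic for two samples of per-run AUC values.
    Assumption: the source only says "t-test"; Welch's form is one option."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


if __name__ == "__main__":
    # Toy classifier output: 1 = OBI, 0 = non-OBI (made-up scores).
    print(auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1]))

    # Hypothetical per-run AUCs for two models; not values from the study.
    best_model = [0.80, 0.82, 0.81]
    other_model = [0.70, 0.72, 0.71]
    print(welch_t(best_model, other_model))
```

A large positive statistic would indicate that the first model's AUC distribution sits above the second's; turning it into a p-value additionally requires the Welch-Satterthwaite degrees of freedom and a t distribution, which are omitted here.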
Keywords