PLoS Computational Biology (Sep 2019)
A scale-free analysis of the HIV-1 genome demonstrates multiple conserved regions of structural and functional importance.
Abstract
HIV-1 replicates via a low-fidelity polymerase with a high mutation rate; strong conservation of individual nucleotides is highly indicative of the presence of critical structural or functional properties. Identifying such conservation can reveal novel insights into viral behaviour. We analysed 3651 publicly available sequences for the presence of nucleic acid conservation beyond that required by amino acid constraints, using a novel scale-free method that identifies regions of outlying score together with a codon scoring algorithm. Sequences with outlying score were further analysed using an algorithm for producing local RNA folds whilst accounting for alignment properties. 11 different conserved regions were identified, some corresponding to well-known cis-acting functions of the HIV-1 genome but also others whose conservation has not previously been noted. We identify rational causes for many of these, including cis functions, possible additional reading frame usage, a plausible mechanism by which the central polypurine tract primes second-strand DNA synthesis and a conformational stabilising function of a region at the 5' end of env.