Imperfect DNA mirror repeats in the gag gene of HIV-1 (HXB2) identify key functional domains and coincide with protein structural elements in each of the mature proteins
School of Contemporary Sciences, University of Abertay-Dundee, Bell Street, Dundee DD1 1HG, Scotland, UK
Virology Journal 2007, 4:113 doi:10.1186/1743-422X-4-113
Published: 26 October 2007
A DNA mirror repeat is a sequence segment that contains a center of symmetry on a single strand, e.g. 5'-GCATGGTACG-3'. Mirror repeats are most frequently described in association with functionally significant sites in genomic sequences, and their occurrence is regarded as noteworthy, if not unusual. However, imperfect mirror repeats (IMRs) having ≥ 50% symmetry are common in the protein coding DNA of monomeric proteins, and their distribution has been found to coincide with protein structural elements (helices, β sheets and turns). In this study, the distribution of IMRs is evaluated in a polyprotein to determine whether IMRs may be related to the position or order of protein cleavage or to other hierarchical aspects of protein function. The gag gene of HIV-1 [GenBank:K03455] was selected for the study because its protein motifs and structural components are well documented.
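The notion of percent symmetry can be illustrated with a minimal Python sketch. It assumes that symmetry is measured as the fraction of mirrored position pairs around the segment's center that carry the same base; this scoring rule, the function names `mirror_symmetry` and `is_imr`, and the `threshold` parameter are illustrative assumptions, not the scoring procedure used in the study.

```python
def mirror_symmetry(segment: str) -> float:
    """Return the fraction of mirrored position pairs that hold the same base.

    Position i is paired with position n-1-i on the same strand. For
    odd-length segments the central base has no partner and is ignored.
    This scoring rule is an assumption made for illustration.
    """
    seq = segment.upper()
    half = len(seq) // 2
    if half == 0:
        return 0.0  # a segment shorter than 2 bases has no mirrored pairs
    matches = sum(seq[i] == seq[-1 - i] for i in range(half))
    return matches / half


def is_imr(segment: str, threshold: float = 0.5) -> bool:
    """Treat a segment as an imperfect mirror repeat at or above the threshold."""
    return mirror_symmetry(segment) >= threshold


# The perfect mirror repeat quoted in the text scores full symmetry.
perfect = mirror_symmetry("GCATGGTACG")
# Changing the last base breaks one of the five mirrored pairs.
imperfect = mirror_symmetry("GCATGGTACC")
print(perfect, imperfect, is_imr("GCATGGTACC"))
```

Under this assumed rule, the example segment 5'-GCATGGTACG-3' is a perfect mirror repeat, and a single substitution at one end leaves four of five mirrored pairs intact, which still clears the ≥ 50% threshold. Lowering `threshold` to roughly 0.33 would correspond to the weakly symmetric class of repeats.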
There is a highly specific relationship between IMRs and structural and functional aspects of the Gag polyprotein. The five longest IMRs in the polyprotein translate a key functional segment in each of the five cleavage products. Throughout the protein, IMRs coincide with functionally significant segments. A detailed annotation of the protein, which combines structural, functional and IMR data, illustrates these associations. There is a statistically significant correlation between the ends of IMRs and the ends of protein structural elements (PSEs) in each of the mature proteins. Weakly symmetric IMRs (≥ 33% symmetry) are related to cleavage positions and processes.
The frequency and distribution of IMRs in HIV-1 Gag indicate that DNA symmetry is a fundamental property of protein coding DNA and that different levels of symmetry are associated with different functional aspects of the gene and its protein. The interaction between IMRs and protein structure and function is precise and interwoven over the entire length of the polyprotein. The distribution of IMRs, and their relationship to structural and functional motifs in the protein that they translate, suggest that DNA-driven processes, including the selection of mirror repeats, may be a constraining factor in molecular evolution.