<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1743-422X-3-30</ui>
   <ji>1743-422X</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Genetic diversity among five T4-like bacteriophages</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Nolan</snm>
               <mi>M</mi>
               <fnm>James</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jnolan@uno.edu</email>
            </au>
            <au id="A2">
               <snm>Petrov</snm>
               <fnm>Vasiliy</fnm>
               <insr iid="I2"/>
               <email>vpetrov@tulane.edu</email>
            </au>
            <au id="A3">
               <snm>Bertrand</snm>
               <fnm>Claire</fnm>
               <insr iid="I3"/>
               <email>claire.bertrand@free.fr</email>
            </au>
            <au id="A4">
               <snm>Krisch</snm>
               <mi>M</mi>
               <fnm>Henry</fnm>
               <insr iid="I3"/>
               <email>krisch@ibcg.biotoul.fr</email>
            </au>
            <au id="A5">
               <snm>Karam</snm>
               <mi>D</mi>
               <fnm>Jim</fnm>
               <insr iid="I2"/>
               <email>karamoff@tulane.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biological Sciences, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biochemistry, Tulane University Health Sciences Center, 1430 Tulane Ave., New Orleans, LA 70112, USA</p>
            </ins>
            <ins id="I3">
               <p>LMGM-CNRS UMR 5100,118, route de Narbonne, 31062 Toulouse cedex 09, France</p>
            </ins>
         </insg>
         <source>Virology Journal</source>
         <issn>1743-422X</issn>
         <pubdate>2006</pubdate>
         <volume>3</volume>
         <issue>1</issue>
         <fpage>30</fpage>
         <url>http://www.virologyj.com/content/3/1/30</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16716236</pubid>
               <pubid idtype="doi">10.1186/1743-422X-3-30</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>31</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>23</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>23</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Nolan et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Bacteriophages are an important repository of genetic diversity. As one of the major constituents of terrestrial biomass, they exert profound effects on the earth's ecology and microbial evolution by mediating horizontal gene transfer between bacteria and controlling their growth. Only limited genomic sequence data are currently available for phages, but even these reveal an overwhelming diversity in their gene sequences and genomes. The contribution of the T4-like phages to this overall phage diversity is difficult to assess, since only a few complete genome sequences exist for these phages. Our analysis of five T4-like genomes represents half of the known T4-like genomes in GenBank.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here, we have examined in detail the genetic diversity of the genomes of five relatives of bacteriophage T4: the <it>Escherichia coli </it>phages RB43, RB49 and RB69, the <it>Aeromonas salmonicida </it>phage 44RR2.8t (or 44RR) and the <it>Aeromonas hydrophila </it>phage Aeh1. Our data define a core set of conserved genes common to these genomes as well as hundreds of additional open reading frames (ORFs) that are nonconserved. Although some of these ORFs resemble known genes from bacterial hosts or other phages, most show no significant similarity to any known sequence in the databases. The five genomes analyzed here all have similarities in gene regulation to T4. Sequence motifs resembling T4 early and late consensus promoters were observed in all five genomes. In contrast, only two of these genomes, RB69 and 44RR, showed similarities to T4 middle-mode promoter sequences and to the T4 <it>motA </it>gene product required for their recognition. In addition, we observed that each phage differed in the number and assortment of putative genes encoding host-like metabolic enzymes, tRNA species, and homing endonucleases.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our observations suggest that evolution of the T4-like phages has drawn on a highly diverged pool of genes in the microbial world. The T4-like phages harbour a wealth of genetic material that has not been identified previously. The mechanisms by which these genes may have arisen may differ from those previously proposed for the evolution of other bacteriophage genomes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The T4-like phages are a diverse group of lytic bacterial myoviruses that share genetic homologies and morphological similarities with the well-studied coliphage T4 <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. These phages provide an attractive model for the study of comparative genomics and phage evolution for several reasons: They possess relatively large dsDNA genomes that vary widely in size (~160&#8211;250 kb) and genetic composition. They contain host-like functions, such as nucleotide metabolism and a DNA replisome (reviewed in <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>). They experience different evolutionary constraints due to their lytic life cycle than do either their bacterial host or lysogenic bacteriophages. They exist under less stringent genomic size constraints than, for example, the lambdoid phages <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. T4 has a terminally redundant genome <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> that replicates by a recombination-primed replication pathway. The efficient and promiscuous T4-encoded recombination machinery <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> may generate a high degree of evolutionary diversity, via both homologous and non-homologous recombination between this phage genome and that of bacterial hosts or other phages. Thus the characteristics of the T4-like genome, its mechanism of replication, and the interactions with cellular hosts suggest that the T4-like phages constitute a natural crucible for the acquisition, evolution and dispersal of genetic information in the microbial world.</p>
         <p>We present here a bioinformatics analysis of the genome sequences of five T4-like bacteriophages. These phages include three coliphages (RB69, RB49 and RB43), and two <it>Aeromonas </it>phages (44RR2.8t and Aeh1). Our results complement and extend those previously reported from the coliphage T4 <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, the <it>Vibrio </it>phage, KVP40 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and from the marine cyanophages S-PM2 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, P-SSM2 and P-SSM4 <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Our data identify a conserved core of T4-like genes found in all of these genomes, including some conserved ORFs of unknown function. One of the most striking findings is the presence of large numbers of novel open reading frames (ORFs), most of which have no significant match in GenBank. Both conserved and nonconserved regions of the genomes include sequence motifs resembling T4 promoters. Thus, it appears that both core and novel genes are co-ordinately expressed in a manner similar to that of T4. We compare the possible origins of the novel regions of the T4 genome with those proposed for other phages.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Genome overview</p>
            </st>
            <p>We have analyzed five complete genome sequences of phylogenetically distant T4-like bacteriophages. This analysis is the first part of an ongoing comparative genomics project on T4-like phages. At present this project has generated single contiguous sequences for 12 divergent T4-like genomes. Of these sequences, five genomes were selected for in-depth analysis on the basis of their phylogenetic diversity <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Among completed genomes that are not dealt with here are the <it>Aeromonas </it>phages 31 and 25, since they are both close relatives of 44RR2.8t and thus do not add significantly to the sequence diversity of the group. Five other genomes are considered draft quality (coliphages RB16 and phi-1, <it>Vibrio </it>phage nt-1, <it>Acinetobacter </it>phage 133, and <it>Aeromonas </it>phage 65) and are not included in this analysis but are available through the Tulane T4-like Genome Website <url>http://phage.bioc.tulane.edu</url>. The five genomes presented here share between 61 and 67 percent amino acid similarity to each other among ~100 conserved open reading frames. T4 is most closely related to RB69, with which it shares 81% amino acid similarity over 207 ORFs. T4 exhibits about the same level of similarity to the other 4 genomes as they do to each other.</p>
            <p>A summary of this analysis is presented in Table <tblr tid="T1">1</tblr>. The sizes of these five genomes range between 164 kb and 233 kb. The genome of Aeh1 had been predicted to be significantly larger than the other genomes, based on pulsed-field gel electrophoresis of genomic DNA <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. This genome (233,234 bp) is in fact nearly 40% larger than the average of T4 and the other four genomes presented here; the genomes of KVP40 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and P-SSM2 <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> are larger still (244 kb and 252 kb, respectively). All genomes have low %GC, although to a lesser degree than T4. ORFs were identified using GeneMarkS <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> and ORFs orthologous to T4 genes were identified by blastp mutual best hits to predicted proteins in the GenBank accession for the T4 genome. The probable significance of matches was assessed by expected value (E-value) scores. Most ORFs scored well below the 10<sup>-4 </sup>cutoff for significant matches. A conserved core of 82 ORFs (T4-like genes) was found in all 5 genomes analysed here. There are 106 T4-like genes conserved among at least 4 of these 5 genomes; Aeh1 shared the fewest of these conserved genes (94) and the average similarity of the T4 orthologs of the conserved genes was lowest in this phage as well (49%). The conserved genes are generally clustered in several large blocks throughout each genome. Interspersed between these conserved blocks are segments containing blocks of predicted novel ORFs, most of which are unique to the genome that harbours them. Novel ORFs represent between 20% and 54% of the total coding capacity of the 5 genomes analyzed.</p>
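            <p>The mutual-best-hit criterion used above for assigning T4 orthologs can be illustrated with a minimal sketch. It assumes blastp results have already been parsed into (query, subject, E-value, bit score) tuples; the function names, the tuple layout, and the use of bit score to pick the best hit are illustrative assumptions, not a description of the pipeline actually used for this analysis. Only the 10<sup>-4 </sup>E-value cutoff is taken from the text.</p>
            <p><![CDATA[
```python
from typing import Dict, Iterable, Tuple

# One parsed blastp hit: (query id, subject id, E-value, bit score).
# This layout is an assumption made for the sketch.
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], cutoff: float = 1e-4) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among hits within the E-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > cutoff:
            continue  # not a significant match
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], cutoff: float = 1e-4
) -> Dict[str, str]:
    """Return pairs (a -> b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, cutoff)
    reverse = best_hits(b_vs_a, cutoff)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}
```
]]></p>
            <p>In this sketch a predicted ORF would be called a T4 ortholog only if the pairing holds in both search directions; an ORF whose best T4 match is itself better matched by a different ORF is left unpaired.</p>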
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Summary of T4-like genome sequences determined in comparison with T4</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Genome</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Size (%GC)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># ORFs (% of genome)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># tRNAs</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># T4-like ORFs (% of all)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># novel ORFs</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>T4</p>
                     </c>
                     <c ca="center">
                        <p>168,904 (35.0%)</p>
                     </c>
                     <c ca="center">
                        <p>273 (95.9%)</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>209 (76.6%)</p>
                     </c>
                     <c ca="center">
                        <p>64</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB69</p>
                     </c>
                     <c ca="center">
                        <p>167,560 (37.6%)</p>
                     </c>
                     <c ca="center">
                        <p>273 (94.0%)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>208 (77.7%)</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49</p>
                     </c>
                     <c ca="center">
                        <p>164,018 (40.5%)</p>
                     </c>
                     <c ca="center">
                        <p>272 (94.5%)</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>121 (44.5%)</p>
                     </c>
                     <c ca="center">
                        <p>151</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1</p>
                     </c>
                     <c ca="center">
                        <p>233,234 (42.8%)</p>
                     </c>
                     <c ca="center">
                        <p>332 (91.6%)</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>104 (31.3%)</p>
                     </c>
                     <c ca="center">
                        <p>228</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43</p>
                     </c>
                     <c ca="center">
                        <p>180,500 (43.2%)</p>
                     </c>
                     <c ca="center">
                        <p>292 (94.2%)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>114 (39.0%)</p>
                     </c>
                     <c ca="center">
                        <p>178</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RR2.8t</p>
                     </c>
                     <c ca="center">
                        <p>173,591 (44.0%)</p>
                     </c>
                     <c ca="center">
                        <p>253 (92.8%)</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>116 (45.8%)</p>
                     </c>
                     <c ca="center">
                        <p>137</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The number of ORFs for T4 is from the GenBank accession but does not include 7 alternative translation products included within some ORFs. The number of ORFs predicted for T4 by GeneMarkS was 266 (93.1% of the genome length). tRNAs were predicted by tRNAscan-SE. The number of T4-like ORFs is the number of ORFs conserved in T4 and at least one of the other genomes studied. The remainder of ORFs in each genome are novel ORFs.</p>
               </tblfn>
            </tbl>
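            <p>The size comparison for Aeh1 can be recomputed directly from the genome sizes in Table 1. The short sketch below is only an arithmetic check; the function name is hypothetical.</p>
            <p><![CDATA[
```python
# Genome sizes in bp, as listed in Table 1.
GENOME_SIZES_BP = {
    "T4": 168904,
    "RB69": 167560,
    "RB49": 164018,
    "Aeh1": 233234,
    "RB43": 180500,
    "44RR2.8t": 173591,
}


def percent_larger_than_others(target: str, sizes: dict) -> float:
    """Percent by which one genome exceeds the mean size of all other genomes."""
    others = [bp for name, bp in sizes.items() if name != target]
    mean_others = sum(others) / len(others)
    return 100.0 * (sizes[target] / mean_others - 1.0)


aeh1_excess = percent_larger_than_others("Aeh1", GENOME_SIZES_BP)
print(f"Aeh1 exceeds the mean of the other genomes by {aeh1_excess:.1f}%")
```
]]></p>
            <p>With the Table 1 values, the mean of T4 and the other four genomes is about 171 kb, and Aeh1 exceeds it by roughly 36%.</p>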
         </sec>
         <sec>
            <st>
               <p>Conserved genes and ORFs</p>
            </st>
            <p>The conserved genes are generally localized in large clusters. The gene order among the clusters is highly collinear between most phages, as depicted in Figure <figr fid="F1">1</figr>; a higher-resolution version is also available (see <supplr sid="S1">additional file 1</supplr>). In T4, early and middle expressed genes are transcribed in a leftward direction (counterclockwise on the circular map), while late genes are primarily transcribed in the opposite direction. The genomes of RB69, RB49, and 44RR display a high degree of synteny with T4 and maintain essentially all of the clustering of related genes seen in T4. Synteny with T4 conserves the gene orientation with respect to time of expression during the infectious cycle. The genome of Aeh1 is also syntenous with T4, although small rearrangements of individual genes can be seen in Figure <figr fid="F1">1</figr>. Only RB43, with at least two substantial genome rearrangements, displays a significant break in synteny with T4 and the other T4-like phage genomes. The predicted transcription pattern appears more complex for RB43, with smaller clusters of genes predicted to be co-transcribed and some orthologs of T4 early and middle genes transcribed from the strand opposite to that used in T4 <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A discussion of genes conserved in all T4-like phages can be found in a companion manuscript <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, as well as an earlier work <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Blast alignment of T4-like genomes</p>
               </caption>
               <text>
                  <p><b>Blast alignment of T4-like genomes</b>. Conserved T4-like genes are displayed as blue arrows, novel ORFs are shown as red arrows, tRNAs as black arrowheads. Pairwise tblastx similarities between genomes are indicated by green boxes. Similarities separated by less than 90 bp were combined for visual clarity. Yellow regions indicate similarities found in inverted orientation between genomes.</p>
               </text>
               <graphic file="1743-422X-3-30-1"/>
            </fig>
            <p>The T4 genome has 132 predicted ORFs of unknown function. Eleven of these ORFs are conserved among the five T4-like genomes and orthologs to 93 T4 ORFs are found in at least one of these genomes. Although the conserved ORFs were not identified as essential in T4 by genetic methods <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, their preservation among phages suggests that they must be advantageous for survival in nature. In most instances the functions provided by these conserved ORFs remain obscure, but matches to Pfam motifs provide some clues about the function for a few of these ORFs, as shown in Table <tblr tid="T2">2</tblr>. For example, ORF <it>vs.6 </it>has a highly significant match to the Gly_radical Pfam accession, which is also found in the <it>nrdD </it>anaerobic nucleotide reductase. Thus, the <it>vs.6 </it>gene product may play a role in phage-induced nucleotide metabolism. Another conserved ORF, <it>vs.1</it>, exhibits marginally significant similarity to the SLT lytic transglycosylase domain, suggesting some role in cell lysis. These results corroborate PSI-BLAST matches previously reported for the T4 <it>vs.1 </it>and <it>vs.6 </it>ORFs to lysozyme and glycyl radical domains <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Overall, the match of <it>vs.1 </it>to the SLT domain is conserved; four of the six phage <it>vs.1 </it>orthologs match SLT with E &lt; 0.05 and the other two orthologs match more marginally, with E &lt; 0.75. The <it>nrdC.10 </it>ORF is conserved in 3 of 6 phages, and all 3 of these match the AAA ATPase motif, with E values ranging from 0.082 to 0.16. Another conserved ORF, <it>5.4</it>, displays a less probable, although conserved, match to the PAAR membrane associated motif. Such low-probability matches must be interpreted with caution, but they could provide starting points for the identification of the functions for conserved proteins. 
Functional assignments for <it>vs.1</it>, <it>vs.6</it>, and <it>nrdC.10 </it>were corroborated by BLAST matches to the Conserved Domain database <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In addition, Conserved Domain BLAST searches identified matches for 4 of 6 <it>tk.4 </it>orthologs to the A1pp phosphatase domain and 5 of 6 <it>nrdC.11 </it>orthologs to the COG3541 nucleotidyltransferase domain.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Domain matches for T4 conserved ORFs</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Gene</p>
                     </c>
                     <c ca="left">
                        <p>Pfam domain name</p>
                     </c>
                     <c ca="left">
                        <p>E value range</p>
                     </c>
                     <c ca="left">
                        <p>genomes hit</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>vs.6</p>
                     </c>
                     <c ca="left">
                        <p>Gly_radical formyl transferase</p>
                     </c>
                     <c ca="left">
                        <p>1.40E-45 to 8.8E-15</p>
                     </c>
                     <c ca="left">
                        <p>6/6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>vs.1</p>
                     </c>
                     <c ca="left">
                        <p>SLT Transglycosylase</p>
                     </c>
                     <c ca="left">
                        <p>0.012 to 0.74</p>
                     </c>
                     <c ca="left">
                        <p>6/6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>nrdC.10</p>
                     </c>
                     <c ca="left">
                        <p>AAA ATPase family</p>
                     </c>
                     <c ca="left">
                        <p>0.082 to 0.16</p>
                     </c>
                     <c ca="left">
                        <p>3/3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>nrdC.10</p>
                     </c>
                     <c ca="left">
                        <p>BSD domain</p>
                     </c>
                     <c ca="left">
                        <p>0.076</p>
                     </c>
                     <c ca="left">
                        <p>1/3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>nrdC.2</p>
                     </c>
                     <c ca="left">
                        <p>TFIIS_C</p>
                     </c>
                     <c ca="left">
                        <p>0.021</p>
                     </c>
                     <c ca="left">
                        <p>1/6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*nrdC.11</p>
                     </c>
                     <c ca="left">
                        <p>COG3541: nucleotidyl transferase</p>
                     </c>
                     <c ca="left">
                        <p>4.0E-07 to 0.013</p>
                     </c>
                     <c ca="left">
                        <p>2/6 full alignment 4/6 partial alignment</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*tk.4</p>
                     </c>
                     <c ca="left">
                        <p>smart00506:A1pp phosphatase</p>
                     </c>
                     <c ca="left">
                        <p>2.0E-20 to 0.04</p>
                     </c>
                     <c ca="left">
                        <p>4/6 full alignment 1/6 partial alignment</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Matches are HMMer matches to the Pfam database. * indicates BLAST matches to the CDD database. Genomes hit shows (number of orthologs matching Pfam domain)/(total number of orthologs identified for the five genomes studied plus T4). For CDD matches, alignment to the full domain or partial length alignment is noted. Additional conserved ORFs for which no function was identified are: <it>uvsW.1</it>, <it>pseT.2</it>, <it>pseT.3</it>, <it>a-gt.4</it>, and <it>61.1</it>.</p>
               </tblfn>
            </tbl>
            <p>Only recently has the conserved ORF <it>uvsW.1 </it>been recognized <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> in T4. Previously this sequence was believed to encode the C-terminal 76 amino acids of the UvsW protein. For all 5 of the genomes analyzed here, the coding region corresponding to T4 <it>uvsW </it>was divided into 2 ORFs, <it>uvsW </it>and <it>uvsW.1</it>. Concurrent crystallography on the UvsW protein from T4 showed that it too lacked the region similar to <it>uvsW.1</it>, and subsequent resequencing of this region in T4 confirmed the presence of the two distinct ORFs, <it>uvsW.1 </it>and <it>uvsW </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Although <it>uvsW.1 </it>is conserved among T4 and all 5 genomes studied here, its function remains unknown.</p>
         </sec>
         <sec>
            <st>
               <p>Novel ORFs</p>
            </st>
            <p>Each phage genome includes a surprisingly large number of ORFs that have no matches in T4. We term these ORFs "novel ORFs" and their numbers range from 230 in Aeh1 (54% of the genome) to 62 (20% of the genome) in RB69. Similarly, 64 T4 ORFs (15% of the genome) have no apparent ortholog in RB69, its closest relative in this analysis; these 64 ORFs are novel to T4 (see Table <tblr tid="T1">1</tblr>). Locations of the novel ORFs appear to be non-random, with most clustered in groups between blocks of conserved genes. In a few instances, however, novel ORFs are found singly between conserved genes (see Figure <figr fid="F1">1</figr>). The direction of transcription of the novel ORFs is almost invariably the same as that of the flanking conserved genes. This suggests that the novel ORFs are subject to the same regulatory constraints as the rest of the phage genome, with early expressed genes being transcribed primarily counterclockwise and late genes being transcribed clockwise. Nearly 90% of the novel ORFs are clustered among early and middle gene orthologs, suggesting that these genes are expressed at the beginning of the infectious cycle, along with the flanking conserved genes (see also below). The novel ORFs do not appear to differ significantly in codon bias from conserved genes. They share the same strand bias of the third codon position seen in T4 <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and do not vary significantly in codon adaptation index <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> from conserved genes (data not shown). These observations argue that the novel ORFs are not recent acquisitions of host genes.</p>
            <p>We searched the sequences of novel ORFs for matches to phage genomes and the Swiss-Prot database using blastp, and for matches to Pfam motifs using HMMer. We identified a total of 750 ORFs from the 5 genomes that lacked T4 orthologs. Of these, only 64 showed matches to Pfam functional domains (Table <tblr tid="T3">3</tblr>) or to proteins of known function in GenBank. Although novel ORFs are not orthologs of T4-like genes, some appear to be paralogous duplications of adjacent, conserved genes, such as <it>RB69ORF010c </it>with <it>motB</it>, and <it>RB49ORF183c</it>, <it>44RRORF188c </it>and T4 ORFs <it>alt.-1 </it>and <it>alt.-2</it>, with <it>alt</it>. An additional ORF, <it>44RRORF187c</it>, appears to be a full-length duplication of <it>alt</it>, but displays only 54% similarity to 44RR <it>alt</it>. Although none of the remaining novel ORFs showed any similarity to T4, 89 of them matched other novel ORFs from one of the other five T4-like genomes in this study. A subset of ORFs in phages 44RR, Aeh1, and RB43 appear to be orthologs of a pyrimidine salvage pathway, previously described in the T4-like phage KVP40 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. This pathway includes an NAPRTase and a bifunctional NUDIX hydrolase/nucleotidyl transferase, which is distinct from the monofunctional NUDIX hydrolase, <it>nudE</it>, found in T4 <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>; <it>nudE </it>orthologs were also predicted for Aeh1, RB43 and RB69. It thus appears that Aeh1 and RB43 possess both the bifunctional NUDIX protein and the T4-like monofunctional NudE protein. It is unclear whether these observations reflect a functional redundancy for RB43 and Aeh1, or if <it>nudE </it>and the bifunctional NUDIX/transferase provide different functions in the phage-infected cell. Conversely, RB49 does not appear to encode either <it>nudE </it>or the bifunctional NUDIX protein.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Pfam hits for novel ORFs</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>ORF</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Pfam Domain name</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>E value</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF008c</p>
                     </c>
                     <c ca="left">
                        <p>Serine hydroxymethyltransferase</p>
                     </c>
                     <c ca="right">
                        <p>9.80E-180</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF084c</p>
                     </c>
                     <c ca="left">
                        <p>TM2 domain</p>
                     </c>
                     <c ca="right">
                        <p>3.80E-14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF093c</p>
                     </c>
                     <c ca="left">
                        <p>Glutathionylspermidine synthase</p>
                     </c>
                     <c ca="right">
                        <p>8.30E-109</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF097c</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryotic N-terminal methylation motif</p>
                     </c>
                     <c ca="right">
                        <p>3.70E-09</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF098c</p>
                     </c>
                     <c ca="left">
                        <p>SPFH domain/Band 7 family</p>
                     </c>
                     <c ca="right">
                        <p>1.10E-06</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF109c</p>
                     </c>
                     <c ca="left">
                        <p>Glutaredoxin-like domain (DUF836)</p>
                     </c>
                     <c ca="right">
                        <p>0.016</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF111c</p>
                     </c>
                     <c ca="left">
                        <p>Ribonucleotide reductase, small chain</p>
                     </c>
                     <c ca="right">
                        <p>4.00E-06</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF130c</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryotic dksA/traR C4-type zinc finger</p>
                     </c>
                     <c ca="right">
                        <p>4.30E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF168c</p>
                     </c>
                     <c ca="left">
                        <p>HD domain</p>
                     </c>
                     <c ca="right">
                        <p>0.34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF232c</p>
                     </c>
                     <c ca="left">
                        <p>Domain of unknown function (DUF1732)</p>
                     </c>
                     <c ca="right">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF234c</p>
                     </c>
                     <c ca="left">
                        <p>Sodium:solute symporter family</p>
                     </c>
                     <c ca="right">
                        <p>2.60E-34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF238c</p>
                     </c>
                     <c ca="left">
                        <p>Putative metallopeptidase (SprT family)</p>
                     </c>
                     <c ca="right">
                        <p>0.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF004c</p>
                     </c>
                     <c ca="left">
                        <p>CYTH domain</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF010c</p>
                     </c>
                     <c ca="left">
                        <p>dUTPase</p>
                     </c>
                     <c ca="right">
                        <p>5.10E-25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF025c</p>
                     </c>
                     <c ca="left">
                        <p>Carbohydrate binding domain</p>
                     </c>
                     <c ca="right">
                        <p>0.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF026c</p>
                     </c>
                     <c ca="left">
                        <p>Carbohydrate binding domain</p>
                     </c>
                     <c ca="right">
                        <p>0.12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF040c</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryotic N-terminal methylation motif</p>
                     </c>
                     <c ca="right">
                        <p>6.60E-09</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF062c</p>
                     </c>
                     <c ca="left">
                        <p>Putative metallopeptidase (SprT family)</p>
                     </c>
                     <c ca="right">
                        <p>0.00035</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF064c</p>
                     </c>
                     <c ca="left">
                        <p>SPFH domain/Band 7 family</p>
                     </c>
                     <c ca="right">
                        <p>2.40E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF068c</p>
                     </c>
                     <c ca="left">
                        <p>Bacterial transferase hexapeptide (3 repeats)</p>
                     </c>
                     <c ca="right">
                        <p>0.32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF110c</p>
                     </c>
                     <c ca="left">
                        <p>HD domain</p>
                     </c>
                     <c ca="right">
                        <p>0.0078</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF111c</p>
                     </c>
                     <c ca="left">
                        <p>UV-endonuclease UvdE</p>
                     </c>
                     <c ca="right">
                        <p>3.60E-20</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF131c</p>
                     </c>
                     <c ca="left">
                        <p>Poly(ADP-ribose) polymerase catalytic domain</p>
                     </c>
                     <c ca="right">
                        <p>0.026</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF132c</p>
                     </c>
                     <c ca="left">
                        <p>ADP-ribosylglycohydrolase</p>
                     </c>
                     <c ca="right">
                        <p>1.10E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF154c</p>
                     </c>
                     <c ca="left">
                        <p>von Willebrand factor type A domain</p>
                     </c>
                     <c ca="right">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF157c</p>
                     </c>
                     <c ca="left">
                        <p>CreA protein</p>
                     </c>
                     <c ca="right">
                        <p>4.40E-09</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF227c</p>
                     </c>
                     <c ca="left">
                        <p>RyR domain</p>
                     </c>
                     <c ca="right">
                        <p>0.0054</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF230c</p>
                     </c>
                     <c ca="left">
                        <p>Bacterial regulatory proteins, lacI family</p>
                     </c>
                     <c ca="right">
                        <p>0.14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF245c</p>
                     </c>
                     <c ca="left">
                        <p>GatB/Yqey domain</p>
                     </c>
                     <c ca="right">
                        <p>0.17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF289c</p>
                     </c>
                     <c ca="left">
                        <p>Poly A polymerase family</p>
                     </c>
                     <c ca="right">
                        <p>9.00E-31</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF318w</p>
                     </c>
                     <c ca="left">
                        <p>Phage T4 tail fibre</p>
                     </c>
                     <c ca="right">
                        <p>8.10E-06</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF020c</p>
                     </c>
                     <c ca="left">
                        <p>LysM domain</p>
                     </c>
                     <c ca="right">
                        <p>1.70E-07</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF057w</p>
                     </c>
                     <c ca="left">
                        <p>DnaJ domain</p>
                     </c>
                     <c ca="right">
                        <p>2.70E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF119c</p>
                     </c>
                     <c ca="left">
                        <p>von Willebrand factor type A domain</p>
                     </c>
                     <c ca="right">
                        <p>0.02</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF127c</p>
                     </c>
                     <c ca="left">
                        <p>C-5 cytosine-specific DNA methylase</p>
                     </c>
                     <c ca="right">
                        <p>1.20E-117</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF139c</p>
                     </c>
                     <c ca="left">
                        <p>SPFH domain/Band 7 family</p>
                     </c>
                     <c ca="right">
                        <p>3.80E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF157c</p>
                     </c>
                     <c ca="left">
                        <p>PhoH-like protein; PIN domain</p>
                     </c>
                     <c ca="right">
                        <p>4.20E-15; 0.0032</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF179c</p>
                     </c>
                     <c ca="left">
                        <p>DnaJ central domain (4 repeats)</p>
                     </c>
                     <c ca="right">
                        <p>0.28</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF191c</p>
                     </c>
                     <c ca="left">
                        <p>DnaJ central domain (4 repeats)</p>
                     </c>
                     <c ca="right">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF205w</p>
                     </c>
                     <c ca="left">
                        <p>Protein of unknown function (DUF1054)</p>
                     </c>
                     <c ca="right">
                        <p>0.43</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF241c</p>
                     </c>
                     <c ca="left">
                        <p>Zeta toxin</p>
                     </c>
                     <c ca="right">
                        <p>0.36</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF282w</p>
                     </c>
                     <c ca="left">
                        <p>Phage tail fibre adhesin Gp38</p>
                     </c>
                     <c ca="right">
                        <p>0.0035</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF044c</p>
                     </c>
                     <c ca="left">
                        <p>DEAD/DEAH box helicase</p>
                     </c>
                     <c ca="right">
                        <p>0.069</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF046c</p>
                     </c>
                     <c ca="left">
                        <p>Prokaryotic N-terminal methylation motif</p>
                     </c>
                     <c ca="right">
                        <p>0.43</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF102c</p>
                     </c>
                     <c ca="left">
                        <p>D-alanyl-D-alanine carboxypeptidase</p>
                     </c>
                     <c ca="right">
                        <p>0.0014</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF143w</p>
                     </c>
                     <c ca="left">
                        <p>Methyltransferase small domain; Ribosomal RNA adenine dimethylase</p>
                     </c>
                     <c ca="right">
                        <p>0.0011; 0.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF188c</p>
                     </c>
                     <c ca="left">
                        <p>TFIIB zinc-binding</p>
                     </c>
                     <c ca="right">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF239c</p>
                     </c>
                     <c ca="left">
                        <p>Protein of unknown function (DUF723)</p>
                     </c>
                     <c ca="right">
                        <p>0.098</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF244c</p>
                     </c>
                     <c ca="left">
                        <p>CYTH domain</p>
                     </c>
                     <c ca="right">
                        <p>0.0026</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF260c</p>
                     </c>
                     <c ca="left">
                        <p>Protein of unknown function (DUF1311)</p>
                     </c>
                     <c ca="right">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB69ORF048c</p>
                     </c>
                     <c ca="left">
                        <p>Thymidylate synthase</p>
                     </c>
                     <c ca="right">
                        <p>0.022</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB69ORF050c</p>
                     </c>
                     <c ca="left">
                        <p>Peptidase family U32</p>
                     </c>
                     <c ca="right">
                        <p>0.00055</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB69ORF053c</p>
                     </c>
                     <c ca="left">
                        <p>Nucleotidyl transferase</p>
                     </c>
                     <c ca="right">
                        <p>0.0022</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB69ORF055c</p>
                     </c>
                     <c ca="left">
                        <p>SIS domain</p>
                     </c>
                     <c ca="right">
                        <p>0.0043</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB69ORF104c</p>
                     </c>
                     <c ca="left">
                        <p>Oleosin</p>
                     </c>
                     <c ca="right">
                        <p>0.42</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Putative mobile DNA elements</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF027c</p>
                     </c>
                     <c ca="left">
                        <p>AP2 domain</p>
                     </c>
                     <c ca="right">
                        <p>0.00071</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF066w</p>
                     </c>
                     <c ca="left">
                        <p>LAGLIDADG endonuclease</p>
                     </c>
                     <c ca="right">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF040c</p>
                     </c>
                     <c ca="left">
                        <p>AP2 domain; HNH endonuclease</p>
                     </c>
                     <c ca="right">
                        <p>2.20E-07; 0.0042</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB49ORF212c</p>
                     </c>
                     <c ca="left">
                        <p>HNH endonuclease</p>
                     </c>
                     <c ca="right">
                        <p>9.20E-07</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Putative nucleotide salvage enzymes</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF072c</p>
                     </c>
                     <c ca="left">
                        <p>Nicotinate phosphoribosyltransferase</p>
                     </c>
                     <c ca="right">
                        <p>9.80E-63</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>44RRORF083c</p>
                     </c>
                     <c ca="left">
                        <p>NUDIX domain</p>
                     </c>
                     <c ca="right">
                        <p>7.10E-15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF119c</p>
                     </c>
                     <c ca="left">
                        <p>Nicotinate phosphoribosyltransferase</p>
                     </c>
                     <c ca="right">
                        <p>1.30E-46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF282c</p>
                     </c>
                     <c ca="left">
                        <p>NUDIX domain; Cytidylyltransferase</p>
                     </c>
                     <c ca="right">
                        <p>8.30E-12; 5.80E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aeh1ORF330c</p>
                     </c>
                     <c ca="left">
                        <p>NUDIX domain</p>
                     </c>
                     <c ca="right">
                        <p>3.00E-08</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF138c</p>
                     </c>
                     <c ca="left">
                        <p>NUDIX domain; Cytidylyltransferase</p>
                     </c>
                     <c ca="right">
                        <p>1.90E-13; 5.30E-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RB43ORF255w</p>
                     </c>
                     <c ca="left">
                        <p>Nicotinate phosphoribosyltransferase</p>
                     </c>
                     <c ca="right">
                        <p>4.50E-44</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Predicted ORF protein sequences were used to search Pfam using HMMer. Matches with E &lt; 0.5 are shown. Multiple matches are shown for ORFs having non-overlapping matches to more than one domain.</p>
               </tblfn>
            </tbl>
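            <p>The selection rule in the Table 3 footnote (matches with E below 0.5, with multiple entries only for non-overlapping domain matches) can be illustrated with a minimal Python sketch. This is not the procedure used for the analysis: the tuple layout, the function name <it>select_hits</it>, the match coordinates, and the rule of keeping the lower E value when two matches overlap are all assumptions made for illustration.</p>

```python
from collections import defaultdict

E_CUTOFF = 0.5  # cutoff stated in the Table 3 footnote


def select_hits(hits, e_cutoff=E_CUTOFF):
    """Return {orf: [(domain, evalue), ...]} for non-overlapping hits under the cutoff.

    hits is an iterable of (orf, domain, start, end, evalue) tuples; this
    layout is hypothetical and is not the HMMer output format.
    """
    by_orf = defaultdict(list)
    for orf, domain, start, end, evalue in hits:
        if e_cutoff > evalue:  # keep only matches with E below the cutoff
            by_orf[orf].append((evalue, start, end, domain))

    selected = {}
    for orf, candidates in by_orf.items():
        kept = []
        for evalue, start, end, domain in sorted(candidates):  # best E value first
            # Two intervals overlap unless one starts after the other ends.
            overlaps = any(
                not (start > k_end or k_start > end)
                for _, k_start, k_end, _ in kept
            )
            if not overlaps:
                kept.append((evalue, start, end, domain))
        kept.sort(key=lambda hit: hit[1])  # report in order along the protein
        selected[orf] = [(domain, evalue) for evalue, _, _, domain in kept]
    return selected


# Toy input: the two E values for RB49ORF040c are taken from Table 3; the
# coordinates and the last two hits are invented for illustration.
example_hits = [
    ("RB49ORF040c", "AP2 domain", 10, 60, 2.2e-07),
    ("RB49ORF040c", "HNH endonuclease", 80, 130, 0.0042),
    ("RB49ORF040c", "hypothetical weak domain", 40, 90, 0.3),
    ("toyORF001", "hypothetical domain", 1, 50, 0.9),
]
print(select_hits(example_hits))
```

            <p>In this toy input the weak match that overlaps the AP2 match is dropped, and the match with E above the cutoff is never considered, so only the two non-overlapping domains remain for RB49ORF040c.</p>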
            <p>Several other novel ORFs may be involved in nucleotide modification and synthesis. These include DNA methylase, nucleotidyl transferase, nucleotide triphosphatase and sugar isomerase domain functions identified by Pfam matches. In addition, phylogenetic analyses suggest that phage 44RR acquired its ribonucleotide reductase and thioredoxin genes from a bacterial host, rather than through conservation of the T4-like orthologs <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A number of predicted ORFs likely to be involved in gene regulation were also identified, including DNA binding proteins, polyADP-ribosylases and -hydrolases, DNA helicases, an excision repair endonuclease and homing endonucleases, as indicated in Table <tblr tid="T3">3</tblr>. Other putative functions identified include membrane proteins, peptidases, ATPases, an exotoxin, and a putative DnaJ-type protein chaperone. Several ORFs that do not match known genes in GenBank do match GenBank environmental sample sequences. It is unclear whether these matches are to uncharacterized bacterial hosts or to unknown bacteriophages.</p>
            <p>All ORFs were also searched for matches to signal peptide <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and transmembrane motifs <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Tables of ORFs matching these motifs for each genome are available (see <supplr sid="S2">additional file 2</supplr>).</p>
         </sec>
         <sec>
            <st>
               <p>Mobile DNA elements</p>
            </st>
            <p>The T4 genome encodes a number of mobile DNA elements, including 3 group I introns with integrated ORFs encoding homing endonucleases as well as the freestanding homing endonuclease genes (HEGs), <it>mob </it>and <it>seg </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. No group I introns were detected among any of the T4-like genomes sequenced here. However, two ORFs bearing similarity to the <it>mob </it>genes of T4 were identified in Aeh1 and RB43. An ORF similar to T4 <it>segD </it>has also been described for KVP40 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Thus, T4 seems to carry many more mobile elements than the genomes analyzed here. Interestingly, both RB49 and RB43 exhibit matches to a recently identified class of HEGs, AP2-HNH mobile DNA elements, which are related to the AP2 DNA transcription factor in plants <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> (also see <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>). This class of HEGs has been postulated to have transferred from bacteriophages into plant genomes via the chloroplast genome <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Putative signals for transcriptional regulation</p>
            </st>
            <p>The similarities of genome organization to T4 suggested that T4 transcriptional regulatory circuits might be conserved for many T4-like phages in nature. However, phages 44RR and Aeh1 replicate in different hosts than T4, and coliphage RB43 has a substantially rearranged genome compared to the T4 prototype. The relevance of these differences to gene regulation was analyzed by prediction of transcriptional promoter elements in each genome. Consensus nucleotide sequences have been described for three temporal classes of promoters in T4: genes expressed early, middle and late in the infectious cycle <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Each of the five T4-like genomes was searched for matches to these T4 transcriptional regulatory signals.</p>
         </sec>
         <sec>
            <st>
               <p>Early promoters</p>
            </st>
            <p>The T4 early promoter consensus <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> was used as a start point for identifying sequence similarities in the 5 T4-like genomes using the string search program fuzznuc <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Matching sequences were scrutinized for their locations relative to the predicted translation initiation site of putative early genes or other ORFs. These sequences were then used in an iterative fashion to find additional sequences using the HMMer program, which develops a statistical model for the consensus with which more refined searches of the genome can be done. Successive rounds of sequence selection and refinement were done until the number and locations of the sequences found ceased to change. From this analysis, we derived an early gene promoter motif for each phage. The locations of the final set of putative promoters on the genome were then manually examined. In virtually all cases, putative promoter elements were identified 5' to a predicted translational start site for a predicted ORF or conserved gene and in the correct orientation for transcription of this ORF. Thus, the predicted promoters appear to be plausible transcription initiation sequences. In each case, the sequences of the presumed early promoters thus identified had similarities to the T4 early consensus, but with some distinct differences that are illustrated in Figure <figr fid="F2">2</figr>. All predicted early promoters had similarity in the -35 region sequence to the GTTTAC sequence (-36 to -31) found in T4 <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, but in RB49, RB43 and Aeh1 there was a definite preference for G rather than T at position -33. In T4, this position is believed to be a preferred site of interaction of the ADP-ribosylated alpha subunit of RNA polymerase; a modification that is made in this subunit by the T4 encoded Alt protein <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. 
Phages RB49 and Aeh1 have putative <it>alt </it>genes, but in both cases the predicted Alt protein sequences are considerably diverged from the T4 sequence (data not shown); RB43 apparently lacks an <it>alt </it>ortholog. Position -36 is a strongly conserved G in some of the genomes analyzed but for RB43 it can be G or C; Aeh1 shows even less sequence conservation in the -36 position. All the phages frequently have an A-rich sequence from -40 to -44. This region resembles the UP element, which enhances transcription and is a site of interaction with the T4 ADP-ribosylated alpha subunit of RNA polymerase <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>.</p>
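            <p>The initial string search can be illustrated with a minimal Python sketch. It is not fuzznuc and does not reproduce that program's options; the function name find_fuzzy, the toy sequence and the one-mismatch allowance are assumptions made only for illustration. The sketch slides the -35 hexamer GTTTAC along a sequence and reports every window that falls within the allowed number of mismatches.</p>

```python
# Illustrative stand-in for a fuzzy pattern search (NOT fuzznuc itself).
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def find_fuzzy(genome, pattern, max_mismatch=1):
    """Return (start, matched_text, mismatches) for every forward-strand
    window that matches pattern with at most max_mismatch mismatches."""
    genome = genome.upper()
    width = len(pattern)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        # Count positions where the base is not allowed by the pattern code.
        mismatches = sum(
            1 for base, code in zip(window, pattern)
            if base not in IUPAC[code]
        )
        if max_mismatch >= mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical toy sequence, not taken from any of the phage genomes.
toy = "CCGTTTACAAGGGTTGACTT"
hits = find_fuzzy(toy, "GTTTAC", max_mismatch=1)
for hit in hits:
    print(hit)
```

            <p>On the toy sequence this reports the exact hexamer at offset 2 and the variant GTTGAC, with G in place of T at the fourth position of the hexamer, at offset 12. A real search would also scan the reverse strand and check each hit against the start of the downstream ORF, as described above.</p>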
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Sequence logo representation of putative early promoter consensus for each genome</p>
               </caption>
               <text>
                  <p><b>Sequence logo representation of putative early promoter consensus for each genome</b>. Sequences were identified using fuzznuc [24] and HMMer [53]. Consensus sequences were plotted with WebLogo [54]. The height of each letter indicates its degree of conservation. Nucleotide 0 is the putative transcription start site. Putative UP elements and the -10 region are boxed.</p>
               </text>
               <graphic file="1743-422X-3-30-2"/>
            </fig>
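            <p>The letter heights in a sequence logo reflect the information content of each aligned position. As a rough sketch of that calculation, the Python below computes 2 minus the Shannon entropy of each column for DNA. The small-sample correction that logo software normally applies is omitted, and the function name and the four toy sites are assumptions for illustration, not data from this study.</p>

```python
import math
from collections import Counter


def column_information(aligned):
    """Information content in bits at each column of equal-length DNA sites.

    Uses R = 2 - H, where H is the Shannon entropy of the column.
    No small-sample correction is applied (a simplification).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    info = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        total = sum(counts.values())
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        info.append(2.0 - entropy)
    return info


# Hypothetical toy alignment of -10-like hexamers.
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TAGAAT"]
bits = column_information(toy_sites)
print([round(b, 2) for b in bits])
```

            <p>Invariant columns score the maximum of 2 bits, whereas the third column of the toy alignment (T, T, C, G) scores 0.5 bits. Within a column, each letter is drawn at a height proportional to its frequency multiplied by the column's information content.</p>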
            <p>All putative early promoters resemble the T4 consensus in the -10 region, which is recognized in the host by the &#963; subunit of RNA polymerase. In general, T at position -7 and A at position -11 are highly conserved, as in T4. However, among the phages studied here, conservation of the T at position -12 is variable: T is not rigidly conserved at this position in Aeh1, and in RB49 it can be either T or C. The GT-rich sequence found 5' to position -12 in T4 is variably conserved. Phage 44RR shows a higher degree of conservation of A at -8 than any of the other phages. The genomes of RB69, RB49, and 44RR all show a preference for C residues in the -3 to -1 region. The predicted RB49 early consensus agrees with that previously identified by 5' end mapping of RB49 early transcripts <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
            <p>When the sites of predicted early promoters were mapped onto their respective genomes, many promoters were located 5' to orthologs of T4 early genes, as expected. Importantly, a large number of early promoters were predicted 5' to novel ORFs, including those for which no homologs exist in the sequence databases. For example, of 57 putative early promoters in RB69, 13 were upstream of novel ORFs and 45 were upstream of T4 orthologs (see example in Figure <figr fid="F3">3</figr>). These observations suggest that many novel ORFs are coordinately regulated along with the flanking conserved early T4-like genes. Early promoters were also found 5' to the tRNA genes, described below. Coordinates of putative early promoters can be found in the supplements (see <supplr sid="S3">additional file 3</supplr>).</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Location of early promoter sequences on the RB69 genome</p>
               </caption>
               <text>
                  <p><b>Location of early promoter sequences on the RB69 genome</b>. The top panel shows an overview. Conserved genes are shown as yellow arrows, novel ORFs as red line arrows, predicted early promoters as large black arrows, and terminators predicted by TransTerm [38] as red blocks. The bottom panel shows detail of one region. Predicted transcripts are shown at the bottom; blue arrows indicate transcripts expected from conserved gene promoters and red arrows designate those expected from novel ORF promoters. Orthologs of genes known to be expressed early in T4 infections are boxed. Red boxes indicate genes present only on predicted ORF promoter transcripts; blue-boxed genes are present on conserved and ORF promoter transcripts. Black boxes are early genes whose transcripts could not be predicted.</p>
               </text>
               <graphic file="1743-422X-3-30-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Middle promoters</p>
            </st>
            <p>In the T4 infectious cycle, early transcription is followed by "middle mode" transcription, which is initiated by the binding of the phage-encoded MotA protein to its cognate recognition sequence at T4 middle promoters <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. We used two criteria to detect conserved elements of T4-like middle mode transcriptional regulation among the five genomes studied: (a) matches to the T4 middle promoter consensus <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and (b) matches to the T4 MotA protein sequence. The RB69 genome includes a <it>motA </it>ortholog (blastp E = 5 &#215; 10<sup>-48</sup>). Putative RB69 middle promoter sequences were identified using a strategy similar to that described for early promoters, but based upon the consensus sequence (a/t)(a/t)(a/t) TGCTTtAN(11&#8211;13)TataAT <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The RB69 middle consensus clearly resembles that of T4 (Figure <figr fid="F4">4A</figr>), with conservation of the residues at positions -12, -11, and -7 of the T4 consensus. Also, the putative RB69 middle genes exhibit extended conserved sequences from positions -13 to -16, as seen in T4. T4 middle promoters show little similarity to the -35 region of <it>E. coli </it>&#963;<sup>70 </sup>promoters, but do possess the highly conserved GCTT motif (the T4 Mot box) at positions -30 to -27. This motif serves as the site of interaction of the T4 MotA protein with DNA. RB69 middle promoters also show similarity to the Mot box, which is presumably bound by the RB69 MotA ortholog. However, among the 4 other genomes studied, only the 44RR genome had an ortholog of the T4 MotA protein and sequence motifs similar to the T4 MotA-dependent promoters. Nine putative 44RR middle promoters were identified. 
They resemble the middle-mode consensus sequences of both T4 and RB69, but lack conservation at nucleotide position -11 (Figure <figr fid="F4">4A</figr>). The relatively small number of putative middle promoters that we have detected in 44RR tempers the interpretation of their significance. However, the presence of a strong match (blastp E = 2 &#215; 10<sup>-33</sup>) to the T4 <it>motA </it>gene function in this <it>Aeromonas </it>phage probably indicates the presence of a 44RR-encoded middle-mode transcriptional apparatus. Previous attempts to identify a middle promoter consensus and a <it>motA </it>ortholog in RB49 were unsuccessful <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, as were our attempts for RB49, RB43 and Aeh1. RB69 and 44RR also possess orthologs of the MotA co-activator AsiA <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Surprisingly, Aeh1 and KVP40 also encode AsiA proteins, which have been shown to bind T4 MotA <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, even though no ligand homologous or analogous to MotA has been identified for these genomes. AsiA can act as a transcriptional inhibitor in the absence of MotA <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, or may interact with another phage protein that has yet to be identified. Coordinates of putative middle promoters can be found in the supplements (see <supplr sid="S4">additional file 4</supplr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Sequence logo representation of putative middle promoter consensus for RB69 and 44RR</p>
               </caption>
               <text>
                  <p><b>(A) Sequence logo representation of putative middle promoter consensus for RB69 and 44RR</b>. Consensus was identified and plotted as in Figure 2. <b>(B) </b>Putative late promoter consensus for each genome. Consensus was identified as for early promoters, using fuzznuc and HMMer, except for Aeh1, for which ELPH [37] and HMMer were used initially.</p>
               </text>
               <graphic file="1743-422X-3-30-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Late promoters</p>
            </st>
            <p>In T4, late promoters are recognized by a phage-encoded &#963; factor, gp55. Contact between T4 gp55 and the DNA is facilitated by the T4 polymerase sliding clamp, gp45. A third T4-encoded gene product, gp33, forms a bridge between gp55 and gp45 <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The T4 late promoter consensus sequence is a short but highly conserved motif, TATAAATA, between nucleotide positions -13 and -6 relative to the transcriptional start site <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Putative late promoters were found readily for four of the five phage genomes studied, using the strategy employed for early and middle promoter searches (Figure <figr fid="F4">4B</figr>). However, the T at position -13 was poorly conserved for most phages, with either A or T commonly found at this position. A similar observation was made in an earlier description of RB49 late promoters <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, as well as in KVP40 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and S-PM2 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
            <p>Since our search strategy failed to detect late promoter sequences for phage Aeh1, an alternative strategy was employed to identify them. Regions upstream of ORFs orthologous to T4 late genes were analyzed with the ELPH program <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> to identify sequence motifs common to these DNA segments. The selected motifs were used as seeds to identify additional late promoter sequences using HMMer. This strategy identified a conserved sequence, CTAAATA, beginning at -12 from the putative initiation site. Once identified, this putative promoter sequence was used as a seed for the string search followed by HMM refinement, as used for late promoters of the other phages. Although the C at position -12 is a strong determinant for detection of Aeh1 late promoters, C is rarely found at this position in the putative late promoters of the other four phage genomes (Figure <figr fid="F4">4B</figr>). It should be noted that the phage Aeh1 gp55 protein, which presumably recognizes the divergent late promoter sequences of Aeh1, is itself substantially diverged from all the other phage gp55 sequences (data not shown). Coordinates of putative late promoters can be found in the supplements (see <supplr sid="S5">additional file 5</supplr>).</p>
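            <p>The seed-then-refine searches described for the early, middle and late promoters share one loop: build a model from the current set of sites, rescan the genome, and repeat until the hits stop changing. The Python sketch below illustrates that loop only. It substitutes a plain log-odds position weight matrix for the profile HMM built by HMMer, scans one strand, and uses hypothetical function names, an arbitrary score threshold and a toy sequence; none of these come from the study.</p>

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix against a uniform background."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm


def scan(genome, pwm, threshold):
    """Start offsets of windows whose summed score reaches the threshold.

    The genome is assumed to contain only A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def refine(genome, seed_sites, threshold, max_rounds=20):
    """Rebuild the model from its own hits until the hit set stops changing."""
    sites = list(seed_sites)
    previous = None
    for _ in range(max_rounds):
        pwm = build_pwm(sites)
        hits = scan(genome, pwm, threshold)
        if hits == previous or not hits:
            return hits
        width = len(pwm)
        sites = [genome[h:h + width] for h in hits]
        previous = hits
    return previous


# Hypothetical toy sequence containing two copies of the Aeh1-like motif.
toy_genome = "GGCTAAATAGGCCTAAATACCGG"
final_hits = refine(toy_genome, ["CTAAATA"], threshold=4.0)
print(final_hits)
```

            <p>On the toy sequence the loop converges after two rounds on the two exact copies of CTAAATA, at offsets 2 and 12. With real genomes, the choice of threshold and the manual inspection of hit locations described above matter far more than in this toy case.</p>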
         </sec>
         <sec>
            <st>
               <p>Terminators and operons</p>
            </st>
            <p>Putative rho-independent terminator sequences were identified for all 5 genomes, using the TransTerm program <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Although the locations of putative terminator sequences vary between phages, several terminators appear at conserved locations (see <supplr sid="S6">additional file 6</supplr>). One striking example is the bidirectional terminator predicted downstream of <it>uvsW.1 </it>that is conserved in T4 and the other 5 genomes. In all cases, the gene downstream of <it>uvsW.1 </it>is transcribed from the opposite strand and a bidirectional terminator is predicted between the converging transcripts. Genes <it>35 </it>and <it>36 </it>are transcribed rightward and a predicted terminator is located between them in all 6 genomes. Likewise, gene <it>23 </it>has a terminator predicted downstream in all 6 genomes. Terminators conserved in 5 out of 6 genomes were identified downstream of gene <it>32 </it>and upstream of <it>alt</it>.</p>
            <p>Comparisons between the positions of predicted terminators and transcription initiation signals allowed the identification of putative operons of gene expression. An example of operon structure from phage RB69 is shown in Figure <figr fid="F3">3</figr>. In some instances, it appears that the upstream promoters of novel genes drive expression of T4-like early genes that lack their own early promoter. In general, T4-like genes are predicted to be in operons with other T4-like genes, while novel ORFs appear to reside in operons with other novel ORFs.</p>
         </sec>
         <sec>
            <st>
               <p>tRNAs and codon bias</p>
            </st>
            <p>The bacteriophage T4 genome encodes eight tRNA genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The other T4-like genome sequences were searched for potential tRNA genes, using tRNAscan-SE <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The number of potential tRNA genes varied considerably among genomes (Table <tblr tid="T1">1</tblr>), ranging from zero in RB49 to 24 in Aeh1. Some common features were noted among the tRNA genes encoded by the phage genomes (Table <tblr tid="T4">4</tblr>). All genomes that encoded tRNAs had a predicted tRNA with a CAU anticodon. Although predicted to be Met tRNA by tRNAscan-SE, these tRNAs share signature sequences found in tRNAs recognized by IleRS <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. This class of Ile tRNAs is post-transcriptionally modified to carry lysidine in the anticodon, which converts them to read the Ile codon AUA <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. An alignment of phage Ile and Met tRNAs is shown in Figure <figr fid="F5">5</figr>. tRNAs for Leu, Ser and Arg are among the most commonly identified putative tRNA genes encoded in the T4-like genomes, including the previously sequenced genomes of T4 and KVP40. Other tRNAs are found more rarely, such as Ala, Pro, Gly and Val. These recognize GC-rich codons, which are unusual in AT-rich T4-like genomes <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
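            <p>The relationship between the anticodons listed in Table 4 and the codons they read can be made concrete with a short Python sketch. It returns only the strict Watson-Crick partner of each anticodon and ignores wobble pairing and base modification; the function name is an assumption for illustration.</p>

```python
# RNA base-pairing partners (strict Watson-Crick only; wobble is ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def cognate_codon(anticodon):
    """Codon (5' to 3') that pairs with an anticodon written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


for name, anticodon in [
    ("Ala", "UGC"),
    ("Pro", "UGG"),
    ("Gly", "UCC"),
    ("Val", "CAC"),
    ("Met", "CAU"),
]:
    print(name, anticodon, cognate_codon(anticodon))
```

            <p>The Ala, Pro, Gly and Val anticodons pair with GCA, CCA, GGA and GUG, which illustrates the GC-rich codons mentioned above. The unmodified CAU anticodon pairs with AUG; the lysidine modification is what redirects such a tRNA to the Ile codon AUA, a change this simple complement rule does not capture.</p>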
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Predicted tRNAs</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>tRNA</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Aeh1</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>44RR</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>T4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>RB69</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>RB43</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ala UGC</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Arg UCU</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Asn GUU</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Asp GUC</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cys GCA</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Gln UUG</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glu UUC</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Gly UCC</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>His GUG</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ile CAU*</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ile GAU</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Leu CAA</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Leu UAA</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Leu UAG</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lys UUU</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Met CAU</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Met CAU</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phe GAA</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pro UGG</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ser GCU</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ser UGA</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Thr UGU</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Trp CCA</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tyr GUA</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Val CAC</p>
                     </c>
                     <c ca="left">
                        <p>+</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pseudo</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The presence of a tRNAscan-SE predicted species is indicated for each genome. The number of predicted tRNA pseudogenes is also indicated. * indicates putative lysidine-modified tRNA<sup>Ile </sup>[41-44].</p>
               </tblfn>
            </tbl>
            <p>In bacteriophage T4, the presence of tRNA genes appears to correlate with differences in codon bias between the phage and its <it>E. coli </it>host <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In the genomes sequenced here, the tRNA genes present correlate much less with codon bias differences between each phage and its laboratory host. A similar observation was made for the vibriophage KVP40 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Thus, the functional role of the tRNA genes for these phages remains unclear. Nevertheless, the high degree of conservation of some tRNAs, such as the putative modified tRNA<sup>Ile </sup>mentioned above, suggests an important functional role for at least some of these tRNAs.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>tRNA alignment</p>
               </caption>
               <text>
                  <p><b>tRNA alignment</b>. Putative lysidine-modified phage tRNA-Ile sequences were aligned by secondary structure using clustalW. E. coli modified tRNA-Ile and phage Met-CAU and Ile-GAU sequences are shown for comparison.</p>
               </text>
               <graphic file="1743-422X-3-30-5"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The genome sequences presented here display broad diversity in primary sequence. Orthologous ORFs can be detected for 45 to 85 percent of open reading frames between any pair of these genomes. Orthologous protein sequences are on average 65% similar between genomes. This diversity is comparable to that seen across vertebrate evolution. For example, humans and chickens share 60% orthologous genes at a median amino acid similarity of 75%. Humans and teleost fishes share approximately 55% orthologous genes. The two most closely related phage genomes analyzed here, T4 and RB69, share 80% orthologs at 81% similarity, a distance comparable to that between humans and mice. Despite the diversity of their predicted protein sequences, these five T4-like phage genomes share a highly conserved genome organization. Most orthologs of T4 genes were identified in the same gene order and orientation as the cistrons in T4. RB43 shows the largest number of exceptions to this observation. It appears that several genome rearrangements must have occurred in T4, RB43, or both since they diverged from their common ancestor.</p>
         <p>The possibility of shared genetic regulatory elements among the T4-like phages was investigated by motif searches that identified putative promoter elements resembling T4 early and late promoters in all genomes. Late promoters were found exclusively 5' to conserved orthologs of T4 late genes. Many early promoters were found 5' to T4 early gene orthologs, but others were found 5' to novel ORFs. It thus appears that the early and late transcriptional modes are conserved among the T4-like phages. The novel ORFs appear to be coordinately expressed with early genes in all phages. The middle gene expression pathway appears to be less conserved among the T4-like phages. The middle promoter consensus was detected in RB69, and to a lesser degree in 44RR. The MotA protein product, required for recognition of the middle promoter Mot box, appears to be conserved only in T4, RB69 and 44RR.</p>
         <p>The T4 genome is predicted to encode over 120 ORFs of unknown function. Eleven ORFs were found to have homologs in all five of the genomes in our study. Given this level of conservation, these ORFs must encode products that are vital to the phage in some hosts or environments. We have identified putative functional domains for five of these ORFs based on matches to known Pfam domains. The candidate functions include nucleotide metabolism, host cell lysis, and gene regulation. An aggregate of about 70% of T4 ORFs are conserved in at least one other genome, suggesting that the protein products of these ORFs provide selective advantages to these phages. Conservation of these ORFs does not generally extend to phages more divergent than those analyzed here. Although several of these ORFs are conserved in KVP40, no matches were found in any of the marine phage genomes.</p>
         <p>Each of the T4-like genomes we have examined, including T4, harbors a number of ORFs that are unique to that genome. In Aeh1, these novel ORFs comprise over half of the genome, and most show no significant similarity to known sequences in GenBank. Functions identified for some novel ORFs suggest physiologically important roles in the phage life cycle, such as nucleotide metabolism, transcription and lateral DNA mobility. However, most novel ORFs have no known function or origin. It is thus unclear where these sequences arose, how they were acquired, and what function they might serve in the phage-infected cell. In many instances, regions containing novel ORFs were observed to be underrepresented in plasmid libraries constructed for shotgun sequencing and were only identified during PCR-based gap closure (<abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and data not shown). It would appear, then, that at least some novel ORFs in our study are deleterious to the host cell when expressed from high-copy plasmids. Some of the gene products of these ORFs may function in cell lysis or in commandeering host machinery for phage growth.</p>
         <p>The mechanisms of gain and loss of ORFs by T4-like genomes in evolution may differ from those proposed for the genomes of other phages, such as the lambdoid phages <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The novel lambdoid ORFs include "morons" &#8211; apparent short insertions of DNA consisting of an ORF flanked by transcriptional promoter and terminator signals. Moron DNAs are distinct from other lambdoid genes in %GC content, and thus appear to be recent acquisitions of genes by nonhomologous recombination with host DNA. In contrast, the majority of novel ORFs in T4-like phages do not resemble morons; they have a %GC that is indistinguishable from the rest of the phage genome (average %GC in RB69: novel ORFs, 36.9%; conserved ORFs, 37.6%) and thus do not appear to be recent acquisitions from the host. Another class of novel lambdoid ORFs appears to be chimeras of other phage genes <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. In the few instances where the T4-like novel ORFs have significant matches to other phage or GenBank proteins, the similarities generally extend over the entire length of the coding sequence rather than being restricted to the blocks of similarity found in chimeras. A better understanding of the origins of the novel ORFs in T4-like phages will provide clues to the mechanisms underlying the evolution of protein coding sequences and the biology of host-phage interactions. The mechanisms by which T4-like phages acquire ORFs may also differ from those of the lambdoid phages. T4-like phages do not undergo lysogeny; thus, they cannot acquire genes by imprecise excision from the host genome. They do not generally transduce host DNA as frequently as other phages, such as P22 <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, perhaps because of their propensity to hydrolyze host DNA. T4-like phages have a recombination-driven replication pathway that is facilitated by redundant DNA sequences at the chromosome ends. During replication, the redundant end sequences synapse with homologous regions of other replicating DNA molecules for further replication into long concatamers <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. A variation of this pathway has been postulated as a mechanism for the lateral transfer of novel genes between related phages <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. The ultimate source of these novel genes remains unknown, but may include bacterial hosts or bacteriophages encountered in coinfection. The failure to detect significant similarities between many of the novel ORFs described here and known bacterial genomes suggests either that these ORFs arose from bacterial hosts quite diverged from any known bacterium, or that bacterial genomes are not a major source for these ORFs. The latter appears to be more likely, at least in the case of novel ORFs identified in closely related phages, such as T4 and RB69. Unknown phages would seem a more likely source for many of these ORFs. Newly sequenced phage genomes often include numerous ORFs for which there is no known ortholog. Clearly, more phage genomes must be mined to incorporate more of their sequence diversity into the sequence databases.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our survey of a diverse set of T4-like phage genomes reveals similarities in general genome organization and gene regulation. Although a core of conserved ORFs was identified, the genome sequences exhibited a striking diversity of ORFs novel to each genome. The origins of this diversity have yet to be uncovered.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Bacteriophages and hosts</p>
            </st>
            <p>Bacteriophages, bacterial hosts and growth conditions were as described <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Phage DNA was prepared from plate lysates, sequenced, and assembled as described in <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Genome annotation</p>
            </st>
            <p>ORFs were detected primarily by use of the GeneMarkS program <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. The program was chosen based upon its accuracy in ORF prediction of the T4 genomic sequence by comparison to the GenBank accession (97% of ORFs recognized). When an orthologous gene was detected in a related phage genome, the region upstream of the predicted translational start site was scrutinized for additional N-terminal protein sequence with significant similarity to the orthologs. In these cases, the translational start site was adjusted to maximize the length of predicted amino acid similarity. Although prediction models were not based upon similarity between genomes, generally fewer than 5% of the predicted start sites required adjustment.</p>
            <p>GeneMarkS predictions were compared with those obtained using Glimmer <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. There was general agreement between the predictions obtained with the two programs. Glimmer predicted more ORFs per genome, but in some cases the additional ORFs predicted were inconsistent with the direction of transcription of flanking genes, which is uncommon in T4 <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and appears unusual for the genomes sequenced here. Thus, the Glimmer predictions were used primarily to adjust GeneMarkS predictions as mentioned above, or in regions where Glimmer predicted an ORF and GeneMarkS predicted an unusually long (> 200 bp) intercistronic region.</p>
            <p>Predicted ORFs were checked for similarity to T4 genes by blastp <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> mutual similarity. Genes with mutual best hit E-values &lt; 10<sup>-4 </sup>to known T4 genes were designated by the T4 gene name. Putative genes without T4 orthologs were designated by their ORF numbers, with conserved gene <it>rIIA </it>designated as ORF001. The strand of each ORF is designated "w" for clockwise (left-to-right) transcribed genes, and "c" for counterclockwise (right-to-left) transcribed genes. In T4, the origin of the genome has been assigned to the <it>rIIB </it>&#8211; <it>rIIA </it>intercistronic region; the terminus of the genome is defined as the start of translation of the <it>rIIB </it>gene. The sequence origin of each genome sequenced here is defined as the termination codon of the <it>rIIA </it>gene.</p>
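            <p>The mutual best hit criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the annotation: the function names, the tuple layout of the hits, and the example ORF and gene identifiers are all hypothetical, and the input is assumed to have already been parsed from blastp output into (query, subject, E-value) records.</p>

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (query id, subject id, E-value)
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit], cutoff: float) -> Dict[str, str]:
    """Map each query to its lowest-E-value subject, skipping hits at or above the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue >= cutoff:
            continue  # not significant enough to be considered
        if query not in best or best[query][1] > evalue:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(
    phage_vs_t4: Iterable[Hit],
    t4_vs_phage: Iterable[Hit],
    cutoff: float = 1e-4,
) -> Dict[str, str]:
    """Return {phage ORF: T4 gene} for pairs that are each other's best hit."""
    forward = best_hits(phage_vs_t4, cutoff)
    reverse = best_hits(t4_vs_phage, cutoff)
    return {orf: gene for orf, gene in forward.items() if reverse.get(gene) == orf}


# Toy input with made-up E-values, for illustration only
phage_vs_t4 = [
    ("ORF002", "rIIB", 1e-50),
    ("ORF003", "rIIB", 1e-6),
    ("ORF004", "gp43", 1e-3),
]
t4_vs_phage = [
    ("rIIB", "ORF002", 1e-48),
    ("gp43", "ORF004", 1e-3),
]
print(mutual_best_hits(phage_vs_t4, t4_vs_phage))
```

            <p>With this toy input, only the ORF002/rIIB pair is returned: ORF003 also matches rIIB but is not the best hit in the reverse search, and the ORF004/gp43 pair does not pass the 10<sup>-4 </sup>cutoff in either direction. Under the naming scheme described above, ORF002 would take the T4 gene name, while the other two would keep their ORF numbers.</p>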
            <p>Genomes were also searched for tRNA genes using tRNAscan-SE <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. All genomes except that of RB49 had at least one putative tRNA gene.</p>
            <p>DNA sequences are available through GenBank [Genbank:<ext-link ext-link-type="gen" ext-link-id="NC_005135">NC_005135</ext-link>] (44RR), [Genbank:<ext-link ext-link-type="gen" ext-link-id="NC_007023">NC_007023</ext-link>] (RB43), [Genbank:<ext-link ext-link-type="gen" ext-link-id="NC_004928">NC_004928</ext-link>] (RB69), [Genbank:<ext-link ext-link-type="gen" ext-link-id="NC_005260">NC_005260</ext-link>] (Aeh1), and [Genbank:<ext-link ext-link-type="gen" ext-link-id="NC_005066">NC_005066</ext-link>] (RB49). Additional analyses are available through the Tulane T4-like Genome Website <url>http://phage.bioc.tulane.edu</url>. Available data include an interactive genome browser <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, clustalW <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> alignments, EMBOSS pepstat statistics, octanol hydropathy plots <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, and HMMer Pfam matches <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The author(s) declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JMN designed and performed machine annotations of all genomes, performed promoter searches and drafted the manuscript and figures. VP provided additional annotations for all genomes and aided in construction of annotation tables. CB contributed to annotations. HMK co-conceived the study and provided manuscript comments. JDK conceived of the study, and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <text>
               <p><b>High-resolution genome map</b>. Genome map is as indicated for Figure <figr fid="F1">1</figr>, but predicted gene names are also indicated.</p>
            </text>
            <file name="1743-422X-3-30-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <text>
               <p><b>Predicted transmembrane and signal peptide matches for ORFs</b>. The amino acid coordinates matching transmembrane <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> or signal peptide <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> motifs are indicated for each ORF. Multiple transmembrane regions are predicted for some ORFs.</p>
            </text>
            <file name="1743-422X-3-30-S2.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional File 3</p>
            </title>
            <text>
               <p><b>Coordinates of predicted early promoters</b>. Coordinates are in GFF format. For promoters on the + strand, the 5' end of the sequence is the leftmost coordinate; for promoters on the &#8211; strand, the 5' end of the sequence is the rightmost coordinate. Promoters are named by their 5' end; those that differ in length from the consensus are noted.</p>
            </text>
            <file name="1743-422X-3-30-S3.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional File 4</p>
            </title>
            <text>
               <p><b>Coordinates of predicted middle promoters</b>. Coordinates are in GFF format. For promoters on the + strand, the 5' end of the sequence is the leftmost coordinate; for promoters on the &#8211; strand, the 5' end of the sequence is the rightmost coordinate. Promoters are named by their 5' end; those that differ in length from the consensus are noted.</p>
            </text>
            <file name="1743-422X-3-30-S4.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional File 5</p>
            </title>
            <text>
               <p><b>Coordinates of predicted late promoters</b>. Coordinates are in GFF format. For promoters on the + strand, the 5' end of the sequence is the leftmost coordinate; for promoters on the &#8211; strand, the 5' end of the sequence is the rightmost coordinate. Promoters are named by their 5' end.</p>
            </text>
            <file name="1743-422X-3-30-S5.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional File 6</p>
            </title>
            <text>
               <p><b>Coordinates of predicted rho-independent terminators</b>. Coordinates are in GFF format. For terminators on the + strand, the 5' end of the sequence is the leftmost coordinate; for terminators on the &#8211; strand, the 5' end of the sequence is the rightmost coordinate. Bidirectional terminators have the strand designation ".". Terminators are named according to their flanking genes.</p>
            </text>
            <file name="1743-422X-3-30-S6.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Guy Plunkett and Takashi Kunisawa for identifying putative lysidine-modified tRNA genes. JN thanks Eric Miller for numerous helpful discussions, and Candace Timpte for helpful comments on the manuscript. This work was supported by awards MCB-0138236 and EF-0333130 from the National Science Foundation to JDK.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>T4-like viruses</p>
            </title>
            <aug>
               <au>
                  <snm>B&#252;chen-Osmond</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>ICTVdB - The Universal Virus Database, version 3</source>
            <url>http://www.ncbi.nlm.nih.gov/ICTVdb/ICTVdB/</url>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A catalogue of T4-type bacteriophages</p>
            </title>
            <aug>
               <au>
                  <snm>Ackermann</snm>
                  <fnm>HW</fnm>
               </au>
               <au>
                  <snm>Krisch</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Arch Virol</source>
            <pubdate>1997</pubdate>
            <volume>142</volume>
            <issue>12</issue>
            <fpage>2329</fpage>
            <lpage>2345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s007050050246</pubid>
                  <pubid idtype="pmpid" link="fulltext">9672598</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Bacteriophage T4 Genome</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Kutter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mosig</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Arisaka</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kunisawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ruger</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>2003</pubdate>
            <volume>67</volume>
            <issue>1</issue>
            <fpage>86</fpage>
            <lpage>156</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">150520</pubid>
                  <pubid idtype="pmpid" link="fulltext">12626685</pubid>
                  <pubid idtype="doi">10.1128/MMBR.67.1.86-156.2003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Bacteriophage genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Hendrix</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>6</volume>
            <issue>5</issue>
            <fpage>506</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.mib.2003.09.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">14572544</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Circular permutation analysis of phage T4 DNA by electron microscopy</p>
            </title>
            <aug>
               <au>
                  <snm>Grossi</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Macchiato</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Gialanella</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Z Naturforsch [C]</source>
            <pubdate>1983</pubdate>
            <volume>38</volume>
            <issue>3-4</issue>
            <fpage>294</fpage>
            <lpage>296</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6346725</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Homologous recombination</p>
            </title>
            <aug>
               <au>
                  <snm>Mosig</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Molecular Biology of Bacteriophage T4</source>
            <publisher>Washington, D.C., American Society for Microbiology</publisher>
            <editor>Karam JD, Drake JW, Kreuzer KN, Mosig G, Hall DH, Eiserling FA, Black LW, Spicer EK, Kutter E, Carlson K, Miller ES</editor>
            <pubdate>1994</pubdate>
            <fpage>54</fpage>
            <lpage>82</lpage>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Complete Genome Sequence of the Broad-Host-Range Vibriophage KVP40: Comparative Genomics of a T4-Related Bacteriophage</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Durkin</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Ciecko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Feldblyum</snm>
                  <fnm>TV</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Nierman</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Szczypinski</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2003</pubdate>
            <volume>185</volume>
            <issue>17</issue>
            <fpage>5220</fpage>
            <lpage>5233</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">180978</pubid>
                  <pubid idtype="pmpid" link="fulltext">12923095</pubid>
                  <pubid idtype="doi">10.1128/JB.185.17.5220-5233.2003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The genome of S-PM2, a "photosynthetic" T4-type bacteriophage that infects marine Synechococcus strains</p>
            </title>
            <aug>
               <au>
                  <snm>Mann</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Clokie</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Millard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cook</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Wheatley</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Letarov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krisch</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2005</pubdate>
            <volume>187</volume>
            <issue>9</issue>
            <fpage>3188</fpage>
            <lpage>3200</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1082820</pubid>
                  <pubid idtype="pmpid" link="fulltext">15838046</pubid>
                  <pubid idtype="doi">10.1128/JB.187.9.3188-3200.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Coleman</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Weigele</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <issue>5</issue>
            <fpage>e144</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1079782</pubid>
                  <pubid idtype="pmpid" link="fulltext">15828858</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The diversity and evolution of the T4-type bacteriophages</p>
            </title>
            <aug>
               <au>
                  <snm>Desplats</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Krisch</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Res Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>154</volume>
            <issue>4</issue>
            <fpage>259</fpage>
            <lpage>267</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0923-2508(03)00069-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12798230</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Improving gene annotation of complete viral genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Mills</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rozanov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lomsadze</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>23</issue>
            <fpage>7041</fpage>
            <lpage>7055</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">290248</pubid>
                  <pubid idtype="pmpid" link="fulltext">14627837</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions</p>
            </title>
            <aug>
               <au>
                  <snm>Besemer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lomsadze</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>12</issue>
            <fpage>2607</fpage>
            <lpage>2618</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55746</pubid>
                  <pubid idtype="pmpid" link="fulltext">11410670</pubid>
                  <pubid idtype="doi">10.1093/nar/29.12.2607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Divergence of the DNA replication genes among T4-like phage genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Petrov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Nolan</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Bertrand</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chin Levy</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Desplats</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Krisch</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Karam</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>in press</volume>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Temperature-Sensitive Mutants Of Bacteriophage T4d: Their Isolation And Genetic Characterization</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Lielausis</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1964</pubdate>
            <volume>49</volume>
            <fpage>649</fpage>
            <lpage>662</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14156925</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Structural/functional assignment of unknown bacteriophage T4 proteins by iterative database searches</p>
            </title>
            <aug>
               <au>
                  <snm>Kawabata</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Arisaka</snm>
