by Christoph
When microbiologists describe a previously unknown bacterium, they are mainly interested in its lifestyle, metabolism, morphology and cell cycle (see part 1). For me, "getting to know" a newbie always involves – let's call it an addiction – reading its genome sequence and finding out how it initiates the replication of its chromosome(s) (see here in STC for the Lilliputians (formerly CPR bacteria, now Patescibacteria), here for Planctomycetota, and here for Borrelia). This is particularly intriguing, of course, if it is the first characterized bacterium of an entire phylum, the Atribacteria (now: Atribacterota).
Figure 1. Gene order in the oriC region of Atribacter laminatus RT761 and Atribacter sp. isolate AMR_MDS_4709 (phylum Atribacterota), Sulfurimonas denitrificans DSM 1251, and Bacillus subtilis 168. Gene sizes are to scale, and the direction of their transcription is indicated by "<" for counter-clockwise (ccw), or ">" for clockwise (cw). Red boxes mark intergenic regions containing DnaA binding sites (DnaA boxes). Orange boxes indicate the DUE (DNA unwinding element, ref.) in oriC. Both Atribacter oriCs are predictions by the author; the S. denitrificans oriC was confirmed experimentally (ref.) as was B. subtilis oriC (ref.). Figure by the author. Frontispiece: Confocal-laser microscopy showing the localization of DNA and RNA within the intracytoplasmic membrane structure. DNA, RNA, and membrane lipids were stained by Hoechst (blue), SYTO RNAselect (green) and FM4-64 (red), respectively. Outlines of the cell from panel a are included in all panels. a Phase contrast image. b–d Confocal-laser images. Scale bars, 1 μm. Source
Could I possibly "read" the replication origin, oriC, and the initiator protein, DnaA, from the genome sequence of Atribacter laminatus RT761 and of Atribacter sp. isolate AMR_MDS_4709 for a comparison? The dnaA genes are annotated in both genome sequences. The protein sequences share 95% identity, pointing to a close relationship on the (taxonomic) family level, similar to the 96% identity of E. coli K-12 MG1655 and Salmonella Typhimurium LT2 DnaA. An aside: the DnaA proteins of A. laminatus RT761 and E. coli K‑12 MG1655 share only 42% identity, which is low but in the typical range for DnaA proteins of species belonging to different phyla.
Origin sequences cannot simply be "read" from genome sequences because there is no "oriC code" comparable to the genetic code for protein-coding sequences. For this reason they are, with few exceptions, not included in the annotations of sequenced genomes. But oriCs are often found in the dnaA·dnaN intergenic region of bacterial genomes and can be detected by conserved structural elements: counter-clockwise transcription of the left-flanking gene, a DNA-unwinding element (DUE), DnaA-trio motifs, and multiple DnaA binding sites. *)
Both Atribacters have an oriC structure located in the dnaA·dnaN intergenic region and, in addition, a cluster of DnaA-binding sites located in the dnaA upstream region (Figure 1). The overall arrangement is identical to that of the oriC regions of bacteria from different phyla, Sulfurimonas denitrificans (Campylobacterota, formerly Epsilonproteobacteria) and B. subtilis (Bacillota, formerly Firmicutes). For B. subtilis, it is known that both the incA region and oriC are necessary for full origin function in vivo. Presumably, DnaA monomers bound to each of the two regions interact and lead to DNA looping, although the function of this looping is not understood. Whether the Atribacters also have such a bi-partite replication origin, or whether the DnaA-binding sites upstream of the dnaA gene contribute to the regulation of its expression, would have to be studied experimentally. The way I have gotten to know the Small Things, I would not be surprised if they use both options, simultaneously or alternately.
Figure 2. Partial DNA sequence and structural elements in the oriC region of Atribacter laminatus RT761, Atribacter sp. isolate AMR_MDS_4709, E. coli K-12 MG1655, and B. subtilis 168. Orange box: DUE (DNA unwinding element, ref.) predicted using the SIST software. Dots above (upper strand) or below (lower strand) the sequence mark the bases where Krause & Messer (1999) found unwinding experimentally (red dots: with SSB; pale red: without SSB). The sequences were aligned on the DnaA box R1 (TGTGNAWAA, rev) in E. coli oriC (red box). DnaA boxes with a close match to the consensus sequence TGTGNAWAA are shown in red, while those deviating from the consensus (<4 mismatches) are shown in pale red. DnaA boxes in the forward orientation (TTWTNCACA, fwd) are shown in turquoise, relaxed boxes in light turquoise. Dark blue: IHF-binding site in E. coli oriC (ref.). Violet boxes and bold violet A residues: (NAN)x DnaA-trio motifs, which DnaA binds as ssDNA during unwinding according to Richardson et al. (2016). Figure by the author.
A closer look at the Atribacter oriC sequences shows that key structural elements are present and, despite the lack of significant homology at the nucleotide level, are arranged similarly to E. coli oriC and B. subtilis oriC (Figure 2). The DNA-unwinding element (DUE) is a short (~30 bp), mostly AT-rich sequence that reacts to increasing negative superhelicity by localized unwinding, that is, by assuming a single-stranded conformation. This conformational switch has been shown experimentally for E. coli and B. subtilis upon DnaA binding to oriC, but can also be simulated with appropriate software.
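For readers who want to play along: a full stress-induced unwinding simulation is beyond a few lines, but a very crude first pass is to look for ~30-bp windows with high AT content. The sketch below does only that. The function name, the window size and the AT threshold are my assumptions for illustration; it is not the SIST method and will not reproduce its predictions.

```python
def at_rich_windows(seq, window=30, min_at=0.8):
    """Return (start, AT fraction) for every window with AT fraction >= min_at.

    Crude proxy only: real DUE prediction models superhelical stress,
    which this simple base count does not.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= min_at:
            hits.append((start, at_fraction))
    return hits


# Toy sequence (not a real origin): a 30-bp AT stretch flanked by GC repeats.
toy = "GC" * 20 + "AT" * 15 + "GC" * 20
print(at_rich_windows(toy, window=30, min_at=1.0))
```

Overlapping windows that pass the threshold would normally be merged into one candidate region before comparing it with the position of the DnaA boxes.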
The initiator protein DnaA binds via its C-terminal domain 4 to double-stranded DNA at sequences resembling the non-palindromic "DnaA box" TTWTNCACA (reverse complement: TGTGNAWAA), and the DnaA box marked "R1" in Figure 2 is located at a distance of 1–2 helical turns from the DUE in all known oriCs. We had found long ago that in E. coli tinkering with DnaA box R1 inevitably leads to inactivation of oriC. Located between the DUE and DnaA box R1 is the DnaA-trio motif (NAN)x. Oligomerized DnaA binds to this motif via its central domain 3 once the motif has become single-stranded during unwinding, as was recently shown for B. subtilis oriC by Heath Murray's lab at Newcastle University, UK. Both Atribacter oriCs fit extremely well into this scheme, and an experimental confirmation of their origin function looks straightforward (but needs to be done).
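Searching a sequence for DnaA boxes in both orientations is easy to sketch in Python. The two consensus strings are the ones given above (W = A or T, N = any base); the function names, the mismatch threshold and the toy sequence are assumptions made for illustration, and this is not necessarily how the boxes in Figure 2 were assigned.

```python
# IUPAC codes needed for the DnaA box consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}

DNAA_BOX = {"fwd": "TTWTNCACA", "rev": "TGTGNAWAA"}


def count_mismatches(site, pattern):
    """Number of positions where `site` is not allowed by the IUPAC `pattern`."""
    return sum(base not in IUPAC[code] for base, code in zip(site, pattern))


def find_dnaa_boxes(seq, max_mismatches=1):
    """Return (start, orientation, site, mismatches) for each 9-mer hit."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 9 + 1):
        site = seq[start:start + 9]
        for orientation, pattern in DNAA_BOX.items():
            mismatches = count_mismatches(site, pattern)
            if mismatches <= max_mismatches:
                hits.append((start, orientation, site, mismatches))
    return hits


# Toy sequence (not a real oriC) with one box in each orientation.
toy = "GGG" + "TTATCCACA" + "GGG" + "TGTGGATAA" + "CCC"
for hit in find_dnaa_boxes(toy, max_mismatches=0):
    print(hit)
```

Raising `max_mismatches` finds the "relaxed" boxes as well, at the price of more chance hits, so the threshold is a judgment call rather than a fixed rule.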
The initiation phase of chromosome replication in E. coli culminates with the loading of the replicative helicase, DnaB, onto the single-stranded region in the DUE in oriC, by physical interaction of DnaA with DnaB. Briefly, DnaB then unwinds DNA and recruits the primase, DnaG, which in turn triggers the formation of a complete replisome. As far as initiation is concerned, A. laminatus as the first characterized member of the phylum Atribacterota does not appear to deviate from paths known for other bacteria. This does not rule out smaller and larger surprises in the further phases of chromosome replication. Just a small one here: the genome of A. laminatus encodes homologs of DnaB and DnaG, but not of the E. coli helicase loader, DnaC. Instead of DnaC, A. laminatus may use DciA (DUF721) as its helicase loader, as do most bacteria that lack a dnaC gene. And most bacteria do lack a dnaC gene; only the Enterobacteriaceae have one (see here in STC).
*) If one cannot detect an oriC structure in the dnaA·dnaN intergenic region of a complete genome, one can make use of the GC-skew for its detection. OriC is commonly localized at a distance of ~10 kb from the minimum inflection point of the GC-skew, in a ~100–500 bp-long intergenic region. For the complete genome of Atribacter laminatus RT761 [NZ_CP065383.1], I obtained a secondary GC-skew minimum at pos. 852932 using the GenSkew webserver, which lies 13.8 kb from the predicted position of oriC (pos. 839078..839268).
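The idea behind the GC-skew approach can be sketched in Python: compute (G − C)/(G + C) in consecutive windows, add the values up along the genome, and take the position where the running sum is lowest. This is a minimal sketch under my own assumptions (function names, non-overlapping windows, window size); it is not the GenSkew implementation and will not necessarily reproduce its positions.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return window midpoints and the cumulative (G - C) / (G + C) skew."""
    seq = seq.upper()
    positions, cumulative = [], []
    running_total = 0.0
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running_total += skew
        positions.append(start + window // 2)
        cumulative.append(running_total)
    return positions, cumulative


def skew_minimum(seq, window=1000):
    """Midpoint of the window where the cumulative skew is lowest."""
    positions, cumulative = cumulative_gc_skew(seq, window)
    lowest = min(range(len(cumulative)), key=cumulative.__getitem__)
    return positions[lowest]


# Distance check with the positions quoted in the footnote.
oric_midpoint = (839078 + 839268) // 2   # 839173
print(852932 - oric_midpoint)            # 13759 bp, i.e. ~13.8 kb
```

For a real genome one would read the sequence from the FASTA record first and probably try several window sizes, since small windows give a noisy curve with many local minima.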