by Christoph
Just to quote myself here: When microbiologists describe a previously unknown bacterium, they are mainly interested in its lifestyle, metabolism, morphology and cell cycle. And to continue with the self-quotation: For me, "getting to know" newbies always involves finding out how they initiate the replication of their chromosome(s). *)
The four Pendulisporaceae described by Garcia et al. (2024) and introduced to STC in part 1 have unusually large genomes. Their single, circular chromosomes have lengths ranging from 12.2 to 13.6 mega-bases (Mb). One of their cousins in the taxonomic sub‑order Sorangiineae, Sorangium cellulosum So ce56, had for years been the bacterium with the largest known genome, 13.0 Mb, and it was found only recently that not only this branch of the myxobacteria (phylum Myxococcota), but also the phylum Chloroflexota contains species with very large genomes, for example Ktedonobacter racemifer SOSP1-21 (13.7 Mb). To put these genome sizes/chromosome lengths into perspective: the genome of the eukaryote Saccharomyces cerevisiae S288C has a size of 12.1 Mb, distributed over 16 chromosomes.
"Replicationists" assume, but simply do not know, that bacteria with unusually large genomes initiate the replication of their chromosome like bacteria with "normal-sized" chromosomes – think E. coli MG1655 (4.6 Mb) or B. subtilis 168 (4.2 Mb) – that is, dependent on the initiator protein DnaA from an origin of replication, oriC, rather than by some unknown mechanism(s). When Schneiker et al. (2007) published the genome sequence of Sorangium cellulosum So ce56 (a paper with 59 authors!), they had found a dnaA gene but oriC remained elusive. They state: "In the absence of the GC-skew inversion typically seen at the replication origin of bacterial chromosomes, it was not possible to discern the location of oriC."
In the complete genomic sequences, single dnaA genes are annotated for P. albinea MSr11954, P. brunnea MSr12523, P. rubella MSr11368 and P. rubella MSr11367. The protein sequences share ~85% identity, indicating their close relationship on the (GTDB taxonomic) genus level. From the same (taxonomic) family, the DnaA protein of Sorangium cellulosum So ce56 shares ~63% identity with them. In contrast, the DnaA proteins of E. coli K‑12 MG1655 and B. subtilis 168 share only ~40% and ~50% identity with DnaA of P. albinea MSr11954, respectively, which is low but in the typical range for DnaA proteins of species belonging to different phyla.
Given the presence of a dnaA gene, it should therefore be possible to find the origin of replication, oriC, and to predict its location in the four Pendulisporaceae genomes. (I should add that molecular biologists do not regard predictions as proof, but insist on experimental verification.) I started my search with the "informed guess" that oriC may be located in an intergenic region adjacent to the dnaN gene as in many bacterial genomes (mostly in the dnaA·dnaN intergenic region). The dnaN-adjacent intergenic regions have lengths between 448 bp and 510 bp and are thus large enough to accommodate oriC. Since I had learned from experience that it makes sense to include the genomes of closely related species in such an analysis, I searched the dnaN-adjacent intergenic regions of another 10 genomes, including Sorangium cellulosum So ce56, for putative oriC sequences (these intergenic regions varied considerably in length, between 280 bp and 1211 bp).
In all 14 sequences, I detected the conserved structural elements of bacterial oriC·s: counter-clockwise transcription of the left-flanking gene (<dnaN), a DNA-unwinding element (DUE), DnaA-trio motifs, a single or twin DnaA box in reverse orientation (R1) at a distance of ~2–3 helical turns from the DUE, and additional multiple DnaA binding sites arranged as arrays (Figure 2.1). The twin DnaA boxes R1 in the oriC·s of the Myxococcales are likewise found in the oriC·s of Actinomycetota, Bacillota, and Cyanobacteriota (yes, I'm still struggling with the new names but that is slowly wearing off). In a first superficial screening, I did not detect any structural elements that would make these origins somehow "special". Also, I could not detect possible "secondary origins" with typical oriC-features in these genomes.
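For readers who want to try such a screen themselves, here is a minimal sketch of one ingredient, the search for DnaA boxes on both strands of an intergenic region. It is not the procedure used for this post. The consensus TTATCCACA (the E. coli-type DnaA box), the mismatch threshold, the function names and the toy sequence are all assumptions made for illustration. DUE and DnaA-trio detection are not covered.

```python
# Sketch only: scan a DNA sequence for DnaA-box-like 9-mers on both strands.
# Assumed for illustration: the consensus TTATCCACA and the mismatch cutoff.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_dnaa_boxes(seq, consensus="TTATCCACA", max_mismatches=1):
    """Return (0-based start, strand, matched 9-mer, mismatch count) tuples.

    Strand "+" means the window resembles the consensus as written,
    strand "-" means it resembles the reverse complement of the consensus.
    """
    seq = seq.upper()
    k = len(consensus)
    motifs = (("+", consensus), ("-", reverse_complement(consensus)))
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, motif in motifs:
            d = mismatches(window, motif)
            if d <= max_mismatches:
                hits.append((start, strand, window, d))
    return hits


# Hypothetical toy region: one box in each orientation, separated by spacers.
toy_region = "GGGG" + "TTATCCACA" + "CCCC" + "TGTGGATAA" + "GGGG"
exact_hits = find_dnaa_boxes(toy_region, max_mismatches=0)
```

Raising max_mismatches admits degenerate boxes, at the price of more chance hits in longer regions, so any hit list from such a scan is a starting point for inspection rather than a prediction.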
Thus, as far as it is reasonably possible to draw conclusions from predictions, it can be argued that bacteria with single, circular chromosomes of >12 Mb in length initiate replication like bacteria with "normal-sized" chromosomes. Or, to look at the other end of the length spectrum: ...like bacteria with mini genomes such as the Stammera capleta symbiont of Cassida rubiginosa beetles, whose chromosome is only 0.27 Mb long. It is good to know this, and that the mechanism of DnaA-dependent initiation of chromosome replication from oriC apparently has no restrictions on chromosome length, but it is certainly not important enough to warrant a proper scientific publication.
There's something else that's also worth a side note. As you see in Figure 2.2, all members of the order Polyangiales (NCBI taxonomy) with one exception have oriC (red dot) adjacent to a syntenic gene triplet dnaN·recF·gyrB. It has repeatedly been argued that the strong conservation of genes in the oriC region reflects the fine‑tuning of their transcriptional regulation, as these genes have twice the copy number in replicating cells. In fact, the arrangement of genes (synteny) in the oriC region in genomes from more than half of the known bacterial phyla is almost or completely identical to that shown here for Geomonas oryzisoli and, in most cases, oriC is localized in the dnaA·dnaN intergenic region (Figure 2.2). However, just as many cases are known in which this conspicuous synteny is partially or completely mixed up by intra‑genomic recombination ("rearrangements"). A good example of this is the disruption of the syntenic oriC gene cluster in the Polyangiales, where only the oriC-adjacent gene triplet dnaN·recF·gyrB is preserved, while the mnmE·yidCD·rnpA·rpmH gene cluster is at a completely different genomic location, and dnaA, rsmG and gyrA are isolated genes in completely different contexts. Since the species of Polyangiales are certainly several hundred million years old, it is hard for me to imagine that they would not have used this long time to adapt the transcription of the genes close to the "new" oriC position in such a way that the double copy number during replication is no impediment.
A final thought. Why could Schneiker et al. (2007) not pinpoint oriC when they sequenced Sorangium cellulosum So ce56 while I had no issues? For this genome, I obtained a clean inflection point in the cumulative GC‑skew curve at pos. 11,364,776, at a distance of approximately 9.2 kb from oriC at pos. 11,355,356 (c11355106..c11355605 [NC_010162.1]), which is reasonable since inflection points are not always directly at oriC but often at a distance of several kb. I suspect that they were "trapped" in the then prevailing ideas of the location of oriC in bacterial genomes: Ogasawara & Yoshikawa (1992) had convincingly argued that the (evolutionary) original localization of oriC is upstream of the dnaA gene as in B. subtilis (we now know that unwinding occurs in the dnaA·dnaN intergenic region), and Worning et al. (2006) had shown that, as a rule, oriC adjacent to dnaA is located near the inflection point of the GC skew. Since the dnaA-upstream region contains only 1 consensus and 5 degenerate DnaA boxes, they probably thought it unlikely that this is the oriC, and the lack of a detectable GC‑skew inflection point near dnaA confirmed them in this view. Additional criteria for the detection of oriC as I applied here were not available to them back in 2007.
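A cumulative GC‑skew curve of the kind mentioned above can be sketched in a few lines. This is a generic illustration, not the tool used for this post: the window size, the function names and the toy sequence are assumptions. The skew per window is (G − C)/(G + C), and the running sum of these values is the cumulative curve. In many bacterial chromosomes its minimum lies near the origin and its maximum near the terminus, but that is a rule of thumb and not a guarantee.

```python
# Sketch only: cumulative GC skew over non-overlapping windows.
# Assumed for illustration: window size and the toy sequence below.


def cumulative_gc_skew(seq, window=1000):
    """Return (window end coordinates, cumulative skew values).

    Each coordinate is the end of a window, so the cumulative value at
    that coordinate sums the skew of all windows up to that point.
    A trailing partial window is ignored.
    """
    seq = seq.upper()
    ends, cumulative = [], []
    total = 0.0
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute a skew of zero.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        ends.append(start + window)
        cumulative.append(total)
    return ends, cumulative


def skew_extrema(ends, cumulative):
    """Return the coordinates of the curve's minimum and maximum."""
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return ends[i_min], ends[i_max]


# Hypothetical toy "chromosome": a C-rich half followed by a G-rich half,
# so the curve falls, turns at the junction (coordinate 2000), then rises.
toy_chromosome = "ACCC" * 500 + "AGGG" * 500
turning_points = skew_extrema(*cumulative_gc_skew(toy_chromosome, window=500))
```

On a real, circular chromosome the curve depends on where the linear sequence record starts, and the window size trades resolution against noise, so an extremum found this way is best read as pointing to a neighbourhood of several kb rather than to oriC itself.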
*) see here in STC for the Lilliputians (formerly CPR bacteria, now Patescibacteria), here for Planctomycetota, here for Borrelia, and recently for the Atribacteria.