by Mechas
Discovering microscopic organisms and elements that exist at the Edge of Sight is no easy task. New tools designed to explore this mysterious world reveal previously unknown elements and uncover a complex web of forms whose secrets remain to be fully unraveled.
The history of life is marked by the coevolution of organisms and their constant assailants, mobile genetic elements (MGEs) – semiautonomous replicators distinct from cellular life forms. Among these MGEs are viruses, the ubiquitous particles of packaged nucleic acids that infect animals, plants, and bacteria. Viruses are, of course, well known as the causative agents of many human diseases, such as influenza, AIDS, SARS, Ebola, and COVID-19. Yet these viruses are but the tip of the iceberg; estimates indicate that many viral species are yet to be identified, and their discovery remains a challenge. New bioinformatic tools and the availability of open databases containing many petabases of information (a petabase is 10^15 nucleotides!) provide a wonderful platform for such discoveries.
In 2022, researchers at the University of Toronto in Canada developed a cloud computing infrastructure to run ultra-high-throughput sequence alignments. By searching 5.7 million sequence datasets, they identified 131,957 novel RNA viruses. Another study, led by researchers at Ohio State University, interrogated data from the Global Ocean RNA metatranscriptomes. They identified previously unknown RNA viruses that increase the known diversity and indicate the need to revise the proposed taxonomy. A similar analysis mined over 5,000 metatranscriptomes, uncovering more than 2.5 million RNA virus contigs and vastly expanding the known virosphere. All of these studies searched for the conserved RNA-dependent RNA polymerases (RdRPs) that these viruses use for replication. A more recent approach expanded the number of known viral species by using artificial intelligence. The resulting tool, LucaProt, recognized viral RdRPs in metatranscriptomes by integrating sequence and predicted structural information.
But of course, genomic dark matter includes entities other than viruses. Viroids and viroid-like elements, considered minimal replicators, have small circular RNA genomes of 220–450 nucleotides that, unlike those of viruses, do not code for proteins. Instead, they depend on host RNA polymerases to carry out rolling circle replication, aided by viroid-encoded ribozymes. A recent study searched for circular RNAs, rolling circle intermediates, and ribozymes as signatures of viroids. By using their computational pipeline to mine transcriptomes and metatranscriptomes, the authors identified more than 11,000 viroid-like elements across various ecosystems, increasing their known diversity about 5-fold.
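To make the idea of searching for small circular genomes concrete, here is a minimal Python sketch of one common heuristic: when a circular molecule is assembled into a linear contig, the end of the contig often repeats its beginning. This is an illustration under stated assumptions, not the pipeline used in the study described above; the function names and the length and overlap thresholds are hypothetical.

```python
def terminal_overlap(seq: str, min_overlap: int = 10) -> int:
    """Length of the longest suffix of seq that equals a prefix of seq.

    Only overlaps between min_overlap and half the contig length are
    considered. Returns 0 if no such overlap exists.
    """
    seq = seq.upper()
    longest_allowed = len(seq) // 2
    for k in range(longest_allowed, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def is_small_circular_candidate(
    seq: str,
    min_len: int = 200,      # assumed lower bound on genome size
    max_len: int = 2000,     # assumed upper bound on genome size
    min_overlap: int = 10,   # assumed minimum terminal repeat
) -> bool:
    """Flag contigs whose ends overlap and whose unit length is small."""
    k = terminal_overlap(seq, min_overlap)
    if k == 0:
        return False
    unit_len = len(seq) - k  # length of the genome once the repeat is trimmed
    return min_len <= unit_len <= max_len
```

A real search would also have to cope with sequencing errors, reverse-complement orientation, and low-complexity repeats, none of which this sketch handles.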
More recent work by Zheludev and collaborators at Stanford University devised a new strategy to tackle the problem of discovering viroids, which lack the conserved RNA polymerases essential for replication. They used a homology-independent bioinformatics approach to mine RNA sequence data for desired characteristics, such as small and circular genomes. They named the identified RNAs, approximately 1,000 nucleotides long, "obelisks" due to their predicted rod-shaped secondary structures. These obelisks encode two proteins, "Oblin-1" and "Oblin-2", of 202 and 53 amino acids, respectively, with no homologs in reference databases. Further analyses, which used k-mers and ultra-high-throughput sequence alignment to search for Oblin-1 and Oblin-2 homologs, identified 29,959 distinct obelisks in diverse ecological niches, a subset of which also contained ribozymes. By focusing on simplified microbiomes, the authors found a bacterial host, Streptococcus sanguinis, that contained such an element. The existence of this host-obelisk system verifies the results obtained by bioinformatics and allowed some initial experimental work. In sum, these novel obelisk elements share no homology with viruses or other previously known elements. They form a distinct phylogenetic group and are diverse and globally distributed across human and environmental microbiomes.
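For readers unfamiliar with k-mers, the following Python sketch shows the general idea of k-mer screening: break a query sequence into all overlapping words of length k and ask what fraction of them also occur in a candidate sequence. It is a toy illustration only, not the method used by the authors; the function names and the default k are assumptions chosen for the example, and the code works on any string, whether nucleotide or protein.

```python
def kmers(seq: str, k: int = 8) -> set[str]:
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query: str, target: str, k: int = 8) -> float:
    """Fraction of the query's k-mers that also occur in the target.

    Returns 0.0 when the query is shorter than k and so has no k-mers.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    shared = query_kmers & kmers(target, k)
    return len(shared) / len(query_kmers)
```

A score near 1.0 suggests that the target contains most of the query, while a score near 0.0 suggests the two are unrelated at that word length. Because set lookups are cheap, this kind of screen can discard most of a large dataset before slower alignment is run on what remains.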
The discovery of new genetic elements raises many questions about their biology and their roles in the biosphere. What is perhaps most striking is their diversity and the likely existence of many more such elements. With a combination of creativity and the capacity to carry out data-intensive explorations of the information stored in open sequence repositories, the possibilities for discovery seem endless. Thus far, such efforts show that sequence databases hide valuable information waiting to be deciphered.