by Roberto
If you were to plot the frequency of all known open reading frames (ORFs) as a function of the number of codons they contain, you'd see a broad distribution with an average of about three hundred codons and a long tail of very long ORFs. Just for fun, what are the endpoints of the graph? What is the longest known ORF? And the shortest?
On the matter of long ORFs, for a while the record belonged to the ORF for titin, a protein that acts as a molecular spring, giving muscle its passive elasticity. The folded protein is over 1 µm long, and the ORF that encodes it has 27,000 to 35,000 codons, depending on the splicing isoform. Gargantuan, to say the least. But a recent discovery relegates titin to a distant second. The new record holder for the largest ORF (and consequently the largest polypeptide chain) comes from a microbe! No, not a bacterium or an archaeon, but a microscopic alga, the haptophyte Prymnesium parvum (Fig. 1). These algae cause massive fish kills through their production of giant marine polyether toxins known as prymnesins. Genomic sequence analyses reveal two giant polyketide synthases (PKSs) responsible for the synthesis of prymnesin-1. The larger of the two is a polypeptide chain of 45,212 amino acid residues! No surprise, then, that the discoverers named these gigantic PKSs PKZILLAs.
How about short ORFs? Perhaps not surprisingly, this question takes us into the realm of RiPPs (ribosomally synthesized and post-translationally modified peptides). Their ORFs generally include codons for a leader peptide that is cleaved off, so they are not particularly short. But there is one clear exception: microcin C7. This seven-amino-acid peptide is derived from an eight-codon ORF (counting the termination codon), with no leader peptide and no proteolytic processing, just N- and C-terminal modifications. For decades (since 1994), it held the title of "the smallest known gene." As difficult as it is to imagine, this record of smallness was recently broken. Investigators discovered a biaryl-linked tripeptide (derived from Tyr-Tyr-His) produced by a strain of the actinobacterial genus Planomonospora (Fig. 2A). They named the tripeptide a "biarylitide" and, suspecting it to be a RiPP, searched the genome of the producing bacterium for regions that could encode such tripeptides next to candidate modifying enzymes. Much to their surprise, they found a six-codon ORF (including the termination codon) predicted to encode the pentapeptide Met-Arg-Tyr-Tyr-His (Fig. 2B). How's that for a short peptide? Two amino acids act as the leader, and the remaining three are modified to yield the RiPP. Using a bioinformatic approach, the authors showed that this type of very short RiPP is common and widely distributed among the actinobacteria. One caveat: as far as I can tell, no biological function has yet been assigned to the biarylitides.
Thus, the new record for short ORFs stands at six codons. Unless, of course, you count as an ORF the two-codon unit found in the regulatory region of ampC, the E. coli gene for the chromosomal β-lactamase. Back in 1981, Normark and colleagues discovered that ampC achieves growth-rate regulation through transcription termination/antitermination. The key to disrupting the structure of a transcription terminator was the presence of a bound ribosome. And where was that ribosome bound? At AUG-UAA, a start codon immediately followed by a stop codon. If you consider "AUGUAA" an ORF, well, you won't find anything shorter!
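For readers who like to see the codon bookkeeping spelled out, here is a minimal Python sketch. It is only an illustration, not a tool used by any of the investigators mentioned above. It scans an RNA (or DNA) string for ORFs that begin at AUG and end at the first in-frame stop codon of the standard genetic code, and it reports each ORF's length in codons, counting the stop codon as this post does. The function name find_orfs is hypothetical, and the second example sequence uses codons chosen arbitrarily to spell Met-Arg-Tyr-Tyr-His; it is not the actual Planomonospora sequence.

```python
# Standard genetic code, built from the usual UCAG ordering of the
# first, second and third codon positions ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def find_orfs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, codon_count, peptide) for every AUG-initiated ORF.

    Hypothetical helper for illustration. Assumes the sequence contains
    only A, C, G and U (or T). codon_count includes the stop codon; ORFs
    that run off the end without a stop codon are not reported.
    """
    rna = seq.upper().replace("T", "U")
    orfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        peptide = []
        # Read in frame from the AUG until the first stop codon.
        for pos in range(start, len(rna) - 2, 3):
            amino_acid = CODON_TABLE[rna[pos:pos + 3]]
            if amino_acid == "*":
                orfs.append((start, len(peptide) + 1, "".join(peptide)))
                break
            peptide.append(amino_acid)
    return orfs


# The ampC "ORF": a start codon immediately followed by a stop codon.
print(find_orfs("AUGUAA"))  # [(0, 2, 'M')]

# Hypothetical codons for Met-Arg-Tyr-Tyr-His followed by a stop codon.
print(find_orfs("AUGAGAUAUUACCAUUAA"))  # [(0, 6, 'MRYYH')]
```

Because the stop codon is counted, a peptide of n residues comes from an ORF of n + 1 codons: seven residues and eight codons for microcin C7, five residues and six codons for the biarylitide precursor, and a single methionine from the two codons of AUG-UAA.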
Do you want to comment on this post? We'd be happy to hear from you! Please comment on Mastodon, Bluesky, or X (formerly Twitter).