As you may know, STC has been on Twitter since August 2018. Mostly, we use this platform to announce our Monday and Thursday posts. Occasionally, we comment on tweets that we find interesting, either directly or by re-tweeting them. Twitter being Twitter, not everything we re-tweet is to be taken completely seriously (here is an example).
Yes, we read our timeline quite regularly, and the other day a tweet fluttered into it that I want you to take a look at now. I've reproduced it in Figure 1.
Visible in this tweet is a string of the characters A, C, G, and T without any punctuation (261 of them, well within Twitter's 280-character limit). Not unexpectedly, 'Google Translate' failed completely to make any sense of this gibberish (you can try!). But most readers rightly assumed that it could only be a DNA sequence and suggested "translating" it from bases to amino acids with one of the available online tools, for example the one from expasy.org. Figure 2 shows what you get.
Frame 1 is revealing. If you know that the 1-letter code for the 20 standard amino acids has no "O," or that "X" stands for 'any amino acid', you can read it, and it says repeatedly: "NoTHING-HERE-EKCEPT-APRIL-FooLS," with stop codons as word separators. (By the way, K for X is poor design: by using NNN instead of AAG in the 14th codon, they could have emulated the X.) And then you see the date of the tweet, 'Apr 1, 2022', and... case solved (Figure 1). Except that it isn't.
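If you'd rather not rely on an online tool, the frame-by-frame translation is easy to sketch yourself. The snippet below is a minimal illustration, not what expasy.org actually runs: it assumes the standard genetic code, reads only the forward strand, and reports any codon it doesn't recognize (such as NNN) as "X". The names `CODON_TABLE` and `translate` are my own, and the short test sequences are made up for the demonstration; they are not taken from the tweet.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# (third base varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna, frame=0):
    """Translate the forward strand of `dna`, starting at offset `frame`.

    `frame` is 0, 1 or 2 here, which corresponds to what the online
    tools call Frame 1, 2 and 3. Codons containing anything other than
    A, C, G, T (for example NNN) are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Hypothetical codon choices that spell a word, followed by a stop codon.
print(translate("GCTCCTCGTATTCTTTAA"))   # APRIL*

# AAG encodes lysine (K); NNN is ambiguous and comes out as X.
print(translate("GAAAAGTGTGAACCTACT"))   # EKCEPT
print(translate("GAANNNTGTGAACCTACT"))   # EXCEPT

# The same stretch of DNA reads completely differently in the next frame.
print(translate("GCTCCTCGTATTCTTTAA", frame=1))
```

Shifting `frame` by one base is all it takes to turn a readable message into an unrelated string of residues, which is why the online tools show all frames side by side.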
Frame 2 gives a protein of 79 residues, not too small (calculated molecular weight 8.4 kDa). What to make of it? Easy question for a molecular biologist, and many among the respondents to the tweet right away suggested this too: BLAST that thing! Meaning a homology search through the entire NCBI protein database by the BLAST algorithm (it's been routine among molecular biologists and bioinformaticians for years, decades rather, so that we have long since made it a verb: "to blast a sequence..."). It took BLAST a few seconds, literally, to tell me that the 79-residue protein matches an "extensin-like protein" of Procambarus clarkii. The match is not too bad, with 35% identical amino acids across the entire length despite numerous gaps (Figure 3). This "extensin-like protein" has a length of 366 amino acids, so the partial homology is clearly a funny coincidence. Extensins are hydroxyproline-rich glycoproteins (HRGPs) of the plant cell wall (the humble thale cress Arabidopsis thaliana has about 20 extensin genes), and according to Mishler-Elmore et al. (2021) "The extensin (EXT) network is elaborated by the covalent intermolecular crosslinking of EXT glycoprotein monomers, and its proper assembly is important for numerous aspects of basic wall architecture and cellular defense."
However, the BLAST hit "extensin-like protein" from Procambarus clarkii is anything but a plant protein, as this species belongs to the crustaceans and is commonly known as the red swamp crayfish (Figure 4). Maybe Wikipedia missed something in its "extensin" entry? But it is also definitely not a sample contamination mistaken for a crab protein, because another BLAST search, this time with this "extensin-like protein" as query, gives numerous hits among aquatic species from fish to frogs, even dinoflagellates, and among chytrid fungi. Not a single hit for Bacteria and Archaea though, which gives me a good reason to leave the April fools at that, at least until next year.