Editors' Note: Never before in this blog's existence have we posted a point-by-point analysis of a research report. We are happy to begin a new tradition with Alan Derman's scholarly review of a major piece of work, the first bacterial "localisome" identifying the intracellular location of a large number of the proteins of Caulobacter. Of necessity, this is a longer article than most, but we suggest that you read it in its entirety, as it will acquaint you not only with the results of this study, but also with the issues involved in obtaining and deciphering the data.
by Alan Derman
Imagine trying to acquaint yourself with your favorite bacterium by learning the intracellular address of each and every one of its proteins. You can't see them in a light microscope, and you can't realistically do immunofluorescence; for that you'd need to purify each one and raise antibodies. So you have to modify them so that they can be seen. You have to tag them with a fluorescent tag such as green fluorescent protein (GFP), which means in the case of E. coli or B. subtilis, more than 8000 oligonucleotide primers, more than 4000 PCR amplifications, and as many clonings and transformations. And then when you have your more than 4000 or so strains, each one producing a distinct fluorescently tagged protein, you'll need to look at them all, one by one, under the fluorescence microscope, and record what you see. It's expensive, laborious, and time-consuming, and it's no wonder that it's been done for only three microorganisms. Two of these were yeast, the budding yeast S. cerevisiae, and the fission yeast S. pombe, which are more amenable to this kind of analysis than bacteria. Yeast cells are some 15 to 20 times larger in cross-sectional area than a conventional rod-shaped bacterium, and they conveniently contain discrete subcellular organelles to which proteins are localized. The third study did have E. coli for its subject. The proteins were tagged, but no systematic survey of their cellular locations was undertaken. The first bacterial "localisome" had yet to be constructed.
Caulobacter is a fitting first subject for an encyclopedic survey of protein locations within a bacterial cell. Its cells are of either of two distinct morphologies, both of which are asymmetric. The sessile stalked cell, attached to a surface via an adhesive structure at the tip of its polar stalk, divides to produce another stalked cell and a motile swarmer cell that has a polar flagellum and pili, but no stalk. The swarmer cell, incapable of DNA replication or cell division, differentiates into a stalked cell, shedding its flagellum and pili, and growing a stalk. The former swarmer cell, now a sessile stalked cell attached to a surface, divides to produce a stalked cell and a swarmer cell, thus completing the cell cycle. It might be expected then that at least two distinct subsets of proteins would be found at the Caulobacter cell poles, but prior to this undertaking, no proteins associated with the stalk pole had been identified.
With Caulobacter chosen as the subject, the challenge was to generate its localisome accurately and without inordinate investments of time and effort. To do this, the Gitai group took advantage of several innovations. The most important of these, one that they developed, enabled them to integrate the data from fluorescence microscopy images of hundreds of cells for each protein. But first they needed to generate their set of fluorescent fusion proteins, and for this, they went with a one-size-fits-all approach. They chose the red fluorescent protein mCherry over the conventional green fluorescent protein (GFP) for its ability to fold in both aqueous compartments and membranes. Better to incorporate the tag at the N-terminus or at the C-terminus? Because tagging at one end or the other could compromise protein function, best to make two versions for each protein, one tagged at each end. This strategy turned out to be not merely prudent but necessary; about one-third of the proteins that they found to be localized would have been missed if they had relied on only N- or only C-terminal fusion proteins. Although there was no way to get around the requirement for 3763 PCR amplification reactions, they were able to carry out this ambitious program without making use of the classical cutting and pasting tools of molecular biology. Without using either restriction endonucleases or ligases, they instead built upon commercially available technology and managed to engineer their library entirely in vivo using phage lambda recombination machinery. In their library, 2786 of the 3763 Caulobacter crescentus ORFs (74.0%) were represented by both N- and C-terminal fluorescently-tagged fusion proteins. The authors note that this constitutes library version 1.0. An improved, more complete, version is presumably in the works.
How best to express these tagged proteins? Because it would be impossible to recapitulate the native expression profile for each protein, each gene fusion was integrated as a single copy into the Caulobacter chromosome within the xylose utilization operon; hence its expression could be regulated by xylose. This is again a one-size-fits-all solution, but one with some flexibility: expression levels could, at least in theory, be tuned for each protein. A single chromosomal copy of the gene fusion, although artificially regulated, is still a better simulation of the native situation than a plasmid-borne one. And because the native copy of the gene on the chromosome is unaltered, each strain still possesses one gene encoding a fully functional protein even if the tagging alters function of the fusion product.
Efficient image collection demanded automation that could not be provided by available commercial automated epifluorescence microscopes, which, because they do not use oil-immersion objectives, fall short of providing the required resolution. The Gitai group therefore resorted to some in-house engineering to fit their monumental imaging task to their conventional oil-immersion epifluorescence microscope equipped with a robotic stage. Instead of imaging one or maybe two strains on each slide, they figured out how to image 48 at a time. They set up slides with 48 agar pad mesas with surrounding canyons to prevent cross contamination, thus cutting the number of slides they needed to view from a formidable 5600 to a manageable 120. And they are looking at live, growing cells. They supplemented direct observation with an automated data analysis software suite that they had recently developed in conjunction with Jonathan Dworkin's group at Columbia University. This software cleanly and precisely delineates the bacterial cell boundaries in a microscopic field and imposes a uniform coordinate system that enables subcellular positions to be compared and correlated among cells. The coordinate system in each cell is elaborated from the positions of the cell poles and midline, whose locations, in turn, are established by iterative refinement that maximizes the length of the midline.
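To make the idea of a uniform coordinate system concrete, here is a minimal sketch of how a position inside a cell might be expressed as a fraction of cell length along the midline. This is not the Gitai group's software; the function names, the representation of the midline as a list of (x, y) points, and the projection method are all assumptions made for illustration.

```python
import math


def cumulative_lengths(midline):
    """Return the cumulative arc length at each vertex of a polyline."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(midline, midline[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths


def fractional_position(midline, point):
    """Project `point` onto the midline polyline and return its position
    as a fraction of total midline length (0.0 = first pole, 1.0 = other pole).

    Hypothetical helper: the midline is assumed to run pole to pole.
    """
    cum = cumulative_lengths(midline)
    total = cum[-1]
    if total == 0:
        raise ValueError("midline has zero length")
    px, py = point
    best_dist, best_arc = float("inf"), 0.0
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(midline, midline[1:])):
        dx, dy = x1 - x0, y1 - y0
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            continue  # skip degenerate segments
        # Parameter of the closest point on this segment, clamped to [0, 1].
        t = ((px - x0) * dx + (py - y0) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        if dist < best_dist:
            best_dist = dist
            best_arc = cum[i] + t * math.sqrt(seg_len_sq)
    return best_arc / total


# A bent, crescent-like midline of total length 4 (arbitrary units).
example_midline = [(0, 0), (1, 0), (2, 0), (2, 2)]
near_first_pole_side = fractional_position(example_midline, (1.5, 0.3))
near_second_pole_side = fractional_position(example_midline, (2.5, 1.0))
```

Because every cell is reduced to the same 0-to-1 axis, a focus at 35% of cell length in one cell can be compared directly with a focus at 35% in another, regardless of how long or how curved each cell is.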
And so where are those 2786 proteins? Most are either distributed uniformly in the cytoplasm or at the cell periphery. These were not considered further in the study. Only 289 proteins, or a little over 10%, were localized in discrete patterns (unipolar, bipolar, as a central focus, in a line, etc.), or in combinations of these patterns. The Gitai group used several metrics to assess the reliability of their data. Of the 29 Caulobacter proteins that had been localized in previous studies, their own results agreed for 23. Where there was disagreement, the tendency was for this study to find no discrete localization pattern at all, suggesting that while they may have overlooked some localized proteins, it is unlikely that they attributed discrete localizations to proteins that are not actually localized. And when the localization of each N-terminally tagged protein was compared with that of its C-terminally tagged counterpart, the news was also good. Of the 63 unique proteins for which both the N- and C-terminal fusion proteins showed discrete localization, all but five pairs showed the same pattern. So the data do look reliable, and it's not at all a stretch to claim that the Gitai group has increased the number of localized proteins in Caulobacter by a factor of ten. Images of all the cells that contain localized proteins may be viewed at the Gitai laboratory website.
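For readers who like to check the arithmetic, the proportions quoted so far can be recomputed from the counts given in this review. The variable names are mine, and the slide estimate assumes one strain per agar pad with every pad filled.

```python
import math

# Counts quoted in this review.
total_orfs = 3763            # Caulobacter crescentus ORFs
tagged_both_ends = 2786      # ORFs with both N- and C-terminal fusions
localized = 289              # proteins with discrete localization patterns
previously_localized = 29    # proteins localized in earlier studies
agree_with_previous = 23     # of those, results that agreed
localized_both_ends = 63     # proteins localized as both fusions
discordant_pairs = 5         # pairs showing different patterns
strains_per_slide = 48       # agar pad mesas per slide

total_fusions = 2 * tagged_both_ends
slides_needed = math.ceil(total_fusions / strains_per_slide)
concordant_pairs = localized_both_ends - discordant_pairs

summary = {
    "library_coverage_pct": round(100 * tagged_both_ends / total_orfs, 1),
    "total_fusions": total_fusions,
    "slides_needed": slides_needed,
    "localized_pct": round(100 * localized / tagged_both_ends, 1),
    "agreement_pct": round(100 * agree_with_previous / previously_localized, 1),
    "concordant_pairs": concordant_pairs,
    "concordant_pct": round(100 * concordant_pairs / localized_both_ends, 1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The 5572 fusions are the source of the "formidable 5600" slides mentioned above, and at 48 strains per slide they fit on 117 slides, consistent with the "manageable 120."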
A proteome-scale analysis such as this affords the opportunity to draw conclusions about the behavior, or in this case, the localization profile, of entire classes of proteins. The Gitai group used the classifications established by the Gene Ontology Database and sorted the proteins of Caulobacter into 21 functional categories. They then looked at their localization data in this context. Some trends were unsurprising. Proteins that function in metabolism and small molecule transport tended not to be localized, whereas those that function in cell motility and cell division did. But they were a little surprised to discover that signal transduction, secretion, and cell wall and membrane biogenesis proteins were also localized. This kind of observation underscores the importance of this project. The observation is actually not new; individual proteins in these categories had been shown to be localized. But there is always the possibility that any particular protein is exceptional or that the finding was simply incorrect. But once many or most of the proteins in a category are shown to be localized, as in this study, what had been a curiosity for a single protein becomes an integral feature of the entire category and a key to their function.
The study concludes with a statistical overview of the entire localization data set generated by the automated data analysis software suite. This is a useful digest of the data, although few surprises emerge. In the tens of thousands of individual cells exhibiting localization (between 50 and 200 cells were surveyed per strain), there is a notable enrichment at the pole. Enrichment for localized proteins also turned up in the 30-40% zone, corresponding to the stalk/swarmer cell division zone. There's also a cold spot, at 5-25% of cell length, where relatively few proteins were localized. This could mean that this region is for some reason avoided or just that the cells have no very good reason for localizing proteins there: it's neither a pole nor a division zone. The analysis also showed that the longer the cell, the more likely it is to contain localized proteins. It follows that protein localization becomes important later in the cell cycle, and this presumably reflects the prominence of cell division proteins among those that are localized. For each position, reproducibility is best at the poles and at the 30-40% position. Localization is also tightest at the poles, localization patterns becoming more diffuse with increasing distance from the poles.
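A positional overview of this kind amounts to a histogram of where foci fall along the cell length. The sketch below shows one way such a tally could be made; the bin count, the function name, and the sample positions are illustrative assumptions, not values or code from the study.

```python
def position_histogram(fractions, n_bins=20):
    """Count foci per equal-width bin along the cell axis.

    `fractions` are positions expressed as a fraction of cell length,
    from 0.0 (one pole) to 1.0 (the other pole). With the default of
    20 bins, each bin spans 5% of cell length.
    """
    counts = [0] * n_bins
    for f in fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"position {f} is outside the cell")
        # A focus exactly at 1.0 belongs in the last bin.
        index = min(int(f * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up positions, for illustration only: several polar foci,
# a couple near the division zone, and one elsewhere.
sample_positions = [0.02, 0.01, 0.33, 0.37, 0.98, 1.0, 0.62]
sample_counts = position_histogram(sample_positions)
```

In a tally like this, a hot spot shows up as a bin with many counts (the poles, the 30-40% zone) and a cold spot as a run of nearly empty bins (the 5-25% zone).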
So Caulobacter is most assiduous at placing proteins at the poles and at future division zones. This is precisely what one would expect for a bacterium that divides asymmetrically and constructs distinct structures at its poles. That there are few surprises is a pretty good indicator that this study was designed well and carried out carefully, an impressive feat given its scope and the ever-present temptation to sacrifice accuracy for speed when one is dealing with data acquisition of this magnitude. This, the first study of its kind in any bacterium, was not conducted by an institute or a consortium, but by a single academic laboratory — an impressive achievement.
A fundamental finding of this study is that 90% of the proteins of Caulobacter are, in fact, not localized. This raises the question of what it means for a protein to be localized. Given their interest in the Caulobacter cell cycle and developmental program, the Gitai group was principally concerned with proteins of the cytoplasm. They therefore excluded from their study those proteins that were uniformly distributed at the cell periphery. These are the proteins of the inner and outer membranes, of the cell wall, and of the periplasm. All of these proteins are, in the conventional understanding of the term, "localized" to these subcellular compartments. And this is not an inconsiderable number of proteins. In E. coli for example, 20% of cell proteins are found in the periplasm alone. One should therefore not conclude from this study that only a small minority of proteins contains information that specifies their location in the cell. Strictly speaking, the Gitai study is, for the most part, one of intracompartmental protein localization, with the compartment of interest being the cytoplasm.
The study revealed that Caulobacter cells contain different localization patterns with different localization stringencies. Localization could be very tight and reproducible from cell to cell or much more diffuse and not all that reproducible. At some point this dissolves into no localization at all. Where is that point? Gitai's group had to define it. They had to devise some operational definition of what it means to be a cell with localized proteins. Their approach was to calculate the mean fluorescence for each cell and to determine whether the maximum fluorescence in the cell was in excess of some threshold. How was this threshold set? By manual inspection; it was a judgment call. How good was the call? I noted above that comparison of their findings to those from previous studies indicated that they had probably not attributed discrete localizations to proteins that actually are not localized. So it was a careful call.
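An operational definition of this kind can be sketched in a few lines. The version below compares a cell's brightest pixel with its mean fluorescence; expressing the threshold as a ratio, and the cutoff of 3.0, are my assumptions for illustration. The study's actual threshold was set by manual inspection and is not reproduced here.

```python
from statistics import mean


def is_localized(pixel_intensities, threshold_ratio=3.0):
    """Call a cell 'localized' if its peak fluorescence stands out
    sufficiently from its mean fluorescence.

    `threshold_ratio` is a hypothetical cutoff chosen for this example.
    """
    cell_mean = mean(pixel_intensities)
    if cell_mean <= 0:
        return False  # no usable signal in this cell
    return max(pixel_intensities) / cell_mean > threshold_ratio


# A uniformly fluorescent cell versus one with a single bright focus.
diffuse_cell = [10] * 100
focus_cell = [10] * 99 + [200]

diffuse_call = is_localized(diffuse_cell)
focus_call = is_localized(focus_cell)
```

Raising the cutoff makes the call more conservative: fewer false claims of localization, but more genuinely localized proteins with faint foci slipping through as "diffuse."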
Was it too careful? Was the threshold set too high? It's hard to say, and this goes back to one of those one-size-fits-all parameters of their experimental design: the level at which the proteins are produced in the cell. Each fusion protein was placed under the control of the xylose promoter and "induced with xylose for a period that is long enough for robust expression but brief enough to minimize toxicity effects." Since the optimum induction conditions could not have been determined for each of the 5572 gene fusions (all were in fact induced for two hours), it is not likely that many were actually produced at their true physiological levels. Production at or fairly close to the physiological level is probably a requirement for the proper localization of at least some proteins. One can imagine, for example, that underproduction of a protein could have prevented its being recognized as localized because it is outcompeted by the native protein for anchoring sites or the like. For some proteins a similar problem could arise even if the true physiological level were achieved. If, for example, Caulobacter normally produces only 50 copies of a localized protein, it is doubtful that its localization would have been detected given the threshold set. This or related scenarios might explain why some components of the flagella are localized as they should be, but at least as many show no localization at all. There also exist some very persnickety proteins whose behavior and localization are exquisitely sensitive to their expression levels; dynamic cytoskeletal proteins come to mind. It is therefore reassuring to find that FtsZ, the bacterial tubulin relative and the principal component of the medial cytokinetic ring, is properly localized at "midband," and that the cytoskeletal cell-shape determining protein MreB is found in the same "patchy/spotty" distribution that has been reported previously.
In short, then, not everything is where it should be — this would be impossible as at least a handful of those proteins "localized" to the poles just have to be inclusion bodies — but many things are, even some that one might not expect.
The Gitai study is a great first effort at determining the address of every protein in Caulobacter crescentus. There will be more complete versions of the mCherry fusion library constructed and examined in the future. The entire survey could be repeated with synchronized cells or with different induction conditions. Again, this is a prodigious amount of work, but the strains are all there and the technology for making the mesa and canyon agar pad slides is in place. There's also plenty to do with the 289 proteins that were found to be localized, beginning with the 58 that were localized identically as both N- and C-terminal mCherry fusions. Some of these proteins are likely to be dynamic, and this would have escaped detection in the initial effort. Time-lapse experiments are in order. And finally there are the sought-after favorites. There is, for example, protein CC1953, which as both an N- and C-terminal fusion localizes to the stalk, a first in Caulobacter.
We can still claim only an acquaintance with the proteins of Caulobacter, but thanks to the Gitai study, it's just become a little more intimate. We are beginning to know where they live. And still much more remains to be done.
Alan Derman is a Project Scientist in Joe Pogliano’s lab at the University of California at San Diego.