Methods and Statistics

The GFP collection was constructed in Prof. Erin O’Shea’s lab at UCSF, as described here:
Huh WK., Falvo JV., Gerke LC., Carroll AS., Howson RW., Weissman JS. & O’Shea EK. (2003) Global analysis of protein localization in budding yeast. Nature, 425(6959):686-91. Pdf.

A detailed description of the entire methodology and applied statistics used to create the LoQate database can be found here:

Breker M., Gymrek M. & Schuldiner M. (2013) A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. Journal of Cell Biology, DOI: 10.1083/jcb.201301120. Pdf.

Supplementary data. Pdf.

Materials and methods

I Strain management

A. Insertion of cytosolic mCherry to the GFP library
A Synthetic Genetic Array (SGA) cross was performed between a MATα haploid strain harboring TEF2pr-mCherry::URA3 integrated into the URA3 locus (the plasmid used to create the strain was a kind gift from David Breslow) (Breslow et al., 2008) and the GFP collection (::HIS3) (Huh et al., 2003). Mating was performed on rich media plates, and diploid cells were selected on plates lacking both HIS and URA. Sporulation was then induced by transferring cells to nitrogen starvation plates for 5 days. Haploid cells containing all desired mutations were selected by transferring cells to plates containing all selection markers, supplemented with the toxic amino acid derivatives Canavanine and Thialysine (Sigma-Aldrich) to select against remaining diploids, and lacking Leucine to select only for spores of the “a” mating type (Cohen and Schuldiner, 2011; Tong et al., 2001). The SGA procedure was validated by inspecting representative strains for the presence of the GFP tag and for cytosolic mCherry expression. To manipulate the collection in high-density format (384), we used a RoToR bench top colony arrayer (Singer Instruments, UK). Yeast strains used in this study are listed in Table SX.

B. Strain growth
The 5330 strains generated as described above were grown in 50 μl of synthetic (SD) medium (0.67% yeast nitrogen base without amino acids (Conda Pronadisa) and 2% dextrose) containing the appropriate supplements for selection, in 384-well plates (Cat. Number 781162, Greiner).

C. Application of stress
Hydrogen peroxide (H2O2) treatment - cells were grown to early-log phase, at which point H2O2 (Cat. Number 2186-01, J.T.Baker) was added to a final concentration of 1 mM. Following a 60 min incubation, plates were taken for microscopy imaging.
Nitrogen starvation - cells were grown to early-log phase and centrifuged mildly (1000g), and fresh synthetic (S) medium (0.67% yeast nitrogen base without amino acids and without ammonium sulphate (Conda Pronadisa) and 2% dextrose) was added. Following a 15 hr incubation, plates were taken for microscopy imaging.
Dithiothreitol (DTT) treatment - cells grown overnight were back-diluted into SD medium containing 2 mM DTT. Following a 3 hr incubation, plates were taken for microscopy imaging.

D. Plasmids and deletions
Deletion strains were prepared by replacement of the ORFs with a pCgMET15 cassette using homologous recombination with 40bp of homology (Kitada et al., 1995). pRS416 plasmid expressing MTS-dsRED under the ADH1 promoter was kindly provided by Jodi Nunnari (Meeusen and Nunnari, 2003). pRS426 plasmid expressing NLS-tdTomato under the GPD1 promoter was kindly provided by Daniel Kaganovich (Kaganovich et al., 2008). Plasmids used in this study are in Table SXI and primers used in this study are in Table SXII.

II Automated imaging

A. High-throughput fluorescence microscopy
Microscopic screening was performed using an automated microscopy set-up as previously described (Cohen and Schuldiner, 2011). Cells were moved from agar plates into liquid 384-well polystyrene growth plates using the RoToR arrayer. Liquid cultures were grown overnight in SD medium in a shaking incubator (LiCONiC Instruments) at 30°C. A JANUS liquid handler (Perkin Elmer), which is connected to the incubator, was used to back-dilute the strains to approximately 0.25 O.D. into plates containing the same medium. Plates were then transferred back to the incubator and allowed to grow for 3.5 hours at 30°C to reach logarithmic growth phase, as validated in preliminary calibration. The liquid handler was then used to transfer strains into glass-bottom 384-well microscope plates (Matrical Bioscience) coated with Concanavalin A (Sigma-Aldrich) to allow cell adhesion. Wells were washed twice in medium to remove floating cells and obtain a cell monolayer. Plates were then transferred into an automated inverted fluorescence microscope ScanR system (Olympus) using a swap robotic arm (Hamilton). Plates were imaged in 384-well format using a 60X air lens (NA = 0.9) in SD medium at 24°C with a cooled CCD camera (Hamamatsu ORCA-ER). Images were acquired in the GFP (excitation at 490/20 nm, emission at 535/50 nm) and mCherry (excitation at 572/35 nm, emission at 632/60 nm) channels.

B. Image analysis
Our screening assay was designed to explore yeast cell biology by assessing two key cellular features of interest: sub-cellular localization and fluorescence intensity. To analyze the images, we used an in-house script to browse them manually and assign localizations rapidly and efficiently. To extract protein abundance from the images, we used the Olympus ScanR analysis software. This software allows pre-processing of images by background subtraction, and segmentation of images to identify individual cells as separate objects. Specifically, we performed the following steps:
1. Segmentation on the basis of the edge module of the cytosolic mCherry protein expression.
2. Background correction using the rolling ball algorithm.
3. Definition of measured populations. Since several measurements are collected for each cell (e.g. fluorescence intensity, area, shape), we created a multi-parameter gate to ensure that our population was homogeneous and that the data arise from clearly defined cells only. The mean GFP intensity for each object (cell) of each strain was exported to Excel files, allowing data processing (see below) at single-cell resolution within a given population.

III Data processing

A. Median measurement
The median GFP intensity for each strain under each condition was calculated from the objects remaining after dead cells were removed. Dead cells are highly fluorescent and must be removed from the analysis because they artificially raise the mean GFP intensity values. Because the fluorescence and shape of dead cells fall within the range of those features for normal cells, the software could not gate them out. To automate the removal of dead cells detected as objects by the ScanR software, for each strain we removed any objects that were very high outliers in their mean GFP measurement: any object with a mean GFP measurement above UQ + 3 × IQR was removed from the analysis, where UQ = upper quartile and IQR = interquartile range. Overall, more than 94% of cells originally screened were classified as alive. On average, 5% of cells per strain were dead under stress. Since we performed two independent measurements under reference conditions (SD), we combined the scores from both measurements to obtain one median and one standard deviation value for each strain under reference conditions (summarized in Table SI).
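The outlier rule above can be illustrated with a short Python sketch. It is not the code used to build the database; the function names are hypothetical, and the quartile convention (here the "inclusive" method of the standard library) is an assumption, since the text does not state which one was used.

```python
import statistics

def remove_dead_cells(intensities):
    """Drop objects whose mean GFP intensity exceeds UQ + 3 * IQR."""
    # Quartile convention is an assumption; the source does not specify it.
    q1, _, q3 = statistics.quantiles(intensities, n=4, method="inclusive")
    cutoff = q3 + 3 * (q3 - q1)
    return [x for x in intensities if x <= cutoff]

def strain_median(intensities):
    """Median GFP intensity of a strain after removing high outliers."""
    return statistics.median(remove_dead_cells(intensities))
```

Only the upper tail is trimmed, which matches the description of dead cells as very bright objects; dim objects are left untouched.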

B. Removing Strains From the Analysis
The following strains were removed from the analysis:

1. Strains in which fewer than 25 objects were detected.
2. Strains whose proper subcellular localization has been demonstrated to be dependent on the C-terminus of the protein. A full list of the systematically mis-localized proteins that were removed from analysis is given in Table SXIII.
3. Strains whose tagged ORFs are located near the CAN1, LYP1, or URA3 loci. Such strains could not pass the SGA procedure required to introduce the TEF2pr-mCherry background. A full list is given in Table SXIII.
4. Contaminated strains. Any strains showing a localization different from that annotated in SGD or shown in the original published dataset (Huh et al., 2003) were assumed to be the result of contamination and were removed.
5. Bud-neck proteins, because the analysis program could not accurately detect their signal, which sometimes fell outside the detected cell boundaries.

C. Detection of auto fluorescence
Yeast cells not tagged with GFP emit fluorescence at some baseline intensity. We found this value to differ significantly across conditions. This may result from different cellular conditions under various stresses or from fluctuations in the light source. To account for this, we screened 86 randomly placed wells containing a wild type strain with no GFP tag in each measured condition. After removing dead cells, normality of the distribution of intensity values for wild type objects was not rejected by the Shapiro-Wilk test (at the 0.01 significance level). We therefore approximated the autofluorescence as a normal distribution with mean A_condition and standard deviation σ_condition. Any strain whose median GFP intensity under a given condition exceeded A_condition + 2.58 σ_condition is > 99% likely to have biological GFP expression. Strains below this cutoff are marked as “below threshold”.
If a strain falls below this threshold but was assigned a localization other than “cytosol” or “ambiguous”, it is kept in the analysis because we could be certain that we visualized that protein.
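A minimal Python sketch of this thresholding rule is given below. The function names are hypothetical, and using the sample standard deviation for σ_condition is an assumption; the logic otherwise follows the description above.

```python
import statistics

def autofluorescence_cutoff(wild_type_intensities, z=2.58):
    """Cutoff A_condition + z * sigma_condition from untagged wild type objects."""
    a_condition = statistics.mean(wild_type_intensities)
    # Sample standard deviation is an assumption; the source does not specify.
    sigma_condition = statistics.stdev(wild_type_intensities)
    return a_condition + z * sigma_condition

def keep_strain(median_gfp, cutoff, localization):
    """Keep strains above the cutoff, or below it with a clear localization."""
    if median_gfp > cutoff:
        return True
    # Below threshold: keep only if a distinct localization was assigned.
    return localization not in ("cytosol", "ambiguous")
```

A separate cutoff would be computed for each condition, since the baseline fluorescence was found to differ between conditions.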

D. Experimental controls
• Testing the reproducibility – to test the stability of our microscopy platform, we plotted two independent measurements of all strains (5330) in synthetic medium (Pearson’s correlation on log-scaled data: r² = 0.97, slope = 0.98, p < 0.01).
• Measurement of accuracy – to test the accuracy of our measurements, we plotted the medians calculated by our method against two measurements of the same strains: by flow cytometry (r² = 0.79) (Newman et al., 2006) and by Chong et al. (personal communication) (r² = 0.77).
• Further testing – to show the relevance of these measurements to other proteomic data sets, we plotted the medians calculated by our method against three measurements of yeast proteomes: native, untagged proteins measured by mass spectrometry (r² = 0.44) (Walther et al., 2010), western blotting on TAP-tagged strains (r² = 0.49) (Ghaemmaghami et al., 2003) and ribosome footprint values (r² = 0.31) (Ingolia et al., 2009).

IV Determining abundance change events

For each condition, we determined which strains are significantly up- or down- regulated compared to their reference abundance levels.

A. Preprocessing
In addition to the corrections described above in Data processing, we removed any strains that changed localization under the condition of interest compared to reference medium. Since the GFP signal is sensitive to cellular conditions (such as ionic strength or pH), levels cannot be compared between two proteins showing different localizations.

Some proteins are expressed only under certain stress conditions and are not detected under reference conditions. So as not to miss these proteins, any strains that fell below the autofluorescence threshold in the reference condition, and therefore were not assigned a localization, but were detected under the stress condition, were left in the analysis. Similarly, strains that were detected under reference conditions but fell below the autofluorescence threshold under treatment were also included.

B. Normalization of fluorescence signal
In order to make fluorescence values comparable across reference and stress conditions, we normalized median fluorescence values for each strain using the normalize.quantiles function in the R preprocessCore library (Bolstad et al., 2003). This method adjusts fluorescence values such that the median values for each strain follow the same distribution under each condition. The fluorescence values of objects for each strain under each condition are scaled to have the corrected median value.
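The idea behind quantile normalization can be illustrated with a simplified Python sketch. This is not the preprocessCore implementation: it ignores ties and missing values, the function names are hypothetical, and the multiplicative rescaling of single-cell values is an assumption about how the objects were "scaled".

```python
import statistics

def quantile_normalize(columns):
    """columns: dict mapping condition -> list of strain medians (same strain order)."""
    names = list(columns)
    n = len(columns[names[0]])
    sorted_cols = [sorted(columns[c]) for c in names]
    # Target distribution: mean of the k-th smallest value across conditions.
    target = [sum(col[k] for col in sorted_cols) / len(names) for k in range(n)]
    normalized = {}
    for c in names:
        order = sorted(range(n), key=lambda i: columns[c][i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = target[rank]  # each strain takes the target value of its rank
        normalized[c] = out
    return normalized

def scale_objects(objects, corrected_median):
    """Rescale single-cell values so their median equals the corrected median."""
    factor = corrected_median / statistics.median(objects)
    return [x * factor for x in objects]
```

After normalization every condition has the same set of median values, so a strain's rank within a condition, not its raw intensity, determines its normalized value.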

C. Definition of significant abundance change
All strains under synthetic medium in two independent experiments were sorted by their log10 abundance values and binned such that each bin contained 5% of strains. For each bin, we plotted the distance from the diagonal for each strain in the scatterplot of replicate 1 vs. replicate 2. The distances within each bin were consistent with a normal distribution (Shapiro-Wilk test, 0.01 significance level). For each stress condition, we plotted the log10 abundance of each strain vs. the average of the log10 abundance across the two reference measurements. We then determined the distance of each strain from the diagonal. Based on the distribution of distances between the two synthetic medium experiments in the corresponding abundance bin for each strain, we determined an empirical p-value. Strains with p < 0.01 were marked as showing a significant abundance change; these events lie above (red) and below (blue) the dashed lines in Figure S3.
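The binning and p-value steps can be sketched in Python as follows. This is one possible reading rather than the authors' code: sorting strains by the mean of the two replicates, the two-sided comparison of absolute distances, and the +1 pseudocount are all assumptions, and the function names are hypothetical.

```python
import math
from bisect import bisect_right

def diagonal_distance(x, y):
    """Signed perpendicular distance of the point (x, y) from the line y = x."""
    return (y - x) / math.sqrt(2)

def build_bins(rep1, rep2, n_bins=20):
    """Bin reference strains by log10 abundance; 20 bins hold 5% of strains each."""
    pairs = sorted(zip(rep1, rep2), key=lambda p: (p[0] + p[1]) / 2)
    size = math.ceil(len(pairs) / n_bins)
    edges, distances = [], []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        edges.append((chunk[0][0] + chunk[0][1]) / 2)  # lower edge of this bin
        distances.append([diagonal_distance(a, b) for a, b in chunk])
    return edges, distances

def empirical_p(ref_mean, stress, edges, distances):
    """Fraction of replicate distances in the strain's bin at least as extreme."""
    idx = max(bisect_right(edges, ref_mean) - 1, 0)
    d = abs(diagonal_distance(ref_mean, stress))
    null = distances[idx]
    hits = sum(1 for v in null if abs(v) >= d)
    return (hits + 1) / (len(null) + 1)  # pseudocount avoids p = 0 (assumption)
```

Because the within-bin distances were found to be approximately normal, a p-value from a fitted normal distribution would be an alternative to the counting approach shown here.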

D. Fold Change Values
For each stress condition, we calculated the ratio of median GFP intensity measured under stress to the median GFP intensity under reference conditions. Ratios are given in Table SI.

E. Testing Multimodality
In the case that not all cells are uniformly affected by a stress condition, the resulting distribution of fluorescence intensities could be multimodal. The significance test described above is not informative in this situation, and gives no indication that there might be multiple levels of response throughout a population of cells. To test each strain under each condition for multimodality, we used Hartigan's Dip Test (Hartigan and Hartigan, 1985), which tests how well the distribution of fluorescence for the objects fits the tightest fitting uni-modal distribution. Strains in which uni-modality was rejected with p<0.05 were marked as multimodal. This test was implemented using the dip.test command in the R diptest library. (The list of strains predicted to have multimodal distributions is given in Table SVI). Manual inspection confirmed that all strains rejecting uni-modality were indeed bimodal.

V Comparison to published data sets

A. Comparison to essentiality data
We used published essentiality datasets (Hillenmeyer et al., 2008) to determine which ORFs are essential under each condition tested. We downloaded the homozygous fitness defect scores from http://chemogenomics.stanford.edu/supplements/global/download.html and created a list of ORFs for each condition shown to be essential with p < 0.05. We used the conditions 5 mM H2O2 and synthetic medium for H2O2 and starvation, respectively.

B. Comparison to mRNA data
For each ORF under each condition tested, we determined whether it was up-regulated, down-regulated, or showed no change at both the protein and mRNA level. Proteins were determined to be up- or down-regulated based on the criteria described above in Definition of significant abundance change. To determine whether an ORF is up- or down-regulated at the mRNA level, we used published expression datasets. For DTT, we compared to the DTT time points at 15, 30, 60, and 120 minutes (Travers et al., 2000). For starvation and H2O2, we compared to the nitrogen depletion and 0.32 mM H2O2 time points from 30 minutes to 1 day and from 10 minutes to 160 minutes, respectively (Gasch et al., 2000). An ORF was defined as up-regulated if any time point showed a 2-fold increase in expression level compared to reference, and as down-regulated if any time point showed a 2-fold decrease compared to reference. ORFs that had time points showing both a 2-fold increase and a 2-fold decrease were marked as ambiguous and were removed from the analysis.

We then used these up/down/no change classifications to find ORFs marked as no change in mRNA levels but that did significantly change at the protein level according to our analysis.
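The classification logic can be sketched in Python as follows. The sketch assumes that the expression values are log2 ratios relative to reference and that "2-fold" means at least 2-fold; the function names and class labels are hypothetical.

```python
import math

def classify_mrna(log2_ratios, fold=2.0):
    """Classify an ORF from its time course of log2(expression / reference)."""
    threshold = math.log2(fold)
    up = any(r >= threshold for r in log2_ratios)
    down = any(r <= -threshold for r in log2_ratios)
    if up and down:
        return "ambiguous"  # removed from the analysis
    if up:
        return "up"
    if down:
        return "down"
    return "no change"

def protein_only_changes(mrna_class, protein_class):
    """ORFs with no mRNA change but a significant change at the protein level."""
    return sorted(
        orf for orf, cls in mrna_class.items()
        if cls == "no change" and protein_class.get(orf) in ("up", "down")
    )
```

For example, an ORF whose log2 ratios never reach ±1 at any time point is labeled "no change" and is reported only if its protein abundance changed significantly.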

VI Protein-protein interactions analysis

A. Dihydrofolate reductase based protein-fragment complementation assay (DHFR PCA).
The yeast DHFR PCA assay was performed exactly as published (Tarassov et al., 2008). Briefly, MATa strains with the ORFs of Acc1 and Pin3 fused C-terminally to F[1,2] were mated to the entire MATα collection of ORFs tagged with F[3]. The resulting diploids were subsequently selected for growth in the presence of methotrexate for positive DHFR PCA reconstitution, -/+ the addition of 3 mM DTT, for 5 days at 30°C.

B. Data acquisition, colony quantification, and statistical analysis.
Complete acquisition and analysis of each plate proceeded as follows. First, images of the diploid methotrexate selection -/+ DTT were taken after 120 hours of growth. Plate images were saved in JPG format at a resolution of 300 dpi. Using the freely available Balony software (http://code.google.com/p/balony/downloads/list), the first step of the image analysis was to determine the expected centers of the colonies, arrayed in 48 columns and 32 rows. To adjust for variation in plating and possible rotation of an image during acquisition, we manually defined the coordinate center of the first colony in the first row of the array and of the last colony in the last row. The results of positional array adjustment and detection of colony centers were also validated manually for all images. We then extracted the area of each colony and set the threshold for a positive interaction at an area above 150. This threshold was chosen as follows: the areas of negative-interaction controls were approximately normally distributed with mean = 41 and std = 11.1, while positive-interaction controls were distributed with mean = 400 and std = 208. Each colony was therefore assigned a z-score according to the distance of its area from the negative-control mean, and a p-value was calculated following multiple hypothesis correction.
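The scoring of colony areas can be illustrated with a short Python sketch using the control statistics quoted above. The function names are hypothetical, the upper-tail normal p-value is an assumption, and the multiple hypothesis correction is not shown because the method used is not stated here.

```python
import math

NEG_MEAN = 41.0        # mean colony area of negative controls (from the text)
NEG_STD = 11.1         # standard deviation of negative controls (from the text)
AREA_THRESHOLD = 150   # colonies with an area above this count as positive

def colony_z(area):
    """Z-score of a colony area relative to the negative-control distribution."""
    return (area - NEG_MEAN) / NEG_STD

def upper_tail_p(z):
    """One-sided p-value under a standard normal (uncorrected; an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def is_positive(area):
    """Apply the area threshold for a positive interaction."""
    return area > AREA_THRESHOLD
```

With these numbers the threshold of 150 lies roughly 9.8 standard deviations above the negative-control mean, so negative controls are very unlikely to cross it by chance.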
