(2022) Molecular Systems Biology. 18, 5, 10726. Abstract
Twenty years ago, molecular biology transitioned from predominantly studying genes as isolated elements to viewing them as parts of complex modules, giving rise to the field of systems biology. This transition was made possible by technological advances that allowed the expression levels of thousands of genes to be measured simultaneously in a single experiment, and it drove a shift toward analyses that identify gene sets, modules, and pathways involved in a biological process of interest. Today we face an exciting, similar turning point in cell biology, where single-cell technologies have enabled us to approach cells as cellular modules.
(2022) Nature Immunology. 23, 5, p. 814- Abstract
Correction to: Nature Immunology https://doi.org/10.1038/s41590-021-01121-x, published online 20 January 2022.
In the version of the article originally published, the link provided in the Code availability section was invalid and has now been replaced in the HTML and PDF versions of the article with the following: https://github.com/angelolab/publications/tree/master/2022-McCaffrey_eta....
(2022) Nature Immunology. 23, 2, p. 318-329 Abstract
Tuberculosis (TB) in humans is characterized by formation of immune-rich granulomas in infected tissues, the architecture and composition of which are thought to affect disease outcome. However, our understanding of the spatial relationships that control human granulomas is limited. Here, we used multiplexed ion beam imaging by time of flight (MIBI-TOF) to image 37 proteins in tissues from patients with active TB. We constructed a comprehensive atlas that maps 19 cell subsets across 8 spatial microenvironments. This atlas shows an IFN-γ-depleted microenvironment enriched for TGF-β, regulatory T cells and IDO1+ PD-L1+ myeloid cells. In a further transcriptomic meta-analysis of peripheral blood from patients with TB, immunoregulatory trends mirror those identified by granuloma imaging. Notably, PD-L1 expression is associated with progression to active TB and treatment response. These data indicate that in TB granulomas, there are local spatially coordinated immunoregulatory programs with systemic manifestations that define active TB.
Transition to invasive breast cancer is associated with progressive changes in the structure and composition of tumor stroma (2022) Cell. 185, 2, p. 299-310 Abstract
Ductal carcinoma in situ (DCIS) is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). To understand the changes in the tumor microenvironment (TME) accompanying transition to IBC, we used multiplexed ion beam imaging by time of flight (MIBI-TOF) and a 37-plex antibody staining panel to interrogate 79 clinically annotated surgical resections using machine learning tools for cell segmentation, pixel-based clustering, and object morphometrics. Comparison of normal breast with patient matched DCIS and IBC revealed coordinated transitions between four TME states that were delineated based on the location and function of myoepithelium, fibroblasts, and immune cells. Surprisingly, myoepithelial disruption was more advanced in DCIS patients that did not develop IBC, suggesting this process could be protective against recurrence. Taken together, this HTAN Breast PreCancer Atlas study offers insight into drivers of IBC relapse and emphasizes the importance of the TME in regulating these processes.
Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning (2021) Nature Biotechnology. 40, 4, p. 555-562 Abstract
A principal challenge in the analysis of tissue imaging data is cell segmentation—the task of identifying the precise boundary of every cell in an image. To address this problem we constructed TissueNet, a dataset for training segmentation models that contains more than 1 million manually labeled cells, an order of magnitude more than all previously published segmentation training datasets. We used TissueNet to train Mesmer, a deep-learning-enabled segmentation algorithm. We demonstrated that Mesmer is more accurate than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We then adapted Mesmer to harness cell lineage information in highly multiplexed datasets and used this enhanced version to quantify cell morphology changes during human gestation. All code, data and models are released as a community resource.
(2021) Single-Cell Protein Analysis. p. 147-156 Abstract
Multiplexed Ion Beam Imaging by Time of Flight (MIBI-TOF) enables high-dimensional imaging in situ of clinical specimens at single-cell resolution. In MIBI-TOF, tissue sections are stained with dozens of metal-labeled antibodies, whose abundance and location are read out by secondary ion mass spectrometry. The result is a multi-dimensional image depicting the sub-cellular expression and localization of dozens of distinct proteins in situ. Here, we describe the staining and imaging procedures of a MIBI-TOF experiment.
Multiplexed Imaging Analysis of the Tumor-Immune Microenvironment Reveals Predictors of Outcome in Triple-Negative Breast Cancer (2021) Communications Biology. 4, 852. Abstract
Triple-negative breast cancer (TNBC), the breast cancer subtype with the poorest prognosis, lacks clinically approved biomarkers for patient risk stratification, treatment management, and immunotherapies. Prior literature has shown that interrogation of the tumor-immune microenvironment (TIME) may be a promising approach for discovering novel biomarkers that can fill these gaps. Recent developments in high-dimensional tissue imaging technology, such as multiplexed ion beam imaging (MIBI), provide spatial context to protein expression in the TIME, opening doors for in-depth characterization of cellular processes. We developed a computational pipeline for the robust examination of the TIME using MIBI. We found that profiling the functional proteins involved in cell-to-cell interactions in the TIME predicts recurrence and overall survival in TNBC. The interactions of CD45RO with beta-catenin and of CD45RO with HLA-DR were the most relevant for patient stratification. We demonstrated the clinical relevance of the immunoregulatory proteins PD-1, PD-L1, IDO, and Lag3 by tying their interactions to recurrence and survival. Multivariate analysis revealed that our methods provide additional prognostic information compared with clinical variables. Our novel computational pipeline produces interpretable results and is generalizable to other cancer types.
(2021) Frontiers in Immunology. 12, 652631. Abstract
Multiplex imaging technologies are now routinely capable of measuring more than 40 antibody-labeled parameters in single cells. However, lateral spillage of signals in densely packed tissues presents an obstacle to assigning high-dimensional spatial features to individual cells for accurate cell-type annotation. We devised a method, termed REinforcement Dynamic Spillover EliminAtion (REDSEA), to correct for lateral spillage of cell-surface markers between adjacent cells. REDSEA decreased contaminating signals from neighboring cells and improved the recovery of marker signals in both isotopic (i.e., Multiplexed Ion Beam Imaging) and immunofluorescent (i.e., Cyclic Immunofluorescence) multiplexed images, resulting in a marked improvement in cell-type classification.
(2021) PLoS Computational Biology. 17, 4, e1008887. Abstract
Mass Based Imaging (MBI) technologies such as Multiplexed Ion Beam Imaging by time of flight (MIBI-TOF) and Imaging Mass Cytometry (IMC) allow for the simultaneous measurement of the expression levels of 40 or more proteins in biological tissue, providing insight into cellular phenotypes and organization in situ. Imaging artifacts resulting from the sample, assay, or instrumentation complicate downstream analyses and require correction by domain experts. Here, we present MBI Analysis User Interface (MAUI), a series of graphical user interfaces that facilitate this data pre-processing, including the removal of channel crosstalk, noise, and antibody aggregates. Our software streamlines these steps and accelerates processing by enabling real-time, interactive parameter tuning across multiple images.
Multiplexed imaging of human tuberculosis granulomas uncovers immunoregulatory features conserved across tissue and blood (2020) BioRxiv. Abstract
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis that is distinctly characterized by granuloma formation within infected tissues. Granulomas are dynamic and organized immune cell aggregates that limit dissemination, but can also hinder bacterial clearance. Consequently, outcome in TB is influenced by how granuloma structure and composition shift the balance between these two functions. To date, our understanding of what factors drive granuloma function in humans is limited. With this in mind, we used Multiplexed Ion Beam Imaging by Time-of-Flight (MIBI-TOF) to profile 37 proteins in tissues from thirteen patients with active TB disease from the U.S. and South Africa. With this dataset, we constructed a comprehensive tissue atlas where the lineage, functional state, and spatial distribution of 19 unique cell subsets were mapped onto eight phenotypically-distinct granuloma microenvironments. This work revealed an immunosuppressed microenvironment specific to TB granulomas with spatially coordinated co-expression of IDO1 and PD-L1 by myeloid cells and proliferating regulatory T cells. Interestingly, this microenvironment lacked markers consistent with T-cell activation, supporting a myeloid-mediated mechanism of immune suppression. We observed similar trends in gene expression of immunoregulatory proteins in a confirmatory transcriptomic analysis of peripheral blood collected from over 1500 individuals with latent or active TB infection and healthy controls across 29 cohorts spanning 14 countries. Notably, PD-L1 gene expression was found to correlate with TB progression and treatment response, supporting its potential use as a blood-based biomarker. Taken together, this study serves as a framework for leveraging independent cohorts and complementary methodologies to understand how local and systemic immune responses are linked in human health and disease.
(2020) Journal of Computational Biology. 8, p. 1204-1218 Abstract
Recent in situ multiplexed profiling techniques provide insight into microenvironment formation, maintenance, and transformation through the lens of localized cellular phenotype distribution. In this article, we introduce a method for recovering signatures of microenvironments from such data. We use topic models to identify characteristic cell types overrepresented in neighborhoods that serve as proxies for microenvironments. Furthermore, by assuming spatial coherence among neighboring microenvironments, our model limits the number of parameters that need to be learned and permits data-driven decisions about the size of cellular neighborhoods. We apply this method to uncover anatomically known structures in the mouse spleen, identifying distinct populations of splenic B cells that are defined by their characteristic neighborhoods. Next, we apply the method to a dataset of triple-negative breast cancer tumors from 41 patients to study the structure of the tumor-immune boundary. We recover a previously reported tumor-immune microenvironment near the tumor-immune boundary that is enriched for immune cells with high indoleamine-pyrrole 2,3-dioxygenase (IDO) and programmed death ligand 1 (PD-L1), as well as a novel immunosuppressed microenvironment enriched for cells expressing CD45 and FoxP3.
(2020) Nature Reviews Cancer. 1, 2, p. 156-157 Abstract
Tumor heterogeneity remains an obstacle to effective clinical management of breast cancer. Two new studies use high-dimensional imaging of single-cell protein expression in situ in clinical samples to link genomic alterations to multi-cellular features of the tumor microenvironment and reveal breast-cancer phenotypes associated with clinical outcome.
(2019) Science Advances. 5, 10, 5851. Abstract
Understanding tissue structure and function requires tools that quantify the expression of multiple proteins while preserving spatial information. Here, we describe MIBI-TOF (multiplexed ion beam imaging by time of flight), an instrument that uses bright ion sources and orthogonal time-of-flight mass spectrometry to image metal-tagged antibodies at subcellular resolution in clinical tissue sections. We demonstrate quantitative, full periodic table coverage across a five-log dynamic range, imaging 36 labeled antibodies simultaneously with histochemical stains and endogenous elements. We image fields of view up to 800 μm × 800 μm at resolutions down to 260 nm with sensitivities approaching single-molecule detection. We leverage these properties to interrogate intrapatient heterogeneity in tumor organization in triple-negative breast cancer, revealing regional variability in tumor cell phenotypes in contrast to a structured immune response. Given its versatility and sample back-compatibility, MIBI-TOF is positioned to leverage existing annotated, archival tissue cohorts to explore emerging questions in cancer, immunology, and neurobiology.
(2019) Nature Communications. 10, 68. Abstract
Steady-state protein abundance is set by four rates: transcription, translation, mRNA decay, and protein decay. A given protein abundance can be obtained from infinitely many combinations of these rates. This raises the question of whether the natural rates for each gene result from historical accidents, or whether there are rules that give certain combinations a selective advantage. We address this question using high-throughput measurements in rapidly growing cells from diverse organisms and find that about half of the rate combinations do not exist: genes that combine high transcription with low translation are strongly depleted. This depletion is due to a trade-off between precision and economy: high transcription decreases stochastic fluctuations but increases transcription costs. Our theory quantitatively explains which rate combinations are missing and predicts the curvature of the fitness function for each gene. It may guide the design of gene circuits with desired expression levels and noise.
A Structured Tumor-Immune Microenvironment in Triple Negative Breast Cancer Revealed by Multiplexed Ion Beam Imaging (2018) Cell. 174, 6, p. 1373-1387 Abstract
The immune system is critical in modulating cancer progression, but knowledge of immune composition, phenotype, and interactions with tumor is limited. We used multiplexed ion beam imaging by time-of-flight (MIBI-TOF) to simultaneously quantify in situ expression of 36 proteins covering identity, function, and immune regulation at sub-cellular resolution in 41 triple-negative breast cancer patients. Multi-step processing, including deep-learning-based segmentation, revealed variability in the composition of tumor-immune populations across individuals, reconciled by overall immune infiltration and enriched co-occurrence of immune subpopulations and checkpoint expression. Spatial enrichment analysis showed immune mixed and compartmentalized tumors, coinciding with expression of PD1, PD-L1, and IDO in a cell-type- and location-specific manner. Ordered immune structures along the tumor-immune border were associated with compartmentalization and linked to survival. These data demonstrate organization in the tumor-immune microenvironment that is structured in cellular composition, spatial arrangement, and regulatory-protein expression and provide a framework to apply multiplexed imaging to immune oncology.
(2016) Cell. 166, 5, p. 1282-1294.e18 Abstract
Data on gene expression levels across individuals, cell types, and disease states are expanding, yet our understanding of how expression levels impact phenotype is limited. Here, we present a massively parallel system for assaying the effect of gene expression levels on fitness in Saccharomyces cerevisiae by systematically altering the expression level of 100 genes at 100 distinct levels spanning a 500-fold range at high resolution. We show that the relationship between expression levels and growth is gene- and environment-specific and provides information on the function, stoichiometry, and interactions of genes. Wild-type expression levels in some conditions are not optimal for growth, and genes whose fitness is greatly affected by small changes in expression level tend to exhibit lower cell-to-cell variability in expression. Our study addresses a fundamental gap in understanding the functional significance of gene expression regulation and offers a framework for evaluating the phenotypic effects of expression variation.
A Minimalistic Resource Allocation Model to Explain Ubiquitous Increase in Protein Expression with Growth Rate (2016) PLoS ONE. 11, 4, e0153344. Abstract
Most proteins show changes in level across growth conditions. Many of these changes seem to be coordinated with the specific growth rate rather than with the growth environment or the protein function. Although cellular growth rates, gene expression levels, and gene regulation have been at the center of biological research for decades, only a few models give a baseline prediction of how the proteome fraction occupied by a gene depends on the specific growth rate. We present a simple model that predicts a widely coordinated increase in the proteome fraction of many proteins, proportional to the growth rate. The model reveals how passive redistribution of resources, due to active regulation of only a few proteins, can have proteome-wide effects that are quantitatively predictable. Our model provides a potential explanation for why and how such a coordinated response of a large fraction of the proteome to the specific growth rate arises under different environmental conditions. The simplicity of our model can also be useful by serving as a baseline null hypothesis in the search for active regulation. We exemplify the usage of the model by analyzing the relationship between growth rate and proteome composition for the model microorganism E. coli, as reflected in recent proteomics data sets spanning various growth conditions. We find that the proteome fraction of a large number of proteins, from different cellular processes, increases proportionally with the growth rate. Notably, ribosomal proteins, which have previously been reported to increase in fraction with growth rate, are only a small part of this group of proteins. We suggest that, although the fractions of many proteins change with the growth rate, such changes may be partially driven by a global effect that does not necessarily require specific cellular control mechanisms.
(2015) Genome Research. 25, p. 1893-1902 Abstract
Genetically identical cells exposed to the same environment display variability in gene expression (noise), with important consequences for the fidelity of cellular regulation and biological function. Although population-average gene expression is tightly coupled to growth rate, the effects of changes in environmental conditions on expression variability are not known. Here, we measure the single-cell expression distributions of approximately 900 Saccharomyces cerevisiae promoters across four environmental conditions using flow cytometry, and find that gene expression noise is tightly coupled to the environment and is generally higher at lower growth rates. Nutrient-poor conditions, which support lower growth rates, display elevated levels of noise for most promoters, regardless of their specific expression values. We present a simple model of expression noise that results from an asynchronous population, with cells at different cell-cycle stages and with a different partitioning of cells among the stages at different growth rates. This model predicts non-monotonic global changes in noise at different growth rates, as well as overall higher variability in expression for cell-cycle-regulated genes in all conditions. The consistency of this model with our data, as well as with noise measurements of cells growing in a chemostat at well-defined growth rates, suggests that cell-cycle heterogeneity is a major contributor to gene expression noise. Finally, we identify gene and promoter features that play a role in gene expression noise across conditions. Our results show the existence of growth-related global changes in gene expression noise and suggest their potential phenotypic implications.
(2014) Genome Research. 24, 10, p. 1698-1706 Abstract
Genetically identical cells exhibit large variability (noise) in gene expression, with important consequences for cellular function. Although the amount of noise decreases with, and is thus partly determined by, the mean expression level, the extent to which different promoter sequences can deviate from this trend is not fully known. Here, we present a high-throughput method for measuring promoter-driven noise for thousands of designed synthetic promoters in parallel. We use it to investigate how promoters encode different noise levels and find that the noise levels of promoters with similar mean expression levels can vary by more than an order of magnitude, with nucleosome-disfavoring sequences resulting in lower noise and more transcription factor binding sites resulting in higher noise. We propose a kinetic model of gene expression that takes into account the nonspecific DNA binding and one-dimensional sliding along the DNA that occur when transcription factors search for their target sites. We show that this assumption can improve the prediction of the mean-independent component of expression noise for our designed promoter sequences, suggesting that a transcription factor target search may affect gene expression noise. Consistent with our findings in designed promoters, we find that binding-site multiplicity in native promoters is associated with higher expression noise. Overall, our results demonstrate that small changes in promoter DNA sequence can tune noise levels in a manner that is predictable and partly decoupled from effects on the mean expression levels. These insights may assist in designing promoters with desired noise levels.
(2013) Molecular Systems Biology. 9, Abstract
Most genes change expression levels across conditions, but it is unclear which of these changes represent specific regulation and what determines their quantitative degree. Here, we accurately measured the activities of ~900 S. cerevisiae and ~1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60-90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. Quantifying such global effects allows precise characterization of specific regulation: promoters deviating from the global scale line. These are organized into a few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only a few scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects rather than specific regulation, and it provides a means for quantitative interpretation of expression profiles.
Sequence features of yeast and human core promoters that are predictive of maximal promoter activity (2013) Nucleic Acids Research. 41, 11, p. 5569-5581 Abstract
The core promoter is the region in which RNA polymerase II is recruited to the DNA and acts to initiate transcription, but the extent to which the core promoter sequence determines promoter activity levels is largely unknown. Here, we identified several base content and k-mer sequence features of the yeast core promoter sequence that are highly predictive of maximal promoter activity. These features are mainly located in the region 75 bp upstream and 50 bp downstream of the main transcription start site, and their associations hold for both constitutively active promoters and promoters that are induced or repressed in specific conditions. Our results unravel several architectural features of yeast core promoters and suggest that the yeast core promoter sequence downstream of the TATA box (or of similar sequences involved in recruitment of the pre-initiation complex) is a major determinant of maximal promoter activity. We further show that human core promoters also contain features that are indicative of maximal promoter activity; thus, our results emphasize the important role of the core promoter sequence in transcriptional regulation.
Measurements of the Impact of 3' End Sequences on Gene Expression Reveal Wide Range and Sequence Dependent Effects (2013) PLoS Computational Biology. 9, 3, Abstract
A full understanding of gene regulation requires an understanding of the contributions that the various regulatory regions make to gene expression. Although it is well established that sequences downstream of the main promoter can affect expression, our understanding of the scale of this effect and how it is encoded in the DNA is limited. Here, to measure the effect of native S. cerevisiae 3' end sequences on expression, we constructed a library of 85 fluorescent reporter strains that differ only in their 3' end region. Notably, despite being driven by the same strong promoter, our library spans a continuous twelve-fold range of expression values. These measurements correlate with endogenous mRNA levels, suggesting that the 3' end contributes to constitutive differences in mRNA levels. We used deep sequencing to map the 3'UTR ends of our strains and show that the determination of polyadenylation sites is intrinsic to the local 3' end sequence. Sequence analysis following polyadenylation mapping showed that increased A/T content upstream of the main polyadenylation site correlates with higher expression, both in the library and genome-wide, suggesting that native genes differ in the encoded efficiency of 3' end processing. Finally, we use single-cell fluorescence measurements at different promoter activation levels to show that 3' end sequences modulate protein expression dynamics differently than promoters do, predominantly affecting the size of protein production bursts rather than the frequency at which these bursts occur. Altogether, our results lead to a more complete understanding of gene regulation by demonstrating that 3' end regions have a unique and sequence-dependent effect on gene expression.
(2013) Genome Biology. 14, 11, Abstract
A new study exploits the time-dependence of formaldehyde cross-linking in the commonly used chromatin immunoprecipitation (ChIP) assay to infer the on and off rates for site-specific chromatin interactions.
Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast (2012) Nature Genetics. 44, 7, p. 743-U163 Abstract
Understanding how precise control of gene expression is specified within regulatory DNA sequences is a key challenge with far-reaching implications. Many studies have focused on the regulatory role of transcription factor-binding sites. Here, we explore the transcriptional effects of a different class of elements, nucleosome-disfavoring sequences, and specifically the poly(dA:dT) tracts that are highly prevalent in eukaryotic promoters. By measuring promoter activity for a large-scale promoter library designed with systematic manipulations to the properties and spatial arrangement of poly(dA:dT) tracts, we show that these tracts significantly and causally affect transcription. We show that manipulating these elements offers a general genetic mechanism, applicable to promoters regulated by different transcription factors, for tuning expression in a predictable manner, with a resolution that can be even finer than that attained by altering transcription factor sites. Overall, our results advance the understanding of the regulatory code and suggest a potential mechanism by which promoters yielding prespecified expression patterns can be designed.
Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters (2012) Nature Biotechnology. 30, 6, p. 521-+ Abstract
Despite extensive research, our understanding of the rules by which cis-regulatory sequences are converted into gene expression is limited. We devised a method for obtaining parallel, highly accurate gene expression measurements from thousands of designed promoters and applied it to measure the effect of systematic changes in the location, number, orientation, affinity, and organization of transcription-factor binding sites and nucleosome-disfavoring sequences. Our analyses reveal a clear relationship between expression and binding-site multiplicity, as well as transcription-factor-specific dependencies of expression on the distance between transcription-factor binding sites and gene starts, including a striking ~10-bp periodic relationship between gene expression and binding-site location. We show how this approach can measure transcription-factor sequence specificities and the sensitivity of transcription-factor sites to the surrounding sequence context, and we compare the activity of 75 yeast transcription factors. Our method can be used to study both cis and trans effects of genotype on transcriptional, post-transcriptional, and translational control.
Compensation for differences in gene copy number among yeast ribosomal proteins is encoded within their promoters (2011) Genome Research. 21, 12, p. 2114-2128 Abstract
Coordinate regulation of ribosomal protein (RP) genes is key for controlling cell growth. In yeast, it is unclear how this regulation achieves the required equimolar amounts of the different RP components, given that some RP genes exist in duplicate copies while others have only one copy. Here, we tested whether the solution to this challenge is partly encoded within the DNA sequence of the RP promoters by fusing 110 different RP promoters to a fluorescent gene reporter, allowing us to robustly detect differences in their promoter activities as small as ~10%. We found that single-copy RP promoters have significantly higher activities, suggesting that proper RP stoichiometry is indeed partly encoded within the RP promoters. Notably, we also partially uncovered how this regulation is encoded by finding that RP promoters with higher activity have more nucleosome-disfavoring sequences and characteristic spatial organizations of these sequences and of binding sites for key RP regulators. Mutations in these elements result in a significant decrease of RP promoter activity. Thus, our results suggest that intrinsic (DNA-dependent) nucleosome organization may be a key mechanism by which genomes encode biologically meaningful promoter activities. Our approach can readily be applied to uncover how transcriptional programs of other promoters are encoded.
(2010) Trends in Genetics. 26, 8, p. 335-340 Abstract
The recently discovered prokaryotic immune system known as CRISPR (clustered regularly interspaced short palindromic repeats) is based on small RNAs ('spacers') that restrict phage and plasmid infection. It has been hypothesized that CRISPRs can also regulate self gene expression by utilizing spacers that target self genes. By analyzing CRISPRs from 330 organisms we found that one in every 250 spacers is self-targeting, and that such self-targeting occurs in 18% of all CRISPR-bearing organisms. However, complete lack of conservation across species, combined with abundance of degraded repeats near self-targeting spacers, suggests that self-targeting is a form of autoimmunity rather than a regulatory mechanism. We propose that accidental incorporation of self nucleic acids by CRISPR can incur an autoimmune fitness cost, and this could explain the abundance of degraded CRISPR systems across prokaryotes.