Publications
2024
-
(2024) Nature. 634, 8034, p. 652-661 Abstract
The developing placenta, which in mice originates through the extraembryonic ectoderm (ExE), is essential for mammalian embryonic development. Yet unbiased characterization of the differentiation dynamics of the ExE and its interactions with the embryo proper remains incomplete. Here we develop a temporal single-cell model of mouse gastrulation that maps continuous and parallel differentiation in embryonic and extraembryonic lineages. This is matched with a three-way perturbation approach to target signalling from the embryo proper, the ExE alone, or both. We show that ExE specification involves early spatial and transcriptional bifurcation of uncommitted ectoplacental cone cells and chorion progenitors. Early BMP4 signalling from chorion progenitors is required for proper differentiation of uncommitted ectoplacental cone cells and later for their specification towards trophoblast giant cells. We also find biphasic regulation by BMP4 in the embryo. The early ExE-originating BMP4 signal is necessary for proper mesoendoderm bifurcation and for allantois and primordial germ cell specification. However, commencing at embryonic day 7.5, embryo-derived BMP4 restricts the primordial germ cell pool size by favouring differentiation of their extraembryonic mesoderm precursors towards an allantois fate. ExE and embryonic tissues are therefore entangled in time, space and signalling axes, highlighting the importance of their integrated understanding and modelling in vivo and in vitro.
-
(2024) Nature Cancer. 5, 5, p. 742-759 Abstract
Successful immunotherapy relies on triggering complex responses involving T cell dynamics in tumors and the periphery. Characterizing these responses remains challenging using static human single-cell atlases or mouse models. To address this, we developed a framework for in vivo tracking of tumor-specific CD8+ T cells over time and at single-cell resolution. Our tools facilitate the modeling of gene program dynamics in the tumor microenvironment (TME) and the tumor-draining lymph node (tdLN). Using this approach, we characterize two modes of anti-programmed cell death protein 1 (PD-1) activity, decoupling induced differentiation of tumor-specific activated precursor cells from conventional type 1 dendritic cell (cDC1)-dependent proliferation and recruitment to the TME. We demonstrate that combining anti-PD-1 therapy with anti-4-1BB agonist enhances the recruitment and proliferation of activated precursors, resulting in tumor control. These data suggest that effective response to anti-PD-1 therapy is dependent on sufficient influx of activated precursor CD8+ cells to the TME and highlight the importance of understanding system-level dynamics in optimizing immunotherapies.
-
(2024) Nature Aging. 4, 1, p. 129-144 Abstract
To understand human longevity, inherent aging processes must be distinguished from known etiologies leading to age-related chronic diseases. Such deconvolution is difficult to achieve because it requires tracking patients throughout their entire lives. Here, we used machine learning to infer health trajectories over the entire adulthood age range using extrapolation from electronic medical records with partial longitudinal coverage. Using this approach, our model tracked the state of patients who were healthy and free from known chronic disease risk and distinguished individuals with higher or lower longevity potential using a multivariate score. We showed that the model and the markers it uses performed consistently on data from Israeli, British and US populations. For example, mildly low neutrophil counts and alkaline phosphatase levels serve as early indicators of healthy aging that are independent of risk for major chronic diseases. We characterize the heritability and genetic associations of our longevity score and demonstrate at least 1 year of extended lifespan for parents of high-scoring patients compared to matched controls. Longitudinal modeling of healthy individuals is thereby established as a tool for understanding healthy aging and longevity.
-
(2024) Cell. 187, 2, p. 375-389.e18 Abstract
Immune checkpoint inhibition treatment using aPD-1 monoclonal antibodies is a promising cancer immunotherapy approach. However, its effect on tumor immunity is narrow, as most patients do not respond to the treatment or suffer from recurrence. We show that the crosstalk between conventional type I dendritic cells (cDC1) and T cells is essential for an effective aPD-1-mediated anti-tumor response. Accordingly, we developed a bispecific DC-T cell engager (BiCE), a reagent that facilitates physical interactions between PD-1+ T cells and cDC1. BiCE treatment promotes the formation of active dendritic/T cell crosstalk in the tumor and tumor-draining lymph nodes. In vivo, single-cell and physical interacting cell analysis demonstrates the distinct and superior immune reprogramming of the tumors and tumor-draining lymph nodes treated with BiCE as compared to conventional aPD-1 treatment. By bridging immune cells, BiCE potentiates cell circuits and communication pathways needed for effective anti-tumor immunity.
2023
-
(2023) Genome Biology. 24, 1, 220. Abstract
We describe MCProj, an algorithm for analyzing query scRNA-seq data by projections over reference single-cell atlases. We represent the reference as a manifold of annotated metacell gene expression distributions. We then interpret query metacells as mixtures of atlas distributions while correcting for technology-specific gene biases. This approach distinguishes and tags query cells that are consistent with atlas states from unobserved (novel or artifactual) behaviors. It also identifies expression differences observed in successfully mapped query states. We showcase MCProj functionality by projecting scRNA-seq data on a blood cell atlas, deriving precise, quantitative, and interpretable results across technologies and datasets.
-
(2023) Cell Stem Cell. 30, 9, p. 1262-1281.e8 Abstract
RNA splicing factors are recurrently mutated in clonal blood disorders, but the impact of dysregulated splicing in hematopoiesis remains unclear. To overcome technical limitations, we integrated genotyping of transcriptomes (GoT) with long-read single-cell transcriptomics and proteogenomics for single-cell profiling of transcriptomes, surface proteins, somatic mutations, and RNA splicing (GoT-Splice). We applied GoT-Splice to hematopoietic progenitors from myelodysplastic syndrome (MDS) patients with mutations in the core splicing factor SF3B1. SF3B1mut cells were enriched in the megakaryocytic-erythroid lineage, with expansion of SF3B1mut erythroid progenitor cells. We uncovered distinct cryptic 3′ splice site usage in different progenitor populations and stage-specific aberrant splicing during erythroid differentiation. Profiling SF3B1-mutated clonal hematopoiesis samples revealed that erythroid bias and cell-type-specific cryptic 3′ splice site usage in SF3B1mut cells precede overt MDS. Collectively, GoT-Splice defines the cell-type-specific impact of somatic mutations on RNA splicing, from early clonal outgrowths to overt neoplasia, directly in human samples.
-
(2023) Cancer Discovery. 13, 7, p. 1616-1635 Abstract
Multiple studies have identified metabolic changes within the tumor and its microenvironment during carcinogenesis. Yet, the mechanisms by which tumors affect the host metabolism are unclear. We find that systemic inflammation induced by cancer leads to liver infiltration of myeloid cells at early extrahepatic carcinogenesis. The infiltrating immune cells via IL6-pSTAT3 immune-hepatocyte cross-talk cause the depletion of a master metabolic regulator, HNF4α, consequently leading to systemic metabolic changes that promote breast and pancreatic cancer proliferation and a worse outcome. Preserving HNF4α levels maintains liver metabolism and restricts carcinogenesis. Standard liver biochemical tests can identify early metabolic changes and predict patients' outcomes and weight loss. Thus, the tumor induces early metabolic changes in its macroenvironment with diagnostic and potentially therapeutic implications for the host. SIGNIFICANCE: Cancer growth requires a permanent nutrient supply starting from early disease stages. We find that the tumor extends its effect to the host's liver to obtain nutrients and rewires the systemic and tissue-specific metabolism early during carcinogenesis. Preserving liver metabolism restricts tumor growth and improves cancer outcomes. This article is highlighted in the In This Issue feature, p. 1501.
-
(2023) Cell. 186, 12, p. 2610-2627.e18 Abstract
The hourglass model describes the convergence of species within the same phylum to a similar body plan during development; however, the molecular mechanisms underlying this phenomenon in mammals remain poorly described. Here, we compare rabbit and mouse time-resolved differentiation trajectories to revisit this model at single-cell resolution. We modeled gastrulation dynamics using hundreds of embryos sampled between gestation days 6.0 and 8.5 and compared the species using a framework for time-resolved single-cell differentiation-flows analysis. We find convergence toward similar cell-state compositions at E7.5, supported by the quantitatively conserved expression of 76 transcription factors, despite divergence in surrounding trophoblast and hypoblast signaling. However, we observed noticeable changes in specification timing of some lineages and divergence of primordial germ cell programs, which in the rabbit do not activate mesoderm genes. Comparative analysis of temporal differentiation models provides a basis for studying the evolution of gastrulation dynamics across mammals.
-
(2023) Nature Communications. 14, 1, 3844. Abstract
Embryonic development involves massive proliferation and differentiation of cell lineages. This must be supported by chromosome replication and epigenetic reprogramming, but how proliferation and cell fate acquisition are balanced in this process is not well understood. Here we use single cell Hi-C to map chromosomal conformations in post-gastrulation mouse embryo cells and study their distributions and correlations with matching embryonic transcriptional atlases. We find that embryonic chromosomes show a remarkably strong cell cycle signature. Despite that, replication timing, chromosome compartment structure, topological associated domains (TADs) and promoter-enhancer contacts are shown to be variable between distinct epigenetic states. About 10% of the nuclei are identified as primitive erythrocytes, showing exceptionally compact and organized compartment structure. The remaining cells are broadly associated with ectoderm and mesoderm identities, showing only mild differentiation of TADs and compartment structures, but more specific localized contacts in hundreds of ectoderm and mesoderm promoter-enhancer pairs. The data suggest that while fully committed embryonic lineages can rapidly acquire specific chromosomal conformations, most embryonic cells are showing plastic signatures driven by complex and intermixed enhancer landscapes.
2022
-
(2022) Genome Biology. 23, 1, 100. Abstract
Scaling scRNA-seq to profile millions of cells is crucial for constructing high-resolution maps of transcriptional manifolds. Current analysis strategies, in particular dimensionality reduction and two-phase clustering, offer only limited scaling and sensitivity to define such manifolds. We introduce Metacell-2, a recursive divide-and-conquer algorithm allowing efficient decomposition of scRNA-seq datasets of any size into small and cohesive groups of cells called metacells. Metacell-2 improves outlier cell detection and rare cell type identification, as shown with human bone marrow cell atlas and mouse embryonic data. Metacell-2 is implemented over the scanpy framework for easy integration in any analysis pipeline.
-
(2022) Nature Structural and Molecular Biology. 29, 12, p. 1252-1265 Abstract
In mammalian embryos, DNA methylation is initialized to maximum levels in the epiblast by the de novo DNA methyltransferases DNMT3A and DNMT3B before gastrulation diversifies it across regulatory regions. Here we show that DNMT3A and DNMT3B are differentially regulated during endoderm and mesoderm bifurcation and study the implications in vivo and in meso-endoderm embryoid bodies. Loss of both Dnmt3a and Dnmt3b impairs exit from the epiblast state. More subtly, independent loss of Dnmt3a or Dnmt3b leads to small biases in mesoderm-endoderm bifurcation and transcriptional deregulation. Epigenetically, DNMT3A and DNMT3B drive distinct methylation kinetics in the epiblast, as can be predicted from their strand-specific sequence preferences. The enzymes compensate for each other in the epiblast, but can later facilitate lineage-specific methylation kinetics as their expression diverges. Single-cell analysis shows that differential activity of DNMT3A and DNMT3B combines with replication-linked methylation turnover to increase epigenetic plasticity in gastrulation. Together, these findings outline a dynamic model for the use of DNMT3A and DNMT3B sequence specificity during gastrulation.
-
(2022) Science Advances. 8, 50, eadd0695. Abstract
The coordinated differentiation of progenitor cells into specialized cell types and their spatial organization into distinct domains is central to embryogenesis. Here, we developed and applied an unbiased spatially resolved single-cell transcriptomics method to identify the genetic programs underlying the emergence of specialized cell types during mouse limb development and their spatial integration. We identify multiple transcription factors whose expression patterns are predominantly associated with cell type specification or spatial position, suggesting two parallel yet highly interconnected regulatory systems. We demonstrate that the embryonic limb undergoes a complex multiscale reorganization upon perturbation of one of its spatial organizing centers, including the loss of specific cell populations, alterations of preexisting cell states' molecular identities, and changes in their relative spatial distribution. Our study shows how multidimensional single-cell, spatially resolved molecular atlases can allow the deconvolution of spatial identity and cell fate and reveal the interconnected genetic networks that regulate organogenesis and its reorganization upon genetic alterations.
Spatially resolved scRNA-seq deconvolutes positional and cell type-specific components of cell identity in the mouse limb bud.
-
(2022) Cell. 185, 17, p. 3169-3185.e20 Abstract
Mice deficient for all ten-eleven translocation (TET) genes exhibit early gastrulation lethality. However, separating cause and effect in such embryonic failure is challenging. To isolate cell-autonomous effects of TET loss, we used temporal single-cell atlases from embryos with partial or complete mutant contributions. Strikingly, when developing within a wild-type embryo, Tet-mutant cells retain near-complete differentiation potential, whereas embryos solely comprising mutant cells are defective in epiblast to ectoderm transition with degenerated mesoderm potential. We map de-repressions of early epiblast factors (e.g., Dppa4 and Gdf3) and failure to activate multiple signaling from nascent mesoderm (Lefty, FGF, and Notch) as likely cell-intrinsic drivers of TET loss phenotypes. We further suggest loss of enhancer demethylation as the underlying mechanism. Collectively, our work demonstrates an unbiased approach for defining intrinsic and extrinsic embryonic gene function based on temporal differentiation atlases and disentangles the intracellular effects of the demethylation machinery from its broader tissue-level ramifications.
-
(2022) Nature Cancer. 3, p. 303-317 Abstract
Despite their key regulatory role and therapeutic potency, the molecular signatures of interactions between T cells and antigen-presenting myeloid cells within the tumor microenvironment remain poorly characterized. Here, we systematically characterize these interactions using RNA sequencing of physically interacting cells (PIC-seq) and find that CD4+PD-1+CXCL13+ T cells are a major interacting hub with antigen-presenting cells in the tumor microenvironment of human non-small cell lung carcinoma. We define this clonally expanded, tumor-specific and conserved T-cell subset as T-helper tumor (Tht) cells. Reconstitution of Tht cells in vitro and in an ovalbumin-specific αβ TCR CD4+ T-cell mouse model shows that the Tht program is primed in tumor-draining lymph nodes by dendritic cells presenting tumor antigens, and that their function is important for harnessing the antitumor response of anti-PD-1 treatment. Our molecular and functional findings support the modulation of Tht-dendritic cell interaction checkpoints as a major interventional strategy in immunotherapy.
2021
-
(2021) Nature Communications. 12, 1, 2455. Abstract
The mutational mechanisms underlying recurrent deletions in clonal hematopoiesis are not entirely clear. In the current study we inspect the genomic regions around recurrent deletions in myeloid malignancies, and identify microhomology-based signatures in CALR, ASXL1 and SRSF2 loci. We demonstrate that these deletions are the result of double-strand break repair by a PARP1-dependent microhomology-mediated end joining (MMEJ) pathway. Importantly, we provide evidence that these recurrent deletions originate in pre-leukemic stem cells. While DNA polymerase theta (POLQ) is considered a key component in MMEJ repair, we provide evidence that pre-leukemic MMEJ (preL-MMEJ) deletions can be generated in POLQ knockout cells. In contrast, aphidicolin (an inhibitor of replicative polymerases and replication) treatment resulted in a significant reduction in preL-MMEJ. Altogether, our data indicate an association between POLQ-independent MMEJ and clonal hematopoiesis and elucidate mutational mechanisms involved in the very first steps of leukemia evolution.
-
(2021) Nature Communications. 12, 1, 5406. Abstract
DNA methylation is aberrant in cancer, but the dynamics, regulatory role and clinical implications of such epigenetic changes are still poorly understood. Here, reduced representation bisulfite sequencing (RRBS) profiles of 1538 breast tumors and 244 normal breast tissues from the METABRIC cohort are reported, facilitating detailed analysis of DNA methylation within a rich context of genomic, transcriptional, and clinical data. Tumor methylation from immune and stromal signatures are deconvoluted leading to the discovery of a tumor replication-linked clock with genome-wide methylation loss in non-CpG island sites. Unexpectedly, methylation in most tumor CpG islands follows two replication-independent processes of gain (MG) or loss (ML) that we term epigenomic instability. Epigenomic instability is correlated with tumor grade and stage, TP53 mutations and poorer prognosis. After controlling for these global trans-acting trends, as well as for X-linked dosage compensation effects, cis-specific methylation and expression correlations are uncovered at hundreds of promoters and over a thousand distal elements. Some of these targeted known tumor suppressors and oncogenes. In conclusion, this study demonstrates that global epigenetic instability can erode cancer methylomes and expose them to localized methylation aberrations in-cis resulting in transcriptional changes seen in tumors.
-
(2021) Nature Chemical Biology. 17, 11, p. 1139-1147 Abstract
The functional activity and differentiation potential of cells are determined by their interactions with surrounding cells. Approaches that allow unbiased characterization of cell states while at the same time providing spatial information are of major value to assess this environmental influence. However, most current techniques are hampered by a tradeoff between spatial resolution and cell profiling depth. Here, we develop a photocage-based technology that allows isolation and in-depth analysis of live cells from regions of interest in complex ex vivo systems, including primary human tissues. The use of a highly sensitive 4-nitrophenyl(benzofuran) cage coupled to a set of nanobodies allows high-resolution photo-uncaging of different cell types in areas of interest. Single-cell RNA-sequencing of spatially defined CD8+ T cells is used to exemplify the feasibility of identifying location-dependent cell states. The technology described here provides a valuable tool for the analysis of spatially defined cells in diverse biological systems, including clinical samples.
-
(2021) Trends in Genetics. 37, 10, p. 919-932 Abstract
A fundamental characteristic of animal multicellularity is the spatial coexistence of functionally specialized cell types that are all encoded by a single genome sequence. Cell type transcriptional programs are deployed and maintained by regulatory mechanisms that control the asymmetric, differential access to genomic information in each cell. This genome regulation ultimately results in specific cellular phenotypes. However, the emergence, diversity, and evolutionary dynamics of animal cell types remain almost completely unexplored beyond a few species. Single-cell genomics is emerging as a powerful tool to build comprehensive catalogs of cell types and their associated gene regulatory programs in non-traditional model species. We review the current state of sampling efforts across the animal tree of life and challenges ahead for the comparative study of cell type programs. We also discuss how the phylogenetic integration of cell atlases can lead to the development of models of cell type evolution and a phylogenetic taxonomy of cells.
-
(2021) Nature Cancer. 2, 10, p. 1055-1070 Abstract
Stochastic transition of cancer cells between drug-sensitive and drug-tolerant persister phenotypes has been proposed to play a key role in non-genetic resistance to therapy. Yet, we show here that cancer cells actually possess a highly stable inherited chance to persist (CTP) during therapy. This CTP is non-stochastic, determined pre-treatment and has a unimodal distribution ranging from 0 to almost 100%. Notably, CTP is drug specific. We found that differential serine/threonine phosphorylation of the insulin receptor substrate 1 (IRS1) protein determines the CTP of lung and of head and neck cancer cells under epidermal growth factor receptor inhibition, both in vitro and in vivo. Indeed, the first-in-class IRS1 inhibitor NT219 was highly synergistic with anti-epidermal growth factor receptor therapy across multiple in vitro and in vivo models. Elucidation of drug-specific mechanisms that determine the degree and stability of cellular CTP may establish a framework for the elimination of cancer persisters, using new rationally designed drug combinations.
-
(2021) Nature Medicine. 27, 9, p. 1582-1591 Abstract
Standardized lab tests are central for patient evaluation, differential diagnosis and treatment. Interpretation of these data is nevertheless lacking quantitative and personalized metrics. Here we report on the modeling of 2.1 billion lab measurements of 92 different lab tests from 2.8 million adults over a span of 18 years. Following unsupervised filtering of 131 chronic conditions and 5,223 drug-test pairs, we performed a virtual survey of lab test distributions in healthy individuals. Age and sex alone explain less than 10% of the within-normal test variance in 89 out of 92 tests. Personalized models based on patients' history explain 60% of the variance for 17 tests and over 36% for half of the tests. This allows for systematic stratification of the risk for future abnormal test levels and subsequent emerging disease. Multivariate modeling of within-normal lab tests can be readily implemented as a basis for quantitative patient evaluation.
-
(2021) Nature Plants. 7, 6, p. 800-813 Abstract
The vegetative-to-floral transition is a dramatic developmental change of the shoot apical meristem, promoted by the systemic florigen signal. However, poor molecular temporal resolution of this dynamic process has precluded characterization of how meristems respond to florigen induction. Here, we develop a technology that allows sensitive transcriptional profiling of individual shoot apical meristems. Computational ordering of hundreds of tomato samples reconstructed the floral transition process at fine temporal resolution and uncovered novel short-lived gene expression programs that are activated before flowering. These programs are annulled only when both florigen and a parallel signalling pathway are eliminated. Functional screening identified genes acting at the onset of pre-flowering programs that are involved in the regulation of meristem morphogenetic changes but dispensable for the timing of floral transition. Induced expression of these short-lived transition-state genes allowed us to determine their genetic hierarchies and to bypass the need for the main flowering pathways. Our findings illuminate how systemic and autonomous pathways are integrated to control a critical developmental switch.
-
(2021) Cell. 184, 11, p. 2825-2842.e22 Abstract
Mouse embryonic development is a canonical model system for studying mammalian cell fate acquisition. Recently, single-cell atlases comprehensively charted embryonic transcriptional landscapes, yet inference of the coordinated dynamics of cells over such atlases remains challenging. Here, we introduce a temporal model for mouse gastrulation, consisting of data from 153 individually sampled embryos spanning 36 h of molecular diversification. Using algorithms and precise timing, we infer differentiation flows and lineage specification dynamics over the embryonic transcriptional manifold. Rapid transcriptional bifurcations characterize the commitment of early specialized node and blood cells. However, for most lineages, we observe combinatorial multi-furcation dynamics rather than hierarchical transcriptional transitions. In the mesoderm, dozens of transcription factors combinatorially regulate multifurcations, as we exemplify using time-matched chimeric embryos of Foxc1/Foxc2 mutants. Our study rejects the notion of differentiation being governed by a series of binary choices, providing an alternative quantitative model for cell fate acquisition.
-
Single-cell transcriptomic analyses provide insights into the developmental origins of neuroblastoma
(2021) Nature Genetics. 53, p. 683-693 Abstract
Neuroblastoma is a pediatric tumor of the developing sympathetic nervous system. However, the cellular origin of neuroblastoma has yet to be defined. Here we studied the single-cell transcriptomes of neuroblastomas and normal human developing adrenal glands at various stages of embryonic and fetal development. We defined normal differentiation trajectories from Schwann cell precursors over intermediate states to neuroblasts or chromaffin cells and showed that neuroblastomas transcriptionally resemble normal fetal adrenal neuroblasts. Importantly, neuroblastomas with varying clinical phenotypes matched different temporal states along normal neuroblast differentiation trajectories, with the degree of differentiation corresponding to clinical prognosis. Our work highlights the roles of oncogenic MYCN and loss of TFAP2B in blocking differentiation and may provide the basis for designing therapeutic interventions to overcome differentiation blocks.
-
(2021) Cell. 184, 11, p. 2973-2987.e18 Abstract
Stony corals are colonial cnidarians that sustain the most biodiverse marine ecosystems on Earth: coral reefs. Despite their ecological importance, little is known about the cell types and molecular pathways that underpin the biology of reef-building corals. Using single-cell RNA sequencing, we define over 40 cell types across the life cycle of Stylophora pistillata. We discover specialized immune cells, and we uncover the developmental gene expression dynamics of calcium-carbonate skeleton formation. By simultaneously measuring the transcriptomes of coral cells and the algae within them, we characterize the metabolic programs involved in symbiosis in both partners. We also trace the evolution of these coral cell specializations by phylogenetic integration of multiple cnidarian cell type atlases. Overall, this study reveals the molecular and cellular basis of stony coral biology.
-
(2021) Proceedings of the National Academy of Sciences. 118, 7, 2003926118. Abstract
Hormones control the major biological functions of stress response, growth, metabolism, and reproduction. In animals, these hormones show pronounced seasonality, with different set-points for different seasons. In humans, the seasonality of these hormones remains unclear, due to a lack of datasets large enough to discern common patterns and cover all hormones. Here, we analyze an Israeli health record on 46 million person-years, including millions of hormone blood tests. We find clear seasonal patterns: The effector hormones peak in winter-spring, whereas most of their upstream regulating pituitary hormones peak only months later, in summer. This delay of months is unexpected because known delays in the hormone circuits last hours. We explain the precise delays and amplitudes by proposing and testing a mechanism for the circannual clock: The gland masses grow with a timescale of months due to trophic effects of the hormones, generating a feedback circuit with a natural frequency of about a year that can entrain to the seasons. Thus, humans may show coordinated seasonal set-points with a winter-spring peak in the growth, stress, metabolism, and reproduction axes.
2020
-
(2020) Haematologica. 105, 12, p. 2861-2863 Abstract
Acute myeloid leukemia (AML) is one of the extreme outcomes of age-related clonal hematopoiesis (ARCH) [1]. With aging, mutations accumulate in hematopoietic stem and progenitor cells (HSPCs) [2,3]. Based on the estimated number of HSPCs (~50,000) in the human body and the number of somatic mutations in adult single cells (~1000) [4], it is predicted that every ~100 nucleotides, a somatic mutation will occur at a low variant allele frequency (VAF).
-
(2020) Nature. 587, 7834, p. 377-386 Abstract
LifeTime aims to track, understand and target human cells during the onset and progression of complex diseases and their response to therapy at single-cell resolution. This mission will be implemented through the development and integration of single-cell multi-omics and imaging, artificial intelligence and patient-derived experimental disease models during progression from health to disease. Analysis of such large molecular and clinical datasets will discover molecular mechanisms, create predictive computational models of disease progression, and reveal new drug targets and therapies. Timely detection and interception of disease embedded in an ethical and patient-centered vision will be achieved through interactions across academia, hospitals, patient-associations, health data management systems and industry. Applying this strategy to key medical challenges in cancer, neurological, infectious, chronic inflammatory and cardiovascular diseases at the single-cell level will usher in cell-based interceptive medicine in Europe over the next decade.
-
(2020) Nature Genetics. 52, 7, p. 709-718 Abstract
Propagation of clonal regulatory programs contributes to cancer development. It is poorly understood how epigenetic mechanisms interact with genetic drivers to shape this process. Here, we combine single-cell analysis of transcription and DNA methylation with a Luria-Delbrück experimental design to demonstrate the existence of clonally stable epigenetic memory in multiple types of cancer cells. Longitudinal transcriptional and genetic analysis of clonal colon cancer cell populations reveals a slowly drifting spectrum of epithelial-to-mesenchymal transcriptional identities that is seemingly independent of genetic variation. DNA methylation landscapes correlate with these identities but also reflect an independent clock-like methylation loss process. Methylation variation can be explained as an effect of global trans-acting factors in most cases. However, for a specific class of promoters, in particular cancer-testis antigens, de-repression is correlated with and probably driven by loss of methylation in cis. This study indicates how genetic sub-clonal structure in cancer cells can be diversified by epigenetic memory. Longitudinal single-cell analysis of transcription and DNA methylation dynamics in cancer cell lines suggests a clonally stable epigenetic memory. Colon cancer cells show a spectrum of epithelial-to-mesenchymal identities that seems independent of genetic variation.
-
(2020) Science Advances. 6, 21, eaba4137. Abstract
The discovery of giant viruses infecting eukaryotes from diverse ecosystems has revolutionized our understanding of the evolution of viruses and their impact on protist biology, yet knowledge on their replication strategies and transcriptome regulation remains limited. Here, we profile single-cell transcriptomes of the globally distributed microalga Emiliania huxleyi and its specific giant virus during infection. We detected profound heterogeneity in viral transcript levels among individual cells. Clustering single cells based on viral expression profiles enabled reconstruction of the viral transcriptional trajectory. Reordering cells along this path unfolded highly resolved viral genetic programs composed of genes with distinct promoter elements that orchestrate sequential expression. Exploring host transcriptome dynamics across the viral infection states revealed rapid and selective shutdown of protein-encoding nuclear transcripts, while the plastid and mitochondrial transcriptomes persisted into later stages. Single-cell RNA-seq opens a new avenue to unravel the life cycle of giant viruses and their unique hijacking strategies.
-
(2020) Nature Biotechnology. 38, 5, p. 629-637 Abstract
PIC-seq characterizes cellular crosstalk by sorting and sequencing physically interacting cells. Crosstalk between neighboring cells underlies many biological processes, including cell signaling, proliferation and differentiation. Current single-cell genomic technologies profile each cell separately after tissue dissociation, losing information on cell-cell interactions. In the present study, we present an approach for sequencing physically interacting cells (PIC-seq), which combines cell sorting of physically interacting cells (PICs) with single-cell RNA-sequencing. Using computational modeling, PIC-seq systematically maps in situ cellular interactions and characterizes their molecular crosstalk. We apply PIC-seq to interrogate diverse interactions including immune-epithelial PICs in neonatal murine lungs. Focusing on interactions between T cells and dendritic cells (DCs) in vitro and in vivo, we map T cell-DC interaction preferences, and discover regulatory T cells as a major T cell subtype interacting with DCs in mouse draining lymph nodes. Analysis of T cell-DC pairs reveals an interaction-specific program between pathogen-presenting migratory DCs and T cells. PIC-seq provides a direct and broadly applicable technology to characterize intercellular interaction-specific pathways at high resolution.
2019
-
(2019) Genome Biology. 20, 1, 206. Abstract
scRNA-seq profiles each represent a highly partial sample of mRNA molecules from a unique cell that can never be resampled, and robust analysis must separate the sampling effect from biological variance. We describe a methodology for partitioning scRNA-seq datasets into metacells: disjoint and homogeneous groups of profiles that could have been resampled from the same cell. Unlike clustering analysis, our algorithm specializes in obtaining granular as opposed to maximal groups. We show how to use metacells as building blocks for complex quantitative transcriptional maps while avoiding data smoothing. Our algorithms are implemented in the MetaCell R/C++ software package.
-
(2019) Journal of Molecular Biology. 431, 13, p. 2398-2406 Abstract
Genome-wide analysis of cellular transcriptomes using RNA-seq or expression arrays is a major mainstay of current biological and biomedical research. EXPANDER (EXPression ANalyzer and DisplayER) is a comprehensive software package for analysis of expression data, with built-in support for 18 different organisms. It is designed as a "one-stop shop" platform for transcriptomic analysis, allowing for execution of all analysis steps starting with a gene expression data matrix. Analyses offered include low-level preprocessing and normalization, differential expression analysis, clustering, bi-clustering, supervised grouping, high-level functional and pathway enrichment tests, and networks and motif analyses. A variety of options is offered for each step, using established algorithms, including many developed and published by our laboratory. EXPANDER has been continuously developed since 2003, having to date over 18,000 downloads and 540 citations. One of the innovations in the recent version is support for combined analysis of gene expression and ChIP-seq data to enhance the inference of transcriptional networks and their functional interpretation. EXPANDER implements cutting-edge algorithms and makes them accessible to users through a user-friendly interface and intuitive visualizations. It is freely available to users at http://acgt.cs.tau.ac.il/expander/.
-
(2019) Nature Protocols. 14, 6, p. 1841-1862 Abstract
Human tissues comprise trillions of cells that populate a complex space of molecular phenotypes and functions and that vary in abundance by 4-9 orders of magnitude. Relying solely on unbiased sampling to characterize cellular niches becomes infeasible, as the marginal utility of collecting more cells diminishes quickly. Furthermore, in many clinical samples, the relevant cell types are scarce and efficient processing is critical. We developed an integrated pipeline for index sorting and massively parallel single-cell RNA sequencing (MARS-seq2.0) that builds on our previously published MARS-seq approach. MARS-seq2.0 is based on >1 million cells sequenced with this pipeline and allows identification of unique cell types across different tissues and diseases, as well as unique model systems and organisms. Here, we present a detailed step-by-step procedure for applying the method. In the improved procedure, we combine sub-microliter reaction volumes, optimization of enzymatic mixtures and an enhanced analytical pipeline to substantially lower the cost, improve reproducibility and reduce well-to-well contamination. Data analysis combines multiple layers of quality assessment and error detection and correction, graphically presenting key statistics for library complexity, noise distribution and sequencing saturation. Importantly, our combined FACS and single-cell RNA sequencing (scRNA-seq) workflow enables intuitive approaches for depletion or enrichment of cell populations in a data-driven manner that is essential to efficient sampling of complex tissues. The experimental protocol, from cell sorting to a ready-to-sequence library, takes 2-3 d. Sequencing and processing the data through the analytical pipeline take another 1-2 d.
-
(2019) International Journal of Cancer. 144, 5, p. 1061-1072 Abstract
Lung adenocarcinoma (ADC) is the most prevalent subtype of lung cancer and characterized by considerable morphological and mutational heterogeneity. However, little is known about the epigenomic intratumor variability between spatially separated histological growth patterns of ADC. In order to reconstruct the clonal evolution of histomorphological patterns, we performed global DNA methylation profiling of 27 primary tumor regions, seven matched normal tissues and six lymph node metastases from seven ADC cases. Additionally, we investigated the methylation data from 369 samples of the TCGA ADC cohort. All regions showed varying degrees of methylation changes between segments of different, but also of the same growth patterns. Similarly, copy number variations were seen between spatially distinct segments of each patient. Hierarchical clustering of promoter methylation revealed extensive heterogeneity within and between the cases. Intratumor DNA methylation heterogeneity demonstrated a branched clonal evolution of ADC regions driven by genomic instability with subclonal copy number changes. Notably, methylation profiles within tumors were not more similar to each other than to those from other individuals. In two cases, different tumor regions of the same individuals were represented in distant clusters of the TCGA cohort, illustrating the extensive epigenomic intratumor heterogeneity of ADCs. We found no evidence for the lymph node metastases to be derived from a common growth pattern. Instead, they had evolved early and separately from a particular pattern in each primary tumor. Our results suggest that extensive variation of epigenomic features contributes to the molecular and phenotypic heterogeneity of primary ADCs and lymph node metastases.
-
(2019) Cell Stem Cell. 24, 2, p. 328-341.e9 Abstract
The epigenetic dynamics of induced pluripotent stem cell (iPSC) reprogramming in correctly reprogrammed cells at high resolution and throughout the entire process remain largely undefined. Here, we characterize conversion of mouse fibroblasts into iPSCs using Gatad2a-Mbd3/NuRD-depleted and highly efficient reprogramming systems. Unbiased high-resolution profiling of dynamic changes in levels of gene expression, chromatin engagement, DNA accessibility, and DNA methylation were obtained. We identified two distinct and synergistic transcriptional modules that dominate successful reprogramming, which are associated with cell identity and biosynthetic genes. The pluripotency module is governed by dynamic alterations in epigenetic modifications to promoters and binding by Oct4, Sox2, and Klf4, but not Myc. Early DNA demethylation at certain enhancers prospectively marks cells fated to reprogram. Myc activity drives expression of the essential biosynthetic module and is associated with optimized changes in tRNA codon usage. Our functional validations highlight interweaved epigenetic- and Myc-governed essential reconfigurations that rapidly commission and propel deterministic reprogramming toward naive pluripotency.
-
(2019) Cell. 176, 4, p. 775-789.e18 Abstract
Tumor immune cell compositions play a major role in response to immunotherapy, but the heterogeneity and dynamics of immune infiltrates in human cancer lesions remain poorly characterized. Here, we identify conserved intratumoral CD4 and CD8 T cell behaviors in scRNA-seq data from 25 melanoma patients. We discover a large population of CD8 T cells showing continuous progression from an early effector "transitional" into a dysfunctional T cell state. CD8 T cells that express a complete cytotoxic gene set are rare, and TCR sharing data suggest their independence from the transitional and dysfunctional cell states. Notably, we demonstrate that dysfunctional T cells are the major intratumoral proliferating immune cell compartment and that the intensity of the dysfunctional signature is associated with tumor reactivity. Our data demonstrate that CD8 T cells previously defined as exhausted are in fact a highly proliferating, clonal, and dynamically differentiating cell population within the human tumor microenvironment.
2018
-
(2018) Nature Medicine. 24, 12, p. 1867-1876 Abstract
Multiple myeloma, a plasma cell malignancy, is the second most common blood cancer. Despite extensive research, disease heterogeneity is poorly characterized, hampering efforts for early diagnosis and improved treatments. Here, we apply single cell RNA sequencing to study the heterogeneity of 40 individuals along the multiple myeloma progression spectrum, including 11 healthy controls, demonstrating high interindividual variability that can be explained by expression of known multiple myeloma drivers and additional putative factors. We identify extensive subclonal structures for 10 of 29 individuals with multiple myeloma. In asymptomatic individuals with early disease and in those with minimal residual disease post-treatment, we detect rare tumor plasma cells with molecular characteristics similar to those of active myeloma, with possible implications for personalized therapies. Single cell analysis of rare circulating tumor cells allows for accurate liquid biopsy and detection of malignant plasma cells, which reflect bone marrow disease. Our work establishes single cell RNA sequencing for dissecting blood malignancies and devising detailed molecular characterization of tumor cells in symptomatic and asymptomatic patients.
-
(2018) Nature. 559, 7714, p. 400-404 Abstract
The incidence of acute myeloid leukaemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 65. Most cases arise without any detectable early symptoms and patients usually present with the acute complications of bone marrow failure. The onset of such de novo AML cases is typically preceded by the accumulation of somatic mutations in preleukaemic haematopoietic stem and progenitor cells (HSPCs) that undergo clonal expansion. However, recurrent AML mutations also accumulate in HSPCs during ageing of healthy individuals who do not develop AML, a phenomenon referred to as age-related clonal haematopoiesis (ARCH). Here we use deep sequencing to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH. We analysed peripheral blood cells from 95 individuals that were obtained on average 6.3 years before AML diagnosis (pre-AML group), together with 414 unselected age- and gender-matched individuals (control group). Pre-AML cases were distinct from controls and had more mutations per sample, higher variant allele frequencies, indicating greater clonal expansion, and showed enrichment of mutations in specific genes. Genetic parameters were used to derive a model that accurately predicted AML-free survival; this model was validated in an independent cohort of 29 pre-AML cases and 262 controls. Because AML is rare, we also developed an AML predictive model using a large electronic health record database that identified individuals at greater risk. Collectively our findings provide proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation. This could in future enable earlier detection and monitoring, and may help to inform intervention.
-
(2018) Nature Ecology & Evolution. 2, 7, p. 1176-+ Abstract
A hallmark of metazoan evolution is the emergence of genomic mechanisms that implement cell-type-specific functions. However, the evolution of metazoan cell types and their underlying gene regulatory programmes remains largely uncharacterized. Here, we use whole-organism single-cell RNA sequencing to map cell-type-specific transcription in Porifera (sponges), Ctenophora (comb jellies) and Placozoa species. We describe the repertoires of cell types in these non-bilaterian animals, uncovering diverse instances of previously unknown molecular signatures, such as multiple types of peptidergic cells in Placozoa. Analysis of the regulatory programmes of these cell types reveals variable levels of complexity. In placozoans and poriferans, sequence motifs in the promoters are predictive of cell-type-specific programmes. By contrast, the generation of a higher diversity of cell types in ctenophores is associated with lower specificity of promoter sequences and the existence of distal regulatory elements. Our findings demonstrate that metazoan cell types can be defined by networks of transcription factors and proximal promoters, and indicate that further genome regulatory complexity may be required for more diverse cell type repertoires.
-
(2018) Nature Cell Biology. 20, 7, p. 836-+ Abstract
The dynamics of haematopoietic stem cell differentiation and the hierarchy of oligopotent stem cells in the bone marrow remain controversial. Here we dissect haematopoietic progenitor populations at single cell resolution, deriving an unbiased reference model of transcriptional states in normal and perturbed murine bone marrow. We define the signature of the naive haematopoietic stem cell and find a continuum of core progenitor states. Core cell populations mix transcription of pre-myeloid and pre-lymphoid programs, but do not mix erythroid or megakaryocyte programs with other fates. CRISP-seq perturbation analysis confirms our models and reveals that Cebpa regulates entry into all myeloid fates, while Irf8 and PU.1 deficiency block later differentiation towards monocyte or granulocyte fates. Our transcriptional map defines a reference network model for blood progenitors and their differentiation trajectories during normal and perturbed haematopoiesis.
-
(2018) Cell. 173, 6, p. 1520-+ Abstract
The emergence and diversification of cell types is a leading factor in animal evolution. So far, systematic characterization of the gene regulatory programs associated with cell type specificity was limited to few cell types and few species. Here, we perform whole-organism single-cell transcriptomics to map adult and larval cell types in the cnidarian Nematostella vectensis, a non-bilaterian animal with complex tissue-level body-plan organization. We uncover eight broad cell classes in Nematostella, including neurons, cnidocytes, and digestive cells. Each class comprises different subtypes defined by the expression of multiple specific markers. In particular, we characterize a surprisingly diverse repertoire of neurons, which comparative analysis suggests are the result of lineage-specific diversification. By integrating transcription factor expression, chromatin profiling, and sequence motif analysis, we identify the regulatory codes that underlie Nematostella cell-specific expression. Our study reveals cnidarian cell type complexity and provides insights into the evolution of animal cell-specific genomic regulation.
2017
-
(2017) Cell. 171, 3, p. 557-572.e24 Abstract
Chromosome conformation capture technologies have revealed important insights into genome folding. Yet, how spatial genome architecture is related to gene expression and cell fate remains unclear. We comprehensively mapped 3D chromatin organization during mouse neural differentiation in vitro and in vivo, generating the highest-resolution Hi-C maps available to date. We found that transcription is correlated with chromatin insulation and long-range interactions, but dCas9-mediated activation is insufficient for creating TAD boundaries de novo. Additionally, we discovered long-range contacts between gene bodies of exon-rich, active genes in all cell types. During neural differentiation, contacts between active TADs become less pronounced while inactive TADs interact more strongly. An extensive Polycomb network in stem cells is disrupted, while dynamic interactions between neural transcription factors appear in vivo. Finally, cell type-specific enhancer-promoter contacts are established concomitant to gene expression. This work shows that multiple factors influence the dynamics of chromatin interactions in development. An ultrahigh resolution Hi-C map of mouse neural differentiation yields insights into the multiple factors that influence the dynamics of chromatin interactions during development.
-
(2017) Nature. 547, 7661, p. 61-67 Abstract
Chromosomes in proliferating metazoan cells undergo marked structural metamorphoses every cell cycle, alternating between highly condensed mitotic structures that facilitate chromosome segregation, and decondensed interphase structures that accommodate transcription, gene silencing and DNA replication. Here we use single-cell Hi-C (high-resolution chromosome conformation capture) analysis to study chromosome conformations in thousands of individual cells, and discover a continuum of cis-interaction profiles that finely position individual cells along the cell cycle. We show that chromosomal compartments, topological-associated domains (TADs), contact insulation and long-range loops, all defined by bulk Hi-C maps, are governed by distinct cell-cycle dynamics. In particular, DNA replication correlates with a build-up of compartments and a reduction in TAD insulation, while loops are generally stable from G1 to S and G2 phase. Whole-genome three-dimensional structural models reveal a radial architecture of chromosomal compartments with distinct epigenomic signatures. Our single-cell data therefore allow re-interpretation of chromosome conformation maps through the prism of the cell cycle.
-
(2017) Genes & Development. 31, 10, p. 959-972 Abstract
DNA methylation is a key regulator of embryonic stem cell (ESC) biology, dynamically changing between naive, primed, and differentiated states. The p53 tumor suppressor is a pivotal guardian of genomic stability, but its contributions to epigenetic regulation and stem cell biology are less explored. We report that, in naive mouse ESCs (mESCs), p53 restricts the expression of the de novo DNA methyltransferases Dnmt3a and Dnmt3b while up-regulating Tet1 and Tet2, which promote DNA demethylation. The DNA methylation imbalance in p53-deficient (p53(-/-)) mESCs is the result of augmented overall DNA methylation as well as increased methylation landscape heterogeneity. In differentiating p53(-/-) mESCs, elevated methylation persists, albeit more mildly. Importantly, concomitant with DNA methylation heterogeneity, p53(-/-) mESCs display increased cellular heterogeneity both in the "naive" state and upon induced differentiation. This impact of p53 loss on 5-methylcytosine (5mC) heterogeneity was also evident in human ESCs and mouse embryos in vivo. Hence, p53 helps maintain DNA methylation homeostasis and clonal homogeneity, a function that may contribute to its tumor suppressor activity.
-
(2017) Proceedings Of The National Academy Of Sciences Of The United States Of America-Physical Sciences. 114, 20, p. E4030-E4039 Abstract
Children with Down syndrome (DS) are prone to development of high-risk B-cell precursor ALL (DS-ALL), which differs genetically from most sporadic pediatric ALLs. Increased expression of cytokine receptor-like factor 2 (CRLF2), the receptor to thymic stromal lymphopoietin (TSLP), characterizes about half of DS-ALLs and also a subgroup of sporadic "Philadelphia-like" ALLs. To understand the pathogenesis of relapsed DS-ALL, we performed integrative genomic analysis of 25 matched diagnosis-remission and -relapse DS-ALLs. We found that the CRLF2 rearrangements are early events during DS-ALL evolution and generally stable between diagnoses and relapse. Secondary activating signaling events in the JAK-STAT/RAS pathway were ubiquitous but highly redundant between diagnosis and relapse, suggesting that signaling is essential but that no specific mutations are "relapse driving." We further found that activated JAK2 may be naturally suppressed in 25% of CRLF2(pos) DS-ALLs by loss-of-function aberrations in USP9X, a deubiquitinase previously shown to stabilize the activated phosphorylated JAK2. Interrogation of large ALL genomic databases extended our findings up to 25% of CRLF2(pos), Philadelphia-like ALLs. Pharmacological or genetic inhibition of USP9X, as well as treatment with low-dose ruxolitinib, enhanced the survival of pre-B ALL cells overexpressing mutated JAK2. Thus, somewhat counterintuitively, we found that suppression of JAK-STAT "hypersignaling" may be beneficial to leukemic B-cell precursors. This finding and the reduction of JAK mutated clones at relapse suggest that the therapeutic effect of JAK specific inhibitors may be limited. Rather, combined signaling inhibitors or direct targeting of the TSLP receptor may be a useful therapeutic strategy for DS-ALL.
-
(2017) Nature. 541, 7637, p. 331-338 Abstract
Three of the most fundamental questions in biology are how individual cells differentiate to form tissues, how tissues function in a coordinated and flexible fashion and which gene regulatory mechanisms support these processes. Single-cell genomics is opening up new ways to tackle these questions by combining the comprehensive nature of genomics with the microscopic resolution that is required to describe complex multicellular systems. Initial single-cell genomic studies provided a remarkably rich phenomenology of heterogeneous cellular states, but transforming observational studies into models of dynamics and causal mechanisms in tissues poses fresh challenges and requires stronger integration of theoretical, computational and experimental frameworks.
2016
-
(2016) Nature. 540, 7632, p. 296-300 Abstract
Chromosomes are folded into highly compacted structures to accommodate physical constraints within nuclei and to regulate access to genomic information. Recently, global mapping of pairwise contacts showed that loops anchoring topological domains (TADs) are highly conserved between cell types and species. Whether pairwise loops synergize to form higher-order structures is still unclear. Here we develop a conformation capture assay to study higher-order organization using chromosomal walks (C-walks) that link multiple genomic loci together into proximity chains in human and mouse cells. This approach captures chromosomal structure at varying scales. Inter-chromosomal contacts constitute only 7-10% of the pairs and are restricted by interfacing TADs. About half of the C-walks stay within one chromosome, and almost half of those are restricted to intra-TAD spaces. C-walks that couple 2-4 TADs indicate stochastic associations between transcriptionally active, early replicating loci. Targeted analysis of thousands of 3-walks anchored at highly expressed genes support pairwise, rather than hub-like, chromosomal topology at active loci. Polycomb-repressed Hox domains are shown by the same approach to enrich for synergistic hubs. Together, the data indicate that chromosomal territories, TADs, and intra-TAD loops are primarily driven by nested, possibly dynamic, pairwise contacts.
-
(2016) Cell. 167, 7, p. 1883-1896.e15 Abstract
In multicellular organisms, dedicated regulatory circuits control cell type diversity and responses. The crosstalk and redundancies within these circuits and substantial cellular heterogeneity pose a major research challenge. Here, we present CRISP-seq, an integrated method for massively parallel single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-pooled screens. We show that profiling the genomic perturbation and transcriptome in the same cell enables us to simultaneously elucidate the function of multiple factors and their interactions. We applied CRISP-seq to probe regulatory circuits of innate immunity. By sampling tens of thousands of perturbed cells in vitro and in mice, we identified interactions and redundancies between developmental and signaling-dependent factors. These include opposing effects of Cebpb and Irf8 in regulating the monocyte/macrophage versus dendritic cell lineages and differential functions for Rela and Stat1/2 in monocyte versus dendritic cell responses to pathogens. This study establishes CRISP-seq as a broadly applicable, comprehensive, and unbiased approach for elucidating mammalian regulatory circuits.
-
(2016) PLoS Genetics. 12, 11, e1006330. Abstract
The development of niches for tissue-specific stem cells is an important aspect of stem cell biology. Determination of niche size and niche numbers during organogenesis involves precise control of gene expression. How this is achieved in the context of a complex chromatin landscape is largely unknown. Here we show that the nuclear protein Combgap (Cg) supports correct ovarian niche formation in Drosophila by controlling ecdysone receptor (EcR)-mediated transcription and long-range chromatin contacts in the broad locus (BR-C). Both cg and BR-C promote ovarian growth and the development of niches for germ line stem cells. BR-C levels were lower when Combgap was either reduced or over-expressed, indicating an intricate regulation of the BR-C locus by Combgap. Polytene chromosome stains showed that Cg co-localizes with EcR, the major regulator of BR-C, at the BR-C locus and that EcR binding to chromatin was sensitive to changes in Cg levels. Proximity ligation assay indicated that the two proteins could reside in the same complex. Finally, chromatin conformation analysis revealed that EcR-bound regions within BR-C, which span ~30 kb, contacted each other. Significantly, these contacts were stabilized in an ecdysone- and Combgap-dependent manner. Together, these results highlight Combgap as a novel regulator of chromatin structure that promotes transcription of ecdysone target genes and ovarian niche formation.
-
(2016) Cell. 166, 5, p. 1231-1246.e13 Abstract
Innate lymphoid cells (ILCs) are critical modulators of mucosal immunity, inflammation, and tissue homeostasis, but their full spectrum of cellular states and regulatory landscapes remains elusive. Here, we combine genome-wide RNA-seq, ChIP-seq, and ATAC-seq to compare the transcriptional and epigenetic identity of small intestinal ILCs, identifying thousands of distinct gene profiles and regulatory elements. Single-cell RNA-seq and flow and mass cytometry analyses reveal compartmentalization of cytokine expression and metabolic activity within the three classical ILC subtypes and highlight transcriptional states beyond the current canonical classification. In addition, using antibiotic intervention and germ-free mice, we characterize the effect of the microbiome on the ILC regulatory landscape and determine the response of ILCs to microbial colonization at the single-cell level. Together, our work characterizes the spectrum of transcriptional identities of small intestinal ILCs and describes how ILCs differentially integrate signals from the microbial microenvironment to generate phenotypic and functional plasticity.
-
(2016) Nucleic Acids Research. 44, 9, p. 4222-4232 Abstract
Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domains maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large scale late replicating regions can be explained by mutation bias favoring A/T nucleotide, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional element. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations.
2015
-
(2015) Nature Protocols. 10, 12, p. 1986-2003 Abstract
Hi-C is a powerful method that provides pairwise information on genomic regions in spatial proximity in the nucleus. Hi-C requires millions of cells as input and, as genome organization varies from cell to cell, a limitation of Hi-C is that it only provides a population average of genome conformations. We developed single-cell Hi-C to create snapshots of thousands of chromatin interactions that occur simultaneously in a single cell. To adapt Hi-C to single-cell analysis, we modified the protocol to include in-nucleus ligation. This enables the isolation of single nuclei carrying Hi-C-ligated DNA into separate tubes, followed by reversal of cross-links, capture of biotinylated ligation junctions on streptavidin-coated magnetic beads and PCR amplification of single-cell Hi-C libraries. The entire laboratory protocol can be carried out in 1 week, and although we have demonstrated its use in mouse T helper (T(H)1) cells, it should be applicable to any cell type or species for which standard Hi-C has been successful. We also developed an analysis pipeline to filter noise and assess the quality of data sets in a few hours. Although the interactome maps produced by single-cell Hi-C are sparse, the data provide useful information to understand cellular variability in nuclear genome organization and chromosome structure. Standard wet and dry laboratory skills in molecular biology and computational analysis are required.
-
(2015) Cell. 163, 7, p. 1663-1677 Abstract
Within the bone marrow, stem cells differentiate and give rise to diverse blood cell types and functions. Currently, hematopoietic progenitors are defined using surface markers combined with functional assays that are not directly linked with in vivo differentiation potential or gene regulatory mechanisms. Here, we comprehensively map myeloid progenitor sub-populations by transcriptional sorting of single cells from the bone marrow. We describe multiple progenitor subgroups, showing unexpected transcriptional priming toward seven differentiation fates but no progenitors with a mixed state. Transcriptional differentiation is correlated with combinations of known and previously undefined transcription factors, suggesting that the process is tightly regulated. Histone maps and knockout assays are consistent with early transcriptional priming, while traditional transplantation experiments suggest that in vivo priming may still allow for plasticity given strong perturbations. These data establish a reference model and general framework for studying hematopoiesis at single-cell resolution.
-
(2015) Nature Reviews Genetics. 16, 12, p. 716-726 Abstract
Epigenomics is the study of the physical modifications, associations and conformations of genomic DNA sequences, with the aim of linking these with epigenetic memory, cellular identity and tissue-specific functions. While current techniques in the field are characterizing the average epigenomic features across large cell ensembles, the increasing interest in the epigenetics within complex and heterogeneous tissues is driving the development of single-cell epigenomics. We review emerging single-cell methods for capturing DNA methylation, chromatin accessibility, histone modifications, chromosome conformation and replication dynamics. Together, these techniques are rapidly becoming a powerful tool in studies of cellular plasticity and diversity, as seen in stem cells and cancer.
-
(2015) Cell Reports. 10, 8, p. 1297-1309 Abstract
Topological domains are key architectural building blocks of chromosomes, but their functional importance and evolutionary dynamics are not well defined. We performed comparative high-throughput chromosome conformation capture (Hi-C) in four mammals and characterized the conservation and divergence of chromosomal contact insulation and the resulting domain architectures within distantly related genomes. We show that the modular organization of chromosomes is robustly conserved in syntenic regions and that this is compatible with conservation of the binding landscape of the insulator protein CTCF. Specifically, conserved CTCF sites are co-localized with cohesin, are enriched at strong topological domain borders, and bind to DNA motifs with orientations that define the directionality of CTCF's long-range interactions. Conversely, divergent CTCF binding between species is correlated with divergence of internal domain structure, likely driven by local CTCF binding sequence changes, demonstrating how genome evolution can be linked to a continuous flux of local conformation changes. We also show that large-scale domains are reorganized during genome evolution as intact modules.
2014
-
(2014) Cell Reports. 9, 1, p. 219-233 Abstract
Metazoan genomes are partitioned into modular chromosomal domains containing active or repressive chromatin. In flies, Polycomb group (PcG) response elements (PREs) recruit PHO and other DNA-binding factors and act as nucleation sites for the formation of Polycomb repressive domains. The sequence specificity of PREs is not well understood. Here, we use comparative epigenomics and transgenic assays to show that Drosophila domain organization and PRE specification are evolutionarily conserved despite significant cis-element divergence within Polycomb domains, whereas cis-element evolution is strongly correlated with transcription factor binding divergence outside of Polycomb domains. Cooperative interactions of PcG complexes and their recruiting factor PHO stabilize PHO recruitment to low-specificity sequences. Consistently, PHO recruitment to sites within Polycomb domains is stabilized by PRC1. These data suggest that cooperative rather than hierarchical interactions among low-affinity sequences, DNA-binding factors, and the Polycomb machinery are giving rise to specific and strongly conserved 3D structures in Drosophila.
-
(2014) Cell Reports. 8, 3, p. 798-806 Abstract
Despite much evidence on epigenetic abnormalities in cancer, it is currently unclear to what extent epigenetic alterations can be associated with tumors' clonal genetic origins. Here, we show that the prostate intratumor heterogeneity in DNA methylation and copy-number patterns can be explained by a unified evolutionary process. By assaying multiple topographically distinct tumor sites, premalignant lesions, and lymph node metastases within five cases of prostate cancer, we demonstrate that both DNA methylation and copy-number heterogeneity consistently reflect the life history of the tumors. Furthermore, we show cases of genetic or epigenetic convergent evolution and highlight the diversity in the evolutionary origins and aberration spectrum between tumor and metastatic subclones. Importantly, DNA methylation can complement genetic data by serving as a proxy for activity at regulatory domains, as we show through identification of high epigenetic heterogeneity at androgen-receptor-bound enhancers. Epigenome variation thereby expands on the current genome-centric view on tumor heterogeneity.
-
(2014) Science. 343, 6172, p. 776-779 Abstract
In multicellular organisms, biological function emerges when heterogeneous cell types form complex organs. Nevertheless, dissection of tissues into mixtures of cellular subpopulations is currently challenging. We introduce an automated massively parallel single-cell RNA sequencing (RNA-seq) approach for analyzing in vivo transcriptional states in thousands of single cells. Combined with unsupervised classification algorithms, this facilitates ab initio cell-type characterization of splenic tissues. Modeling single-cell transcriptional states in dendritic cells and additional hematopoietic cell types uncovers rich cell-type heterogeneity and gene-modules activity in steady state and after pathogen activation. Cellular diversity is thereby approached through inference of variable and dynamic pathway activity rather than a fixed preprogrammed cell-type hierarchy. These data demonstrate single-cell RNA-seq as an effective tool for comprehensive cellular decomposition of complex tissues.
-
(2014) Nature. 513, 7516, p. 115-119 Abstract
Stable maintenance of gene regulatory programs is essential for normal function in multicellular organisms. Epigenetic mechanisms, and DNA methylation in particular, are hypothesized to facilitate such maintenance by creating cellular memory that can be written during embryonic development and then guide cell-type-specific gene expression. Here we develop new methods for quantitative inference of DNA methylation turnover rates, and show that human embryonic stem cells preserve their epigenetic state by balancing antagonistic processes that add and remove methylation marks rather than by copying epigenetic information from mother to daughter cells. In contrast, somatic cells transmit considerable epigenetic information to progenies. Paradoxically, the persistence of the somatic epigenome makes it more vulnerable to noise, since random epimutations can accumulate to massively perturb the epigenomic ground state. The rate of epigenetic perturbation depends on the genomic context, and, in particular, DNA methylation loss is coupled to late DNA replication dynamics. Epigenetic perturbation is not observed in the pluripotent state, because the rapid turnover-based equilibrium continuously reinforces the canonical state. This dynamic epigenetic equilibrium also explains how the epigenome can be reprogrammed quickly and to near perfection after induced pluripotency.
2013
-
(2013) EMBO Journal. 32, 24, p. 3119-3129 Abstract
To ensure proper gene regulation within constrained nuclear space, chromosomes facilitate access to transcribed regions, while compactly packaging all other information. Recent studies revealed that chromosomes are organized into megabase-scale domains that demarcate active and inactive genetic elements, suggesting that compartmentalization is important for genome function. Here, we show that very specific long-range interactions are anchored by cohesin/CTCF sites, but not cohesin-only or CTCF-only sites, to form a hierarchy of chromosomal loops. These loops demarcate topological domains and form intricate internal structures within them. Post-mitotic nuclei deficient for functional cohesin exhibit global architectural changes associated with loss of cohesin/CTCF contacts and relaxation of topological domains. Transcriptional analysis shows that this cohesin-dependent perturbation of domain organization leads to widespread gene deregulation of both cohesin-bound and non-bound genes. Our data thereby support a role for cohesin in the global organization of domain structure and suggest that domains function to stabilize the transcriptional programmes within them.
-
(2013) Cell Reports. 4, 6, p. 1131-1143 Abstract
The t(8;21) and inv(16) chromosomal aberrations generate the oncoproteins AML1-ETO (A-E) and CBFβ-SMMHC (C-S). The role of these oncoproteins in acute myeloid leukemia (AML) etiology has been well studied. Conversely, the function of native RUNX1 in promoting A-E- and C-S-mediated leukemias has remained elusive. We show that wild-type RUNX1 is required for the survival of t(8;21)-Kasumi-1 and inv(16)-ME-1 leukemic cells. RUNX1 knockdown in Kasumi-1 cells (Kasumi-1RX1-KD) attenuates the cell-cycle mitotic checkpoint, leading to apoptosis, whereas knockdown of A-E in Kasumi-1RX1-KD rescues these cells. Mechanistically, a delicate RUNX1/A-E balance involving competition for common genomic sites that regulate RUNX1/A-E targets sustains the malignant cell phenotype. The broad medical significance of this leukemic cell addiction to native RUNX1 is underscored by clinical data showing that an active RUNX1 allele is usually preserved in both t(8;21) and inv(16) AML patients, whereas RUNX1 is frequently inactivated in other forms of leukemia. Thus, RUNX1 and its mitotic control targets are potential candidates for new therapeutic approaches.
-
(2013) Nature Genetics. 45, 7, p. 717-718 Abstract
Transposable elements (TEs) make up 50% of the human genome and are usually considered a mutational burden. A new study uses signatures of DNA hypomethylation to identify tissue-specific enhancers within TEs, providing fresh evidence that mobile DNA has a non-negligible role in genome regulation and evolution.
-
(2013) PLoS Genetics. 9, 5, e1003512. Abstract
Modern functional genomics uncovered numerous functional elements in metazoan genomes. Nevertheless, only a small fraction of the typical non-exonic genome contains elements that code for function directly. On the other hand, a much larger fraction of the genome is associated with significant evolutionary constraints, suggesting that much of the non-exonic genome is weakly functional. Here we show that in flies, local (30-70 bp) conserved sequence elements that are associated with multiple regulatory functions serve as focal points to a pattern of punctuated regional increase in G/C nucleotide frequencies. We show that this pattern, which covers a region tenfold larger than the conserved elements themselves, is an evolutionary consequence of a shift in the balance between gain and loss of G/C nucleotides and that it is correlated with nucleosome occupancy across multiple classes of epigenetic state. Evidence for compensatory evolution and analysis of SNP allele frequencies show that the evolutionary regime underlying this balance shift is likely to be non-neutral. These data suggest that current gaps in our understanding of genome function and evolutionary dynamics are explicable by a model of sparse sequence elements directly encoding for function, embedded into structural sequences that help to define the local and global epigenomic context of such functional elements.
-
(2013) PLoS ONE. 8, 5, e64248. Abstract
RUNX1 transcription factor (TF) is a key regulator of megakaryocytic development and when mutated is associated with familial platelet disorder and predisposition to acute myeloid leukemia (FPD-AML). We used mice lacking Runx1 specifically in megakaryocytes (MK) to characterize the Runx1-mediated transcriptional program during advanced stages of MK differentiation. Gene expression and chromatin-immunoprecipitation-sequencing (ChIP-seq) of Runx1 and p300 identified functional Runx1-bound MK enhancers. Runx1/p300 co-bound regions showed significant enrichment in genes important for MK and platelet homeostasis. Runx1-occupied genomic regions were highly enriched in RUNX and ETS motifs and to a lesser extent in GATA motifs. Megakaryocytic specificity of Runx1/p300-bound enhancers was validated by transfection mutagenesis, and Runx1/p300 co-bound regions of two key megakaryocytic genes, Nfe2 and Selp, were tested by in vivo transgenesis. The data provide the first example of genome-wide Runx1/p300 occupancy in maturating primary FL-MK, unravel the Runx1-regulated program controlling MK maturation in vivo and identify a subset of its bona fide regulated genes. This advances our understanding of the molecular events that, upon RUNX1 mutations in humans, lead to the predisposition to familial platelet disorders and FPD-AML.
-
Chromosomal domains: Epigenetic contexts and functional implications of genomic compartmentalization (2013) Current opinion in genetics & development. 23, 2, p. 197-203 Abstract
We review recent developments in mapping chromosomal contacts and compare emerging insights on chromosomal contact domain organization in Drosophila and mammalian cells. Potential scenarios leading to the observation of Hi-C domains and their association with the epigenomic context of the chromosomal elements involved are discussed. We argue that even though the mechanisms and precise physical structure underlying chromosomal domain demarcation are yet to be fully resolved, the implications for genome regulation and genome evolution are profound. Specifically, we hypothesize that domains facilitate genomic compartmentalization that supports the implementation of complex, modular, and tissue-specific transcriptional programs in metazoa.
-
(2013) Nature. 502, 7469, p. 65-70 Abstract
Somatic cells can be inefficiently and stochastically reprogrammed into induced pluripotent stem (iPS) cells by exogenous expression of Oct4 (also called Pou5f1), Sox2, Klf4 and Myc (hereafter referred to as OSKM). The nature of the predominant rate-limiting barrier(s) preventing the majority of cells to successfully and synchronously reprogram remains to be defined. Here we show that depleting Mbd3, a core member of the Mbd3/NuRD (nucleosome remodelling and deacetylation) repressor complex, together with OSKM transduction and reprogramming in naive pluripotency promoting conditions, result in deterministic and synchronized iPS cell reprogramming (near 100% efficiency within seven days from mouse and human cells). Our findings uncover a dichotomous molecular function for the reprogramming factors, serving to reactivate endogenous pluripotency networks while simultaneously directly recruiting the Mbd3/NuRD repressor complex that potently restrains the reactivation of OSKM downstream target genes. Subsequently, the latter interactions, which are largely depleted during early pre-implantation development in vivo, lead to a stochastic and protracted reprogramming trajectory towards pluripotency in vitro. The deterministic reprogramming approach devised here offers a novel platform for the dissection of molecular dynamics leading to establishing pluripotency at unprecedented flexibility and resolution.
-
(2013) Nature. 502, 7469, p. 59-64 Abstract
Large-scale chromosome structure and spatial nuclear arrangement have been linked to control of gene expression and DNA replication and repair. Genomic techniques based on chromosome conformation capture (3C) assess contacts for millions of loci simultaneously, but do so by averaging chromosome conformations from millions of nuclei. Here we introduce single-cell Hi-C, combined with genome-wide statistical analysis and structural modelling of single-copy X chromosomes, to show that individual chromosomes maintain domain organization at the megabase scale, but show variable cell-to-cell chromosome structures at larger scales. Despite this structural stochasticity, localization of active gene domains to boundaries of chromosome territories is a hallmark of chromosomal conformation. Single-cell Hi-C data bridge current gaps between genomics and microscopy studies of chromosomes, demonstrating how modular organization underlies dynamic chromosome structure, and how this structure is probabilistically linked with genome activity patterns.
-
(2013) Nature. 504, 7479, p. 282-286 Abstract
Mouse embryonic stem (ES) cells are isolated from the inner cell mass of blastocysts, and can be preserved in vitro in a naive inner-cell-mass-like configuration by providing exogenous stimulation with leukaemia inhibitory factor (LIF) and small molecule inhibition of ERK1/ERK2 and GSK3β signalling (termed 2i/LIF conditions). Hallmarks of naive pluripotency include driving Oct4 (also known as Pou5f1) transcription by its distal enhancer, retaining a pre-inactivation X chromosome state, and global reduction in DNA methylation and in H3K27me3 repressive chromatin mark deposition on developmental regulatory gene promoters. Upon withdrawal of 2i/LIF, naive mouse ES cells can drift towards a primed pluripotent state resembling that of the post-implantation epiblast. Although human ES cells share several molecular features with naive mouse ES cells, they also share a variety of epigenetic properties with primed murine epiblast stem cells (EpiSCs). These include predominant use of the proximal enhancer element to maintain OCT4 expression, pronounced tendency for X chromosome inactivation in most female human ES cells, increase in DNA methylation and prominent deposition of H3K27me3 and bivalent domain acquisition on lineage regulatory genes. The feasibility of establishing human ground state naive pluripotency in vitro with equivalent molecular and functional features to those characterized in mouse ES cells remains to be defined. Here we establish defined conditions that facilitate the derivation of genetically unmodified human naive pluripotent stem cells from already established primed human ES cells, from somatic cells through induced pluripotent stem (iPS) cell reprogramming or directly from blastocysts. The novel naive pluripotent cells validated herein retain molecular characteristics and functional properties that are highly similar to mouse naive ES cells, and distinct from conventional primed human pluripotent cells. 
This includes competence in the generation of cross-species chimaeric mouse embryos that underwent organogenesis following microinjection of human naive iPS cells into mouse morulas. Collectively, our findings establish new avenues for regenerative medicine, patient-specific iPS cell disease modelling and the study of early human development in vitro and in vivo.
2012
-
(2012) Nature Genetics. 44, 11, p. 1207-1214 Abstract
DNA methylation has been comprehensively profiled in normal and cancer cells, but the dynamics that form, maintain and reprogram differentially methylated regions remain enigmatic. Here, we show that methylation patterns within populations of cells from individual somatic tissues are heterogeneous and polymorphic. Using in vitro evolution of immortalized fibroblasts for over 300 generations, we track the dynamics of polymorphic methylation at regions developing significant differential methylation on average. The data indicate that changes in population-averaged methylation occur through a stochastic process that generates a stream of local and uncorrelated methylation aberrations. Despite the stochastic nature of the process, nearly deterministic epigenetic remodeling emerges on average at loci that lose or gain resistance to methylation accumulation. Changes in the susceptibility to methylation accumulation are correlated with changes in histone modification and CTCF occupancy. Characterizing epigenomic polymorphism within cell populations is therefore critical to understanding methylation dynamics in normal and cancer cells.
-
(2012) Nature Methods. 9, 10, p. 969-972 Abstract
Regulatory DNA elements can control the expression of distant genes via physical interactions. Here we present a cost-effective methodology and computational analysis pipeline for robust characterization of the physical organization around selected promoters and other functional elements using chromosome conformation capture combined with high-throughput sequencing (4C-seq). Our approach can be multiplexed and routinely integrated with other functional genomics assays to facilitate physical characterization of gene regulation.
-
(2012) Molecular Biology and Evolution. 29, 7, p. 1769-1780 Abstract
Nucleotide substitution is a major evolutionary driving force that can incrementally and stochastically give rise to broad divergence patterns among species. The substitution process at each genomic position is frequently modeled independently of the other positions, although complex interactions between nearby bases are known to significantly affect mutation rates. Here, we study the evolution of 12 fly genomes using new algorithms for accurate inference of parameter-rich substitution models. By comparing models between lineages, we reveal the evolutionary histories of substitution rates at different flanking nucleotide contexts. We demonstrate these driving forces of molecular evolution to be constantly changing, suggesting that neutral drift of mutation rates is an important factor in the evolution of genomes and their sequence composition. This observation is used to develop a scalable approach for parameter-rich comparative genomics. By screening short DNA sequences, we demonstrate how homeoboxes and other transcription factor binding motifs are highly conserved based on our parameter-rich models but not according to standard conservation assays. With the increasing availability of genome sequences, rich substitution models become an attractive and practical approach for evolutionary analysis in general and comparative genomics in particular.
-
(2012) Cell. 148, 3, p. 458-472 Abstract
Chromosomes are the physical realization of genetic information and thus form the basis for its readout and propagation. Here we present a high-resolution chromosomal contact map derived from a modified genome-wide chromosome conformation capture approach applied to Drosophila embryonic nuclei. The data show that the entire genome is linearly partitioned into well-demarcated physical domains that overlap extensively with active and repressive epigenetic marks. Chromosomal contacts are hierarchically organized between domains. Global modeling of contact density and clustering of domains show that inactive domains are condensed and confined to their chromosomal territories, whereas active domains reach out of the territory to form remote intra- and interchromosomal contacts. Moreover, we systematically identify specific long-range intrachromosomal contacts between Polycomb-repressed domains. Together, these observations allow for quantitative prediction of the Drosophila chromosomal contact map, laying the foundation for detailed studies of chromosome structure and function in a genetically tractable system.
2011
-
(2011) Nature Genetics. 43, 11, p. 1059-1065 Abstract
Hi-C experiments measure the probability of physical proximity between pairs of chromosomal loci on a genomic scale. We report on several systematic biases that substantially affect the Hi-C experimental procedure, including the distance between restriction sites, the GC content of trimmed ligation junctions and sequence uniqueness. To address these biases, we introduce an integrated probabilistic background model and develop algorithms to estimate its parameters and renormalize Hi-C data. Analysis of corrected human lymphoblast contact maps provides genome-wide evidence for interchromosomal aggregation of active chromatin marks, including DNase-hypersensitive sites and transcriptionally active foci. We observe extensive long-range (up to 400 kb) cis interactions at active promoters and derive asymmetric contact profiles next to transcription start sites and CTCF binding sites. Clusters of interacting chromosomal domains suggest physical separation of centromere-proximal and centromere-distal regions. These results provide a computational basis for the inference of chromosomal architectures from Hi-C experiments.
-
Primate CpG islands are maintained by heterogeneous evolutionary regimes involving minimal selection (2011) Cell. 145, 5, p. 773-786 Abstract
Mammalian CpG islands are key epigenomic elements that were first characterized experimentally as genomic fractions with low levels of DNA methylation. Currently, CpG islands are defined based on their genomic sequences alone. Here, we develop evolutionary models to show that several distinct evolutionary processes generate and maintain CpG islands. One central evolutionary regime resulting in enriched CpG content is driven by low levels of DNA methylation and consequentially low rates of CpG deamination. Another major force forming CpG islands is biased gene conversion that stabilizes constitutively methylated CpG islands by balancing rapid deamination with CpG fixation. Importantly, evolutionary analysis and population genetics data suggest that selection for high CpG content is not a significant factor contributing to conservation of CpGs in differentially methylated regions. The heterogeneous, but not selective, origins of CpG islands have direct implications for the understanding of DNA methylation patterns in healthy and diseased cells.
-
(2011) Blood. 117, 1, p. 1-14 Abstract
Specific interactions of transcription factors (TFs) with their targets are crucial for specifying gene expression programs during cell differentiation. How specificity is maintained despite limited selectivity of individual TF-DNA interactions is not fully understood. RUNX1 TF is among the most frequently mutated genes in human leukemia and an important regulator of megakaryopoiesis. We used megakaryocytic cell lines to characterize the network of RUNX1 targets and cooperating TFs in differentiating megakaryocytes and demonstrated how dynamic partnerships between RUNX1 and cooperating TFs facilitated regulatory plasticity and specificity during this process. After differentiation onset, RUNX1 directly activated a large number of genes through interaction with preexisting and de novo binding sites. Recruitment of RUNX1 to de novo occupied sites occurred at H3K4me1-marked preprogrammed enhancers. A significant number of these de novo bound sites lacked RUNX motif but were occupied by AP-1 TFs. Reciprocally, AP-1 TFs were up-regulated by RUNX1 after 12-O-tetradecanoylphorbol-13-acetate induction and recruited to RUNX1-occupied sites lacking AP-1 motifs. At other differentiation stages, additional combinatorial interactions occurred between RUNX1 and its coregulators, GATA1 and ETS. The findings suggest that in differentiating megakaryocytic cell lines, RUNX1 cooperates with GATA1, AP-1, and ETS to orchestrate cell-specific transcription programs through dynamic TF partnerships.
2010
-
(2010) PLoS Computational Biology. 6, 12, e1001039. Abstract
Evolution maintains organismal fitness by preserving genomic information. This is widely assumed to involve conservation of specific genomic loci among species. Many genomic encodings are now recognized to integrate small contributions from multiple genomic positions into quantitative dispersed codes, but the evolutionary dynamics of such codes are still poorly understood. Here we show that in yeast, sequences that quantitatively affect nucleosome occupancy evolve under compensatory dynamics that maintain heterogeneous levels of A+T content through spatially coupled A/T-losing and A/T-gaining substitutions. Evolutionary modeling combined with data on yeast polymorphisms supports the idea that these substitution dynamics are a consequence of weak selection. This shows that compensatory evolution, so far believed to affect specific groups of epistatically linked loci like paired RNA bases, is a widespread phenomenon in the yeast genome, affecting the majority of intergenic sequences in it. The model thus derived suggests that compensation is inevitable when evolution conserves quantitative and dispersed genomic functions.
-
(2010) Molecular Cell. 39, 6, p. 901-911 Abstract
Profound chromatin changes occur during mitosis to allow for gene silencing and chromosome segregation followed by reactivation of memorized transcription states in daughter cells. Using genome-wide sequencing, we found H2A.Z-containing +1 nucleosomes of active genes shift upstream to occupy TSSs during mitosis, significantly reducing nucleosome-depleted regions. Single-molecule analysis confirmed nucleosome shifting and demonstrated that mitotic shifting is specific to active genes that are silenced during mitosis and, thus, is not seen on promoters, which are silenced by methylation or mitotically expressed genes. Using the GRP78 promoter as a model, we found H3K4 trimethylation is also maintained while other indicators of active chromatin are lost and expression is decreased. These key changes provide a potential mechanism for rapid silencing and reactivation of genes during the cell cycle.
-
(2010) PLoS Genetics. 6, 7, p. 1-12 Abstract
Recent evidence suggests that the timing of DNA replication is coordinated across megabase-scale domains in metazoan genomes, yet the importance of this aspect of genome organization is unclear. Here we show that replication timing is remarkably conserved between human and mouse, uncovering large regions that may have been governed by similar replication dynamics since these species have diverged. This conservation is both tissue-specific and independent of the genomic G+C content conservation. Moreover, we show that time of replication is globally conserved despite numerous large-scale genome rearrangements. We systematically identify rearrangement fusion points and demonstrate that replication time can be locally diverged at these loci. Conversely, rearrangements are shown to be correlated with early replication and physical chromosomal proximity. These results suggest that large chromosomal domains of coordinated replication are shuffled by evolution while conserving the large-scale nuclear architecture of the genome.
-
(2010) Nature protocols. 5, 2, p. 303-322 Abstract
A major challenge in the analysis of gene expression microarray data is to extract meaningful biological knowledge out of the huge volume of raw data. Expander (EXPression ANalyzer and DisplayER) is an integrated software platform for the analysis of gene expression data, which is freely available for academic use. It is designed to support all the stages of microarray data analysis, from raw data normalization to inference of transcriptional regulatory networks. The microarray analysis described in this protocol starts with importing the data into Expander 5.0 and is followed by normalization and filtering. Then, clustering and network-based analyses are performed. The gene groups identified are tested for enrichment in function (based on Gene Ontology), co-regulation (using transcription factor and microRNA target predictions) or co-location. The results of each analysis step can be visualized in a number of ways. The complete protocol can be executed in ∼1 h.
-
(2010) Cell Cycle. 9, 2, p. 256-259 Abstract
A primary goal of genetic association studies is to elucidate genes and novel biological mechanisms involved in disease. Recently, genome-wide association studies have identified many common genetic variants that are significantly associated with complex diseases such as cancer. In contrast to Mendelian disorders, a sizable fraction of the variants lies outside known protein-coding regions; therefore, understanding their biological consequences presents a major challenge in human genetics. Here we describe an integrated framework to allow non-protein coding loci to be annotated with respect to regulatory functions. This will facilitate identification of target genes as well as prioritize variants for functional testing.
2009
-
(2009) Genome Research. 19, 12, p. 2193-2201 Abstract
DNA methylation is an important epigenetic mechanism, affecting normal development and playing a key role in reprogramming epigenomes during stem cell derivation. Here we report on DNA methylation patterns in native monkey embryonic stem cells (ESCs), fibroblasts, and ESCs generated through somatic cell nuclear transfer (SCNT), identifying and comparing epigenome programming and reprogramming. We characterize hundreds of regions that are hyper- or hypomethylated in fibroblasts compared to native ESCs and show that these are conserved in human cells and tissues. Remarkably, the vast majority of these regions are reprogrammed in SCNT ESCs, leading to almost perfect correlation between the epigenomic profiles of the native and reprogrammed lines. At least 58% of these changes are correlated in cis to transcription changes, Polycomb Repressive Complex-2 occupancy, or binding by the CTCF insulator. We also show that while epigenomic reprogramming is extensive and globally accurate, the efficiency of adding and stripping DNA methylation during reprogramming is regionally variable. In several cases, this variability results in regions that remain methylated in a fibroblast-like pattern even after reprogramming.
-
(2009) PLoS Genetics. 5, 8, e1000597. Abstract
Multiple discrete regions at 8q24 were recently shown to contain alleles that predispose to many cancers including prostate, breast, and colon. These regions are far from any annotated gene and their biological activities have been unknown. Here we profiled a 5-megabase chromatin segment encompassing all the risk regions for RNA expression, histone modifications, and locations occupied by RNA polymerase II and androgen receptor (AR). This led to the identification of several transcriptional enhancers, which were verified using reporter assays. Two enhancers in one risk region were occupied by AR and responded to androgen treatment; one contained a single nucleotide polymorphism (rs11986220) that resides within a FoxA1 binding site, with the prostate cancer risk allele facilitating both stronger FoxA1 binding and stronger androgen responsiveness. The study reported here exemplifies an approach that may be applied to any risk-associated allele in non-protein-coding regions as it emerges from genome-wide association studies to better understand the genetic predisposition of complex diseases.
-
(2009) PLoS Biology. 7, 1, e1000013. Abstract
Polycomb group (PcG) and trithorax group (trxG) proteins are conserved chromatin factors that regulate key developmental genes throughout development. In Drosophila, PcG and trxG factors bind to regulatory DNA elements called PcG and trxG response elements (PREs and TREs). Several DNA binding proteins have been suggested to recruit PcG proteins to PREs, but the DNA sequences necessary and sufficient to define PREs are largely unknown. Here, we used chromatin immunoprecipitation (ChIP) on chip assays to map the chromosomal distribution of Drosophila PcG proteins, the N- and C-terminal fragments of the Trithorax (TRX) protein and four candidate DNA-binding factors for PcG recruitment. In addition, we mapped histone modifications associated with PcG-dependent silencing and TRX-mediated activation. PcG proteins colocalize in large regions that may be defined as polycomb domains and colocalize with recruiters to form several hundreds of putative PREs. Strikingly, the majority of PcG recruiter binding sites are associated with H3K4me3 and not with PcG binding, suggesting that recruiter proteins have a dual function in activation as well as silencing. One major discriminant between activation and silencing is the strong binding of Pleiohomeotic (PHO) to silenced regions, whereas its homolog Pleiohomeotic-like (PHOL) binds preferentially to active promoters. In addition, the C-terminal fragment of TRX (TRX-C) showed high affinity to PcG binding sites, whereas the N-terminal fragment (TRX-N) bound mainly to active promoter regions trimethylated on H3K4. Our results indicate that DNA binding proteins serve as platforms to assist PcG and trxG binding. Furthermore, several DNA sequence features discriminate between PcG- and TRX-N-bound regions, indicating that underlying DNA sequence contains critical information to drive PREs and TREs towards silencing or activation.
-
Spatial Clustering of Multivariate Genomic and Epigenomic Information (2009) Research in Computational Molecular Biology, Proceedings. 5541, p. 170-183 Abstract
The combination of fully sequenced genomes and new technologies for high-density arrays and ultra-rapid sequencing enables the mapping of gene-regulatory and epigenetic marks on a global scale. This new experimental methodology was recently applied to map multiple histone marks and genomic factors, characterizing patterns of genome organization and discovering interactions among processes of epigenetic reprogramming during cellular differentiation. The new data pose a significant computational challenge in both size and statistical heterogeneity. Understanding them collectively and without bias remains an open problem. Here we introduce spatial clustering, a new unsupervised clustering methodology for dissection of large, multi-track genomic and epigenomic data sets into a spatially organized set of distinct combinatorial behaviors. We develop a probabilistic algorithm that finds spatial clustering solutions by learning an HMM and inferring the most likely genomic layout of clusters. Application of our methods to meta-analysis of combined ChIP-seq and ChIP-chip epigenomic datasets in mouse and human reveals known and novel patterns of local co-occurrence among histone modifications and related factors. Moreover, the model weaves together these local patterns into a coherent global model that reflects the higher-level organization of the epigenome. Spatial clustering constitutes a powerful and scalable analysis methodology for dissecting the even larger-scale genomic datasets that will soon become available.
2008
-
(2008) Proceedings of the National Academy of Sciences of the United States of America. 105, 35, p. 12979-12984 Abstract
Epigenetic reprogramming is commonly observed in cancer, and is hypothesized to involve multiple mechanisms, including DNA methylation and Polycomb repressive complexes (PRCs). Here we devise a new experimental and analytical strategy using customized high-density tiling arrays to investigate coordinated patterns of gene expression, DNA methylation, and Polycomb marks which differentiate prostate cancer cells from their normal counterparts. Three major changes in the epigenomic landscape distinguish the two cell types. Developmentally significant genes containing CpG islands which are silenced by PRCs in the normal cells acquire DNA methylation silencing and lose their PRC marks (epigenetic switching). Because these genes are normally silent, this switch does not cause de novo repression but might significantly reduce epigenetic plasticity. Two other groups of genes are silenced either by de novo DNA methylation without PRC occupancy (5mC reprogramming) or by de novo PRC occupancy without DNA methylation (PRC reprogramming). Our data suggest that the two silencing mechanisms act in parallel to reprogram the cancer epigenome and that DNA hypermethylation may replace Polycomb-based repression near key regulatory genes, possibly reducing their regulatory plasticity.
-
(2008) Genome Biology. 9, 2, R37. Abstract
Background: Insertions and deletions (indels) are an important evolutionary force, making the evolutionary process more efficient and flexible by copying and removing genomic fragments of various lengths instead of rediscovering them by point mutations. As a mutational process, indels are known to be more active in specific sequences (like microsatellites), but not much is known about the more general and mechanistic effect of sequence context on the insertion and deletion susceptibility of genomic loci. Results: Here we analyze a large collection of high-confidence short insertions and deletions in primates and flies, revealing extensive correlations between sequence context and indel rates and building principled models for predicting these rates from sequence. According to our results, the rate of insertion or deletion of specific lengths can vary by more than 100-fold, depending on the surrounding sequence. These mutational biases can strongly influence the composition of the genome and the rate at which particular sequences appear. We exemplify this by showing how degenerate loci in human exons are selected to reduce their frameshifting indel propensity. Conclusion: Insertions and deletions are strongly affected by sequence context. Consequently, genomes must adapt to significant variation in the mutational input at indel-prone and indel-immune loci.
-
(2008) PLoS Computational Biology. 4, 1, p. 77-87 Abstract
In comparative genomics one jointly analyzes evolutionarily related species in order to identify conserved and diverged sequences and to infer their function. While such studies enabled the detection of conserved sequences in large genomes, the evolutionary dynamics of regulatory regions as a whole remain poorly understood. Here we present a probabilistic model for the evolution of promoter regions in yeast, combining the effects of regulatory interactions of many different transcription factors. The model expresses explicitly the selection forces acting on transcription factor binding sites in the context of a dynamic evolutionary process. We develop algorithms to compute likelihood and to learn de novo collections of transcription factor binding motifs and their selection parameters from alignments. Using the new techniques, we examine the evolutionary dynamics in Saccharomyces species promoters. Analyses of an evolutionary model constructed using all known transcription factor binding motifs and of a model learned from the data automatically reveal relatively weak selection on most binding sites. Moreover, according to our estimates, strong binding sites are constraining only a fraction of the yeast promoter sequence that is under selection. Our study demonstrates how complex evolutionary dynamics in noncoding regions emerge from formalization of the evolutionary consequences of known regulatory mechanisms.
1985
-
Segregation of Peripheral-Blood Lymphocytes in Sarcoidosis According to Their Affinity to Insolubilized Histamine (1985) Israel Medical Association Journal. 21, 1, p. 6-11 Abstract