72. Highly conserved and cis-acting lncRNAs produced from paralogous regions in the center of HOXA and HOXB clusters in the endoderm lineage
Long noncoding RNAs (lncRNAs) have been shown to play important roles in gene regulatory networks acting in early development. There has been rapid turnover of lncRNA loci during vertebrate evolution, with few human lncRNAs conserved beyond mammals. The sequences of these rare deeply conserved lncRNAs are typically not similar to each other. Here, we characterize HOXA-AS3 and HOXB-AS3, lncRNAs produced from the central regions of the HOXA and HOXB clusters. Sequence-similar homologs of both lncRNAs are found in multiple vertebrate species and there is evident sequence similarity between their promoters, suggesting that the production of these lncRNAs predates the duplication of the HOX clusters at the root of the vertebrate lineage. This conservation extends to similar expression patterns of the two lncRNAs, in particular in cells transiently arising during early development or in the adult colon, and their co-regulation by the CDX1/2 transcription factors. Functionally, the RNA products of HOXA-AS3 and HOXB-AS3 regulate the expression of their overlapping HOX5-7 genes both in HT-29 cells and during differentiation of human embryonic stem cells. Beyond production of paralogous protein-coding and microRNA genes, the regulatory program in the HOX clusters therefore also relies on paralogous lncRNAs acting in restricted spatial and temporal windows of embryonic development and cell differentiation.
71. Sep 2020, EMBO Reports. In press
Mammalian genomes encode thousands of long noncoding RNAs (lncRNAs), yet the biological functions of most of them remain unknown. A particularly rich repertoire of lncRNAs is found in mammalian brain and in the early embryo. We used RNA-seq and computational analysis to prioritize lncRNAs that may regulate commitment of pluripotent cells to a neuronal fate and perturbed their expression prior to neuronal differentiation. Knockdown by RNAi of two highly conserved and well-expressed lncRNAs, Reno1 (2810410L24Rik) and lnc-Nr2f1, decreased the expression of neuronal markers and led to massive changes in gene expression in the differentiated cells. We further show that the Reno1 locus forms increasing spatial contacts during neurogenesis with its adjacent protein-coding gene Bahcc1. Loss of either Reno1 or Bahcc1 leads to an early arrest in neuronal commitment, failure to induce a neuronal gene expression program, and to global reduction in chromatin accessibility at regions that are marked by the H3K4me3 chromatin mark at the onset of differentiation. Reno1 and Bahcc1 thus form a previously uncharacterized circuit required for the early steps of neuronal commitment.
70. Jul 2020, Genome Research.
Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
69. Gene architecture and sequence composition underpin selective dependency of long RNAs on components of the nuclear export pathway
The nuclear export pathway transports long RNAs produced in the nucleus to the cytoplasm. The core components of this pathway are thought to be required for export of virtually all polyadenylated RNAs. Here, we depleted different proteins that act in nuclear export in human cells, and quantified the transcriptome-wide consequences on RNA localization. Different genes exhibited substantially variable sensitivities, with depletion of NXF1 and TREX components causing some transcripts to become strongly retained in the nucleus while others were not affected. Specifically, NXF1 is preferentially required for export of single- or few-exon transcripts with long exons or high A/U-content, whereas depletion of TREX complex components preferentially affects spliced and G/C-rich transcripts. Using massively parallel reporter assays we identified short sequence elements that render transcripts dependent on NXF1 for their export, and identified synergistic effects of splicing and NXF1. These results revise the current model of how nuclear export shapes the distribution of RNA within human cells.
68. Transcription Dynamics Regulate Poly(A) Tails and Expression of the RNA Degradation Machinery to Balance mRNA Levels.
Gene expression is regulated by the rates of synthesis and degradation of mRNAs, but how these processes are coordinated is poorly understood. Here, we show that reduced transcription dynamics of specific genes leads to enhanced m6A deposition, preferential activity of the CCR4-Not complex, shortened poly(A) tails, and reduced stability of the respective mRNAs. These effects are also exerted by internal ribosome entry site (IRES) elements, which we found to be transcriptional pause sites. However, when transcription dynamics, and subsequently poly(A) tails, are globally altered, cells buffer mRNA levels by adjusting the expression of mRNA degradation machinery. Stress-provoked global impediment of transcription elongation leads to a dramatic inhibition of the mRNA degradation machinery and massive mRNA stabilization. Accordingly, globally enhanced transcription, such as following B cell activation or glucose stimulation, has the opposite effects. This study uncovers two molecular pathways that maintain balanced gene expression in mammalian cells by linking transcription to mRNA stability.
67. Mar 2020, The EMBO Journal. 39, 6, p. e103777
Research on non-coding RNA (ncRNA) is a rapidly expanding field. Providing an official gene symbol and name to ncRNA genes brings order to otherwise potential chaos as it allows unambiguous communication about each gene. The HUGO Gene Nomenclature Committee (HGNC, www.genenames.org) is the only group with the authority to approve symbols for human genes. The HGNC works with specialist advisors for different classes of ncRNA to ensure that ncRNA nomenclature is accurate and informative, where possible. Here, we review each major class of ncRNA that is currently annotated in the human genome and describe how each class is assigned a standardised nomenclature.
66. Feb 2020, Nature Reviews Genetics. 21, 2, p. 102-117
Long non-coding RNAs (lncRNAs) are diverse transcription products emanating from thousands of loci in mammalian genomes. Cis-acting lncRNAs, which constitute a substantial fraction of lncRNAs with an attributed function, regulate gene expression in a manner dependent on the location of their own sites of transcription, at varying distances from their targets in the linear genome. Through various mechanisms, cis-acting lncRNAs have been demonstrated to activate, repress or otherwise modulate the expression of target genes. We discuss the activities that have been ascribed to cis-acting lncRNAs, the evidence and hypotheses regarding their modes of action, and the methodological advances that enable their identification and characterization. The emerging principles highlight lncRNAs as transcriptional units highly adept at contributing to gene regulatory networks and to the generation of fine-tuned spatial and temporal gene expression programmes.
65. Jan 2020, Nature Communications. 11, p. 644
Obesity and type 2 diabetes mellitus are global emergencies and long noncoding RNAs (lncRNAs) are regulatory transcripts with elusive functions in metabolism. Here we show that a high fraction of lncRNAs, but not protein-coding mRNAs, are repressed during diet-induced obesity (DIO) and refeeding, whilst nutrient deprivation induced lncRNAs in mouse liver. Similarly, lncRNAs are lost in diabetic humans. LncRNA promoter analyses, global cistrome and gain-of-function analyses confirm that increased MAFG signaling during DIO curbs lncRNA expression. Silencing Mafg in mouse hepatocytes and obese mice elicits a fasting-like gene expression profile, improves glucose metabolism, de-represses lncRNAs and impairs mammalian target of rapamycin (mTOR) activation. We find that obesity-repressed LincIRS2 is controlled by MAFG and observe that genetic and RNAi-mediated LincIRS2 loss causes elevated blood glucose, insulin resistance and aberrant glucose output in lean mice. Taken together, we identify a MAFG-lncRNA axis controlling hepatic glucose metabolism in health and metabolic disease.
64. Jan 2020, Cold Spring Harbor Symposia on Quantitative Biology. In press
Long noncoding RNAs (lncRNAs) are gathering increasing attention toward their roles in different biological systems. In mammals, the richest repertoires of lncRNAs are expressed in the brain and in the testis, and the diversity of lncRNAs in the nervous system is thought to be related to the diversity and the complexity of its cell types. Supporting this notion, many lncRNAs are differentially expressed between different regions of the brain or in particular cell types, and many lncRNAs are dynamically expressed during embryonic or postnatal neurogenesis. Less is known about the functions of these genes, if any, but they are increasingly implicated in diverse processes in health and disease. Here, we review the current knowledge about the roles and importance of lncRNAs in the central and peripheral nervous systems and discuss the specific niches within gene regulatory networks that might be preferentially occupied by lncRNAs.
63. Dec 2019, Cell. 179, 7, p. 1609-1622
Microglia, the brain-resident immune cells, are critically involved in many physiological and pathological brain processes, including neurodegeneration. Here we characterize microglia morphology and transcriptional programs across ten species spanning more than 450 million years of evolution. We find that microglia express a conserved core gene program of orthologous genes from rodents to humans, including ligands and receptors associated with interactions between glia and neurons. In most species, microglia show a single dominant transcriptional state, whereas human microglia display significant heterogeneity. In addition, we observed notable differences in several gene modules of rodents compared with primate microglia, including complement, phagocytic, and susceptibility genes to neurodegeneration, such as Alzheimer's and Parkinson's disease. Our study provides an essential resource of conserved and divergent microglia pathways across evolution, with important implications for future development of microglia-based therapies in humans.
62. Nov 2019, Nature Communications. 10, 1, p. 5317
Regulatory RNAs exert their cellular functions through RNA-binding proteins (RBPs). Identifying RNA-protein interactions is therefore key for a molecular understanding of regulatory RNAs. To date, RNA-bound proteins have been identified primarily through RNA purification followed by mass spectrometry. Here, we develop incPRINT (in cell protein-RNA interaction), a high-throughput method to identify in-cell RNA-protein interactions revealed by quantifiable luminescence. Applying incPRINT to long noncoding RNAs (lncRNAs), we identify RBPs specifically interacting with the lncRNA Firre and three functionally distinct regions of the lncRNA Xist. incPRINT confirms previously known lncRNA-protein interactions and identifies additional interactions that had evaded detection with other approaches. Importantly, the majority of the incPRINT-defined interactions are specific to individual functional regions of the large Xist transcript. Thus, we present an RNA-centric method that enables reliable identification of RNA-region-specific RBPs and is applicable to any RNA of interest.
61. Nov 2019, Nature Communications. 10, 1, p. 5092
Chromodomain helicase DNA binding protein 2 (Chd2) is a chromatin remodeller implicated in neurological disease. Here we show that Chaserr, a highly conserved long noncoding RNA transcribed from a region near the transcription start site of Chd2 and on the same strand, acts in concert with the CHD2 protein to maintain proper Chd2 expression levels. Loss of Chaserr in mice leads to early postnatal lethality in homozygous mice, and severe growth retardation in heterozygotes. Mechanistically, loss of Chaserr leads to substantially increased Chd2 mRNA and protein levels, which in turn lead to transcriptional interference by inhibiting promoters found downstream of highly expressed genes. We further show that Chaserr production represses Chd2 expression solely in cis, and that the phenotypic consequences of Chaserr loss are rescued when Chd2 is perturbed as well. Targeting Chaserr is thus a potential strategy for increasing CHD2 levels in haploinsufficient individuals.
60. Oct 2019, Cardiovascular Research. 115, 12, p. 1692-1704
Present throughout the vasculature, endothelial cells (ECs) are essential for blood vessel function and play a central role in the pathogenesis of diverse cardiovascular diseases. Understanding the intricate molecular determinants governing endothelial function and dysfunction is essential to develop novel clinical breakthroughs and improve knowledge. An increasing body of evidence demonstrates that long non-coding RNAs (lncRNAs) are active regulators of the endothelial transcriptome and function, providing emerging insights into core questions surrounding EC contributions to pathology, and perhaps the emergence of novel therapeutic opportunities. In this review, we discuss this class of non-coding transcripts and their role in endothelial biology during cardiovascular development, homeostasis, and disease, highlighting challenges during discovery and characterization and how these have been overcome to date. We further discuss the translational therapeutic implications and the challenges within the field, highlighting lncRNA that support endothelial phenotypes prevalent in cardiovascular disease.
59. The Human- and Smooth Muscle Cell-Enriched lncRNA SMILR Promotes Proliferation by Regulating Mitotic CENPF mRNA and Drives Cell-Cycle Progression Which Can Be Targeted
Rationale: In response to blood vessel wall injury, aberrant proliferation of vascular smooth muscle cells (SMCs) causes pathological remodeling. However, the controlling mechanisms are not completely understood.
Objective: We recently showed that the human long noncoding RNA, SMILR, promotes vascular SMCs proliferation by a hitherto unknown mechanism. Here, we assess the therapeutic potential of SMILR inhibition and detail the molecular mechanism of action.
Methods and results: We used deep RNA-sequencing of human saphenous vein SMCs stimulated with IL (interleukin)-1α and PDGF (platelet-derived growth factor)-BB with SMILR knockdown (siRNA) or overexpression (lentivirus), to identify SMILR-regulated genes. This revealed a SMILR-dependent network essential for cell cycle progression. In particular, we found using the fluorescent ubiquitination-based cell cycle indicator viral system that SMILR regulates the late mitotic phase of the cell cycle and cytokinesis with SMILR knockdown resulting in ≈10% increase in binucleated cells. SMILR pulldowns further revealed its potential molecular mechanism, which involves an interaction with the mRNA of the late mitotic protein CENPF (centromere protein F) and the regulatory Staufen1 RNA-binding protein. SMILR and this downstream axis were also found to be activated in the human ex vivo vein graft pathological model and in primary human coronary artery SMCs and atherosclerotic plaques obtained at carotid endarterectomy. Finally, to assess the therapeutic potential of SMILR, we used a novel siRNA approach in the ex vivo vein graft model (within the 30 minutes clinical time frame that would occur between harvest and implant) to assess the reduction of proliferation by EdU incorporation. SMILR knockdown led to a marked decrease in proliferation from ≈29% in controls to ≈5% with SMILR depletion.
Conclusions: Collectively, we demonstrate that SMILR is a critical mediator of vascular SMC proliferation via direct regulation of mitotic progression. Our data further reveal a potential SMILR-targeting intervention to limit atherogenesis and adverse vascular remodeling.
58. Jun 2019, Journal of Molecular Biology. 431, 13, p. 2398-2406
Genome-wide analysis of cellular transcriptomes using RNA-seq or expression arrays is a major mainstay of current biological and biomedical research. EXPANDER (EXPression ANalyzer and DisplayER) is a comprehensive software package for analysis of expression data, with built-in support for 18 different organisms. It is designed as a "one-stop shop" platform for transcriptomic analysis, allowing for execution of all analysis steps starting with gene expression data matrix. Analyses offered include low-level preprocessing and normalization, differential expression analysis, clustering, bi-clustering, supervised grouping, high-level functional and pathway enrichment tests, and networks and motif analyses. A variety of options is offered for each step, using established algorithms, including many developed and published by our laboratory. EXPANDER has been continuously developed since 2003, having to date over 18,000 downloads and 540 citations. One of the innovations in the recent version is support for combined analysis of gene expression and ChIP-seq data to enhance the inference of transcriptional networks and their functional interpretation. EXPANDER implements cutting-edge algorithms and makes them accessible to users through user-friendly interface and intuitive visualizations. It is freely available to users at http://acgt.cs.tau.ac.il/expander/.
56. Alternative 3' UTRs direct localization of functionally diverse protein isoforms in neuronal compartments
The proper subcellular localization of RNAs and local translational regulation is crucial in highly compartmentalized cells, such as neurons. RNA localization is mediated by specific cis-regulatory elements usually found in mRNA 3'UTRs. Therefore, processes that generate alternative 3'UTRs (alternative splicing and polyadenylation) have the potential to diversify mRNA localization patterns in neurons. Here, we performed mapping of alternative 3'UTRs in neurites and soma isolated from mESC-derived neurons. Our analysis identified 593 genes with differentially localized 3'UTR isoforms. In particular, we have shown that two isoforms of the Cdc42 gene with distinct functions in neuronal polarity are differentially localized between neurites and soma of mESC-derived and mouse primary cortical neurons, at both the mRNA and protein level. Using reporter assays and 3'UTR swapping experiments, we have identified the role of alternative 3'UTRs and mRNA transport in differential localization of alternative CDC42 protein isoforms. Moreover, we used SILAC to identify the isoform-specific Cdc42 3'UTR-bound proteome with a potential role in Cdc42 localization and translation. Our analysis points to usage of alternative 3'UTR isoforms as a novel mechanism to provide for differential localization of functionally diverse alternative protein isoforms.
55. Feb 2019, RNA. 25, 5, p. 557-572
Export to the cytoplasm is a key regulatory junction for both protein-coding mRNAs and long noncoding RNAs (lncRNAs), and cytoplasmic enrichment varies dramatically both within and between those groups. We used a new computational approach and RNA-seq data from human and mouse cells to quantify the genome-wide association between cytoplasmic/nuclear ratios of both gene groups and various factors, including expression levels, splicing efficiency, gene architecture, chromatin marks, and sequence elements. Splicing efficiency emerged as the main predictive factor, explaining up to a third of the variability in localization. Combination with other features allowed predictive models that could explain up to 45% of the variance for protein-coding genes and up to 34% for lncRNAs. Factors associated with localization were similar between lncRNAs and mRNAs with some important differences. Readily accessible features can thus be used to predict RNA localization.
54. Dec 2018, Cancer Research.
Downregulation of the urea cycle enzyme argininosuccinate synthase (ASS1) by either promoter methylation or by HIF1α is associated with increased metastasis and poor prognosis in multiple cancers. We have previously shown that in normoxic conditions, ASS1 downregulation facilitates cancer cell proliferation by increasing aspartate availability for pyrimidine synthesis by the enzyme complex CAD. Here we report that in hypoxia, ASS1 expression in cancerous cells is downregulated further by Hif1α-mediated induction of miR224-5p, making the cells more invasive and dependent on upstream substrates of ASS1 for survival. ASS1 was downregulated under acidic conditions, and ASS1-depleted cancer cells maintained a higher intracellular pH (pHi), depended less on extracellular glutamine, and displayed higher glutathione levels. Depletion of substrates of urea cycle enzymes in ASS1-deficient cancers decreased cancer cell survival. Thus, ASS1 levels in cancer are differentially regulated in various environmental conditions to metabolically benefit cancer progression. Understanding these alterations may help uncover specific context-dependent cancer vulnerabilities that may be targeted for therapeutic purposes.
53. Oct 2018, Molecular Cell. 72, 3, p. 553-567.E5
In mammals, neurons in the peripheral nervous system (PNS) have regenerative capacity following injury, but it is generally absent in the CNS. This difference is attributed, at least in part, to the intrinsic ability of PNS neurons to activate a unique regenerative transcriptional program following injury. Here, we profiled gene expression following sciatic nerve crush in mice and identified long noncoding RNAs (lncRNAs) that act in the regenerating neurons and which are typically not expressed in other contexts. We show that two of these lncRNAs regulate the extent of neuronal outgrowth. We then focus on one of these, Silc1, and show that it regulates neuroregeneration in cultured cells and in vivo, through cis-acting activation of the transcription factor Sox11.
52. Sep 2018, Genome Biology. 19, 1, p. 152
MicroRNAs (miRNAs) are short regulatory RNAs that derive from hairpin precursors. Important for understanding the functional roles of miRNAs is the ability to predict the messenger RNA (mRNA) targets most responsive to each miRNA. Progress towards developing quantitative models of miRNA targeting in Drosophila and other invertebrate species has lagged behind that of mammals due to the paucity of datasets measuring the effects of miRNAs on mRNA levels.
We acquired datasets suitable for the quantitative study of miRNA targeting in Drosophila. Analyses of these data expanded the types of regulatory sites known to be effective in flies, expanded the mRNA regions with detectable targeting to include 5' untranslated regions, and identified features of site context that correlate with targeting efficacy in fly cells. Updated evolutionary analyses evaluated the probability of conserved targeting for each predicted site and indicated that more than a third of the Drosophila genes are preferentially conserved targets of miRNAs. Based on these results, a quantitative model was developed to predict targeting efficacy in insects. This model performed better than existing models, and it drives the most recent version, v7, of TargetScanFly.
Our evolutionary and functional analyses expand the known scope of miRNA targeting in flies and other insects. The existence of a quantitative model that has been developed and trained using Drosophila data will provide a valuable resource for placing miRNAs into gene regulatory networks of this important experimental organism.
51. Jun 2018, Cell Systems. 7, 5, p. 537-547
Active enhancers in mammals produce enhancer RNAs (eRNAs), which are bidirectionally transcribed, unspliced, and unstable noncoding RNAs. Enhancer regions are also enriched with long noncoding RNA (lncRNA) genes, which are typically spliced and are longer and substantially more stable than eRNAs. In order to explore the relationship between these two classes of RNAs and the implications of lncRNA transcription on enhancer functionality, we analyzed DNase hypersensitive sites with evidence of bidirectional transcription, which we termed eRNA producing centers (EPCs). A subset of EPCs, which are found very close to the transcription start site of lncRNA genes, exhibit attributes of both enhancers and promoters, including distinctive DNA motifs and a characteristic landscape of bound proteins. These EPCs are associated with a subset of relatively highly active enhancers. This stronger enhancer activity is driven, at least in part, by the presence of evolutionarily conserved, directional splicing signals that promote lncRNA production, pointing at a causal role of lncRNA processing in enhancer activity. Together, our results suggest a model whereby the ability of some enhancers to produce lncRNAs, which is conserved in evolution, enhances their activity in a manner likely mediated through maturation of the associated lncRNA.
50. Altered p53 functionality in cancer-associated fibroblasts contributes to their cancer-supporting features.
Within the tumor microenvironment, cancer cells coexist with noncancerous adjacent cells that constitute the tumor microenvironment and impact tumor growth through diverse mechanisms. In particular, cancer-associated fibroblasts (CAFs) promote tumor progression in multiple ways. Earlier studies have revealed that in normal fibroblasts (NFs), p53 plays a cell nonautonomous tumor-suppressive role to restrict tumor growth. We now wished to investigate the role of p53 in CAFs. Remarkably, we found that the transcriptional program supported by p53 is altered substantially in CAFs relative to NFs. In agreement, the p53-dependent secretome is also altered in CAFs. This transcriptional rewiring renders p53 a significant contributor to the distinct intrinsic features of CAFs, as well as promotes tumor cell migration and invasion in culture. Concordantly, the ability of CAFs to promote tumor growth in mice is greatly compromised by depletion of their endogenous p53. Furthermore, cocultivation of NFs with cancer cells renders their p53-dependent transcriptome partially more similar to that of CAFs. Our findings raise the intriguing possibility that tumor progression may entail a nonmutational conversion ("education") of stromal p53, from tumor suppressive to tumor supportive.
49. May 2018, FEBS Letters. In press
It is now evident that noncoding RNAs play key roles in regulatory networks determining cell fate and behavior, in a myriad of different conditions, and across all species. Among these noncoding RNAs are short RNAs, such as microRNAs, snoRNAs, and piRNAs, and the functions of those are relatively well understood. Other noncoding RNAs are longer, and their modes of action and functions are also increasingly explored and deciphered. Short RNAs and long noncoding RNAs (lncRNAs) interact with each other with reciprocal consequences for their fates and functions. LncRNAs serve as precursors for many types of small RNAs and, therefore, the pathways for small RNA biogenesis can impinge upon the fate of lncRNAs. In addition, lncRNA expression can be repressed by small RNAs, and lncRNAs can affect small RNA activity and abundance through competition for binding or by triggering small RNA degradation. Here, I review the known types of interactions between small and long RNAs, discuss their outcomes, and bring representative examples from studies in mammals.
48. May 2018, Nature Immunology. 19, 6, p. 636-644
Transcriptome profiling is widely used to infer functional states of specific cell types, as well as their responses to stimuli, to define contributions to physiology and pathophysiology. Focusing on microglia, the brain’s macrophages, we report here a side-by-side comparison of classical cell-sorting-based transcriptome sequencing and the ‘RiboTag’ method, which avoids cell retrieval from tissue context and yields translatome sequencing information. Conventional whole-cell microglial transcriptomes were found to be significantly tainted by artifacts introduced by tissue dissociation, cargo contamination and transcripts sequestered from ribosomes. Conversely, our data highlight the added value of RiboTag profiling for assessing the lineage accuracy of Cre recombinase expression in transgenic mice. Collectively, this study indicates method-based biases, reveals observer effects and establishes RiboTag-based translatome profiling as a valuable complement to standard sorting-based profiling strategies.
47. Mar 2018, Nature. 555, 7694, p. 107-111
Long noncoding RNAs (lncRNAs) are emerging as key parts of multiple cellular pathways, but their modes of action and how these are dictated by sequence remain unclear. lncRNAs tend to be enriched in the nuclear fraction, whereas most mRNAs are overtly cytoplasmic, although several studies have found that hundreds of mRNAs in various cell types are retained in the nucleus. It is thus conceivable that some mechanisms that promote nuclear enrichment are shared between lncRNAs and mRNAs. Here, to identify elements in lncRNAs and mRNAs that can force nuclear localization, we screened libraries of short fragments tiled across nuclear RNAs, which were cloned into the untranslated regions of an efficiently exported mRNA. The screen identified a short sequence derived from Alu elements and bound by HNRNPK that increased nuclear accumulation. Binding of HNRNPK to C-rich motifs outside Alu elements is also associated with nuclear enrichment in both lncRNAs and mRNAs, and this mechanism is conserved across species. Our results thus identify a pathway for regulation of RNA accumulation and subcellular localization that has been co-opted to regulate the fate of transcripts with integrated Alu elements.
46. Jan 2018, Genes & Development. 32, 1, p. 70-78
The number of known long noncoding RNA (lncRNA) functions is rapidly growing, but how those functions are encoded in their sequence and structure remains poorly understood. NORAD (noncoding RNA activated by DNA damage) is a recently characterized, abundant, and highly conserved lncRNA that is required for proper mitotic divisions in human cells. NORAD acts in the cytoplasm and antagonizes repressors from the Pumilio family that bind at least 17 sites spread through 12 repetitive units in NORAD sequence. Here we study conserved sequences in NORAD repeats, identify additional interacting partners, and characterize the interaction between NORAD and the RNA-binding protein SAM68 (KHDRBS1), which is required for NORAD function in antagonizing Pumilio. These interactions provide a paradigm for how repeated elements in a lncRNA facilitate function.
45. Cap-proximal nucleotides via differential eIF4E binding and alternative promoter usage mediate translational response to energy stress
Transcription start-site (TSS) selection and alternative promoter (AP) usage contribute to gene expression complexity, but little is known about their impact on translation. Here we performed TSS mapping of the translatome following energy stress. Assessing the contribution of cap-proximal TSS nucleotides, we found a dramatic effect on translation only upon stress. As eIF4E levels were reduced, we determined its binding to capped RNAs with different initiating nucleotides and found the lowest affinity to 5' cytidine, in correlation with the translational stress response. In addition, the number of differentially translated APs was elevated following stress. These include novel glucose starvation-induced downstream transcripts for the translation regulators eIF4A and Pabp, which are also translationally induced despite general translational inhibition. The resultant eIF4A protein is N-terminally truncated and acts as an eIF4A inhibitor. The induced Pabp isoform has a shorter 5'UTR, removing an auto-inhibitory element. Our findings uncovered several levels of coordination of transcription and translation responses to energy stress.
44.Efficient and Accurate Translation Initiation Directed by TISU Involves RPS3 and RPS10e Binding and Differential Eukaryotic Initiation Factor 1A Regulation
Canonical translation initiation involves ribosomal scanning, but short 5' untranslated region (5' UTR) mRNAs are translated in a scanning-independent manner. The extent and mechanism of scanning-independent translation are not fully understood. Here we report that short 5' UTR mRNAs constitute a substantial fraction of the translatome. Short 5' UTR mRNAs are enriched with TISU (translation initiator of short 5' UTR), a 12-nucleotide element directing efficient scanning-independent translation. Comprehensive mutagenesis revealed that each AUG codon-flanking nucleotide of TISU contributes to translational strength, but only a few are important for accuracy. Using site-specific UV cross-linking of ribosomal complexes assembled on TISU mRNA, we demonstrate specific binding of TISU to ribosomal proteins at the E and A sites. We identified RPS3 as the major TISU binding protein in the 48S complex A site. Upon 80S complex formation, RPS3 interaction is weakened and switched to RPS10e (formerly called RPS10). We further demonstrate that TISU is particularly dependent on eukaryotic initiation factor 1A (eIF1A) which interacts with both RPS3 and RPS10e. Our findings suggest that the cap-recruited ribosome specifically binds the TISU nucleotides at the A and E sites in cooperation with eIF1A to promote scanning arrest.
43.Genome-wide identification and expression profiling of long non-coding RNAs in auditory and vestibular systems
Mammalian genomes encode multiple layers of regulation, including a class of RNA molecules known as long non-coding RNAs (lncRNAs). These are > 200 nucleotides in length and similar to mRNAs, they are capped, polyadenylated, and spliced. In contrast to mRNAs, lncRNAs are less abundant and have higher tissue specificity, and have been linked to development, epigenetic processes, and disease. However, little is known about lncRNA function in the auditory and vestibular systems, or how they play a role in deafness and vestibular dysfunction. To help address this need, we performed a whole-genome identification of lncRNAs using RNA-seq at two developmental stages of the mouse inner ear sensory epithelium of the cochlea and vestibule. We identified 3,239 lncRNA genes, most of which were intergenic (lincRNAs) and 721 are novel. We examined temporal and tissue specificity by analyzing the developmental profiles on embryonic day 16.5 and at birth. The spatial and temporal patterns of three lncRNAs, two of which are in proximity to genes associated with hearing and deafness, were explored further. Our findings indicate that lncRNAs are prevalent in the sensory epithelium of the mouse inner ear and are likely to play key roles in regulating critical pathways for hearing and balance.
42.Dec 2017, Genome Biology. 18, p. 162 Abstract
Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs.
We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality.
We estimate that ~55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
41.Dec 2017, Genome Biology. 18, 1, p. 202 Abstract
It is now obvious that the majority of cellular transcripts do not code for proteins, and a significant subset of them are long non-coding RNAs (lncRNAs). Many lncRNAs show aberrant expression in cancer, and some of them have been linked to cell transformation. However, the underlying mechanisms remain poorly understood and it is unknown how the sequences of lncRNA dictate their function.
Here we characterize the function of the p53-regulated human lncRNA LINC-PINT in cancer. We find that LINC-PINT is downregulated in multiple types of cancer and acts as a tumor suppressor lncRNA by reducing the invasive phenotype of cancer cells. A cross-species analysis identifies a highly conserved sequence element in LINC-PINT that is essential for its function. This sequence mediates a specific interaction with PRC2, necessary for the LINC-PINT-dependent repression of a pro-invasion signature of genes regulated by the transcription factor EGR1.
Our findings support a conserved functional co-dependence between LINC-PINT and PRC2 and lead us to propose a new mechanism where the lncRNA regulates the availability of free PRC2 at the proximity of co-regulated genomic loci.
40.Dec 2017, Neurology. 89, 16, p. 1676-1683 Abstract[All authors]
To examine whether gene expression analysis of a large-scale Parkinson disease (PD) patient cohort produces a robust blood-based PD gene signature compared to previous studies that have used relatively small cohorts (≤220 samples).
Whole-blood gene expression profiles were collected from a total of 523 individuals. After preprocessing, the data contained 486 gene profiles (n = 205 PD, n = 233 controls, n = 48 other neurodegenerative diseases) that were partitioned into training, validation, and independent test cohorts to identify and validate a gene signature. Batch-effect reduction and cross-validation were performed to ensure signature reliability. Finally, functional and pathway enrichment analyses were applied to the signature to identify PD-associated gene networks.
A gene signature of 100 probes that mapped to 87 genes, corresponding to 64 upregulated and 23 downregulated genes differentiating between patients with idiopathic PD and controls, was identified with the training cohort and successfully replicated in both an independent validation cohort (area under the curve [AUC] = 0.79, p = 7.13E-6) and a subsequent independent test cohort (AUC = 0.74, p = 4.2E-4). Network analysis of the signature revealed gene enrichment in pathways, including metabolism, oxidation, and ubiquitination/proteasomal activity, and misregulation of mitochondria-localized genes, including downregulation of COX4I1, ATP5A1, and VDAC3.
We present a large-scale study of PD gene expression profiling. This work identifies a reliable blood-based PD signature and highlights the importance of large-scale patient cohorts in developing potential PD biomarkers.
38.Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs
Long noncoding RNAs (lncRNAs) are a diverse class of RNAs with increasingly appreciated functions in vertebrates, yet much of their biology remains poorly understood. In particular, it is unclear to what extent the current catalog of over 10,000 annotated lncRNAs is indeed devoid of genes coding for proteins. Here we review the available computational and experimental schemes for distinguishing between coding and noncoding transcripts and assess the conclusions from their recent genome-wide applications. We conclude that the model most consistent with the available data is that a large number of mammalian lncRNAs undergo translation, but only a very small minority of such translation events results in stable and functional peptides. The outcomes of the majority of the translation events and their potential biological purposes remain an intriguing topic for future investigation. This article is part of a Special Issue entitled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa.
37.A conserved abundant cytoplasmic long noncoding RNA modulates repression by Pumilio proteins in human cells.
Thousands of long noncoding RNA (lncRNA) genes are encoded in the human genome, and hundreds of them are evolutionarily conserved, but their functions and modes of action remain largely obscure. Particularly enigmatic lncRNAs are those that are exported to the cytoplasm, including NORAD, an abundant and highly conserved cytoplasmic lncRNA. Here we show that most of the sequence of NORAD is comprised of repetitive units that together contain at least 17 functional binding sites for the two mammalian Pumilio homologues. Through binding to PUM1 and PUM2, NORAD modulates the mRNA levels of their targets, which are enriched for genes involved in chromosome segregation during cell division. Our results suggest that some cytoplasmic lncRNAs function by modulating the activities of RNA-binding proteins, an activity which positions them at key junctions of cellular signalling pathways.
36.LIMT is a novel metastasis inhibiting lncRNA suppressed by EGF and downregulated in aggressive breast cancer[All authors]
Long noncoding RNAs (lncRNAs) are emerging as regulators of gene expression in pathogenesis, including cancer. Recently, lncRNAs have been implicated in progression of specific subtypes of breast cancer. One aggressive, basal-like subtype associates with increased EGFR signaling, while another, the HER2-enriched subtype, engages a kin of EGFR. Based on the premise that EGFR-regulated lncRNAs might control the aggressiveness of basal-like tumors, we identified multiple EGFR-inducible lncRNAs in basal-like normal cells and overlaid them with the transcriptomes of over 3,000 breast cancer patients. This led to the identification of 11 prognostic lncRNAs. Functional analyses of this group uncovered LINC01089 (here renamed LncRNA Inhibiting Metastasis; LIMT), a highly conserved lncRNA, which is depleted in basal-like and in HER2-positive tumors, and the low expression of which predicts poor patient prognosis. Interestingly, EGF rapidly downregulates LIMT expression by enhancing histone deacetylation at the respective promoter. We also find that LIMT inhibits extracellular matrix invasion of mammary cells in vitro and tumor metastasis in vivo. In conclusion, lncRNAs dynamically regulated by growth factors might act as novel drivers of cancer progression and serve as prognostic biomarkers.
35.Dec 2016, Nature Reviews Genetics. 17, p. 601-614 Abstract
Long non-coding RNAs (lncRNAs) have emerged in recent years as major players in a multitude of pathways across species, but it remains challenging to understand which of them are important and how their functions are performed. Comparative sequence analysis has been instrumental for studying proteins and small RNAs, but the rapid evolution of lncRNAs poses new challenges that demand new approaches. Here, I review the lessons learned so far from genome-wide mapping and comparisons of lncRNAs across different species. I also discuss how comparative analyses can help us to understand lncRNA function and provide practical considerations for examining functional conservation of lncRNA genes.
34.Dec 2015, Nature. 527, p. 379-+ Abstract[All authors]
Cancer cells hijack and remodel existing metabolic pathways for their benefit. Argininosuccinate synthase (ASS1) is a urea cycle enzyme that is essential in the conversion of nitrogen from ammonia and aspartate to urea. A decrease in nitrogen flux through ASS1 in the liver causes the urea cycle disorder citrullinaemia(1). In contrast to the well-studied consequences of loss of ASS1 activity on ureagenesis, the purpose of its somatic silencing in multiple cancers is largely unknown(2). Here we show that decreased activity of ASS1 in cancers supports proliferation by facilitating pyrimidine synthesis via CAD (carbamoyl-phosphate synthase 2, aspartate transcarbamylase, and dihydroorotase complex) activation. Our studies were initiated by delineating the consequences of loss of ASS1 activity in humans with two types of citrullinaemia. We find that in citrullinaemia type I (CTLN I), which is caused by deficiency of ASS1, there is increased pyrimidine synthesis and proliferation compared with citrullinaemia type II (CTLN II), in which there is decreased substrate availability for ASS1 caused by deficiency of the aspartate transporter citrin. Building on these results, we demonstrate that ASS1 deficiency in cancer increases cytosolic aspartate levels, which increases CAD activation by upregulating its substrate availability and by increasing its phosphorylation by S6K1 through the mammalian target of rapamycin (mTOR) pathway. Decreasing CAD activity by blocking citrin, the mTOR signalling, or pyrimidine synthesis decreases proliferation and thus may serve as a therapeutic strategy in multiple cancers where ASS1 is downregulated. Our results demonstrate that ASS1 downregulation is a novel mechanism supporting cancerous proliferation, and they provide a metabolic link between the urea cycle enzymes and pyrimidine synthesis.
33.Dec 2015, Cell Reports. 13, p. 2653-2662 Abstract
mRNA is thought to predominantly reside in the cytoplasm, where it is translated and eventually degraded. Although nuclear retention of mRNA has a regulatory potential, it is considered extremely rare in mammals. Here, to explore the extent of mRNA retention in metabolic tissues, we combine deep sequencing of nuclear and cytoplasmic RNA fractions with single-molecule transcript imaging in mouse beta cells, liver, and gut. We identify a wide range of protein-coding genes for which the levels of spliced polyadenylated mRNA are higher in the nucleus than in the cytoplasm. These include genes such as the transcription factor ChREBP, Nlrp6, Glucokinase, and Glucagon receptor. We demonstrate that nuclear retention of mRNA can efficiently buffer cytoplasmic transcript levels from noise that emanates from transcriptional bursts. Our study challenges the view that transcripts predominantly reside in the cytoplasm and reveals a role of the nucleus in dampening gene expression noise.
32.Circular RNAs are long-lived and display only minimal early alterations in response to a growth factor
Circular RNAs (circRNAs) are widespread circles of non-coding RNAs with largely unknown function. Because stimulation of mammary cells with the epidermal growth factor (EGF) leads to dynamic changes in the abundance of coding and non-coding RNA molecules, and culminates in the acquisition of a robust migratory phenotype, this cellular model might disclose functions of circRNAs. Here we show that circRNAs of EGF-stimulated mammary cells are stably expressed, while mRNAs and microRNAs change within minutes. In general, the circRNAs we detected are relatively long-lived and weakly expressed. Interestingly, they are almost ubiquitously co-expressed with the corresponding linear transcripts, and the respective, shared promoter regions are more active compared to genes producing linear isoforms with no detectable circRNAs. These findings imply that altered abundance of circRNAs, unlike changes in the levels of other RNAs, might not play critical roles in signaling cascades and downstream transcriptional networks that rapidly commit cells to specific outcomes.
31.Principles of Long Noncoding RNA Evolution Derived from Direct Comparison of Transcriptomes in 17 Species
The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture.
30.Dec 2015, Nature Genetics. 47, p. 1408-+ Abstract[All authors]
Analysis of 501 melanoma exomes identified RASA2, encoding a RasGAP, as a tumor-suppressor gene mutated in 5% of melanomas. Recurrent loss-of-function mutations in RASA2 were found to increase RAS activation, melanoma cell growth and migration. RASA2 expression was lost in >= 30% of human melanomas and was associated with reduced patient survival. These findings identify RASA2 inactivation as a melanoma driver and highlight the importance of RasGAPs in cancer.
29.Dec 2013, Cell. 152, p. 844-858 Abstract
To use microRNAs to downregulate mRNA targets, cells must first process these ~22 nt RNAs from primary transcripts (pri-miRNAs). These transcripts form RNA hairpins important for processing, but additional determinants must distinguish pri-miRNAs from the many other hairpin-containing transcripts expressed in each cell. Illustrating the complexity of this recognition, we show that most Caenorhabditis elegans pri-miRNAs lack determinants required for processing in human cells. To find these determinants, we generated many variants of four human pri-miRNAs, sequenced millions that retained function, and compared them with the starting variants. Our results confirmed the importance of pairing in the stem and revealed three primary-sequence determinants, including an SRp20-binding motif (CNNC) found downstream of most pri-miRNA hairpins in bilaterian animals, but not in nematodes. Adding this and other determinants to C. elegans pri-miRNAs imparted efficient processing in human cells, thereby confirming the importance of primary-sequence determinants for distinguishing pri-miRNAs from other hairpin-containing transcripts.
28.Dec 2013, Cell. 154, p. 26-46 Abstract
Long intervening noncoding RNAs (lincRNAs) are transcribed from thousands of loci in mammalian genomes and might play widespread roles in gene regulation and other cellular processes. This Review outlines the emerging understanding of lincRNAs in vertebrate animals, with emphases on how they are being identified and current conclusions and questions regarding their genomics, evolution and mechanisms of action.
27.Dec 2012, Genome Research. 22, p. 2054-2066 Abstract
The post-transcriptional fate of messenger RNAs (mRNAs) is largely dictated by their 3' untranslated regions (3' UTRs), which are defined by cleavage and polyadenylation (CPA) of pre-mRNAs. We used poly(A)-position profiling by sequencing (3P-seq) to map poly(A) sites at eight developmental stages and tissues in the zebrafish. Analysis of over 60 million 3P-seq reads substantially increased and improved existing 3' UTR annotations, resulting in confidently identified 3' UTRs for >79% of the annotated protein-coding genes in zebrafish. mRNAs from most zebrafish genes undergo alternative CPA, with those from more than a thousand genes using different dominant 3' UTRs at different stages. These included one of the poly(A) polymerase genes, for which alternative CPA reinforces its repression in the ovary. 3' UTRs tend to be shortest in the ovaries and longest in the brain. Isoforms with some of the shortest 3' UTRs are highly expressed in the ovary, yet absent in the maternally contributed RNAs of the embryo, perhaps because their 3' UTRs are too short to accommodate a uridine-rich motif required for stability of the maternal mRNA. At 2 h post-fertilization, thousands of unique poly(A) sites appear at locations lacking a typical polyadenylation signal, which suggests a wave of widespread cytoplasmic polyadenylation of mRNA degradation intermediates. Our insights into the identities, formation, and evolution of zebrafish 3' UTRs provide a resource for studying gene regulation during vertebrate development.
26.Dec 2011, Cell. 147, p. 1537-1550 Abstract
Thousands of long intervening noncoding RNAs (lincRNAs) have been identified in mammals. To better understand the evolution and functions of these enigmatic RNAs, we used chromatin marks, poly(A)-site mapping and RNA-Seq data to identify more than 550 distinct lincRNAs in zebrafish. Although these shared many characteristics with mammalian lincRNAs, only 29 had detectable sequence similarity with putative mammalian orthologs, typically restricted to a single short region of high conservation. Other lincRNAs had conserved genomic locations without detectable sequence conservation. Antisense reagents targeting conserved regions of two zebrafish lincRNAs caused developmental defects. Reagents targeting splice sites caused the same defects and were rescued by adding either the mature lincRNA or its human or mouse ortholog. Our study provides a roadmap for identification and analysis of lincRNAs in model organisms and shows that lincRNAs play crucial biological roles during embryonic development with functionality conserved despite limited sequence conservation.
25.A Point Mutation in Translation Initiation Factor eIF2B Leads to Function- and Time-Specific Changes in Brain Gene Expression
Background: Mutations in eukaryotic translation initiation factor 2B (eIF2B) cause Childhood Ataxia with CNS Hypomyelination (CACH), also known as Vanishing White Matter disease (VWM), which is associated with a clinical pathology of brain myelin loss upon physiological stress. eIF2B is the guanine nucleotide exchange factor (GEF) of eIF2, which delivers the initiator tRNA(Met) to the ribosome. We recently reported that an R132H mutation in the catalytic subunit of this GEF, causing a 20% reduction in its activity, leads under normal conditions to delayed brain development in a mouse model for CACH/VWM. To further explore the effect of the mutation on global gene expression in the brain, we conducted a wide-scale transcriptome analysis of the first three critical postnatal weeks. Methodology/Principal Findings: Genome-wide mRNA expression of wild-type and mutant mice was profiled at postnatal (P) days 1, 18 and 21 to reflect the early proliferative stage prior to white matter establishment (P1) and the peak of oligodendrocyte differentiation and myelin synthesis (P18 and P21). At each developmental stage, between 441 and 818 genes were differentially expressed in the mutant brain with minimal overlap, generating unique time point-specific gene expression signatures. Conclusions: The current study demonstrates that a point mutation in eIF2B, a key translation initiation factor, has a massive effect on global gene expression in the brain. The overall changes in expression patterns reflect multiple layers of indirect effects that accumulate as the brain develops and matures. The differentially expressed genes seem to reflect delayed waves of gene expression as well as an adaptation process to cope with hypersensitivity to cellular stress.
24.Integration of Transcriptomics, Proteomics, and MicroRNA Analyses Reveals Novel MicroRNA Regulation of Targets in the Mammalian Inner Ear[All authors]
We have employed a novel approach for the identification of functionally important microRNA (miRNA)-target interactions, integrating miRNA, transcriptome and proteome profiles and advanced in silico analysis using the FAME algorithm. Since miRNAs play a crucial role in the inner ear, demonstrated by the discovery of mutations in a miRNA leading to human and mouse deafness, we applied this approach to microdissected auditory and vestibular sensory epithelia. We detected the expression of 157 miRNAs in the inner ear sensory epithelia, with 53 miRNAs differentially expressed between the cochlea and vestibule. Functionally important miRNAs were determined by searching for enriched or depleted targets in the transcript and protein datasets with an expression consistent with the dogma of miRNA regulation. Importantly, quite a few of the targets were detected only in the protein datasets, attributable to regulation by translational suppression. We identified and experimentally validated the regulation of PSIP1-P75, a transcriptional co-activator previously unknown in the inner ear, by miR-135b, in vestibular hair cells. Our findings suggest that miR-135b serves as a cellular effector, involved in regulating some of the differences between the cochlear and vestibular hair cells.
23.Dynamic Changes in the Copy Number of Pluripotency and Cell Proliferation Genes in Human ESCs and iPSCs during Reprogramming and Time in Culture[All authors]
Genomic stability is critical for the clinical use of human embryonic and induced pluripotent stem cells. We performed high-resolution SNP (single-nucleotide polymorphism) analysis on 186 pluripotent and 119 nonpluripotent samples. We report a higher frequency of subchromosomal copy number variations in pluripotent samples compared to nonpluripotent samples, with variations enriched in specific genomic regions. The distribution of these variations differed between hESCs and hiPSCs, characterized by large numbers of duplications found in a few hESC samples and moderate numbers of deletions distributed across many hiPSC samples. For hiPSCs, the reprogramming process was associated with deletions of tumor-suppressor genes, whereas time in culture was associated with duplications of oncogenic genes. We also observed duplications that arose during a differentiation protocol. Our results illustrate the dynamic nature of genomic abnormalities in pluripotent stem cells and the need for frequent genomic monitoring to assure phenotypic stability and clinical safety.
22.Dec 2011, Nucleic Acids Research. 39, p. D793-D799 Abstract[All authors]
The rapid accumulation of knowledge on biological signaling pathways and their regulatory mechanisms has highlighted the need for specific repositories that can store, organize and allow retrieval of pathway information in a way that will be useful for the research community. SPIKE (Signaling Pathways Integrated Knowledge Engine; http://www.cs.tau.ac.il/~spike/) is a database for achieving this goal, containing highly curated interactions for particular human pathways, along with literature-referenced information on the nature of each interaction. To make database population and pathway comprehension straightforward, a simple yet informative data model is used, and pathways are laid out as maps that reflect the curator's understanding and make the utilization of the pathways easy. The database currently focuses primarily on pathways describing DNA damage response, cell cycle, programmed cell death and hearing related pathways. Pathways are regularly updated, and additional pathways are gradually added. The complete database and the individual maps are freely exportable in several formats. The database is accompanied by a stand-alone software tool for analysis and dynamic visualization of pathways.
21.Dec 2010, PLoS One. 5, Abstract
Background: Molecular studies of the human disease transcriptome typically involve a search for genes whose expression is significantly dysregulated in sick individuals compared to healthy controls. Recent studies have found that only a small number of the genes in human disease-related pathways show consistent dysregulation in sick individuals. However, those studies found that some pathway genes are affected in most sick individuals, but genes can differ among individuals. While a pathway is usually defined as a set of genes known to share a specific function, pathway boundaries are frequently difficult to assign, and methods that rely on such definition cannot discover novel pathways. Protein interaction networks can potentially be used to overcome these problems. Methodology/Principal Findings: We present DEGAS (DysrEgulated Gene set Analysis via Subnetworks), a method for identifying connected gene subnetworks significantly enriched for genes that are dysregulated in specimens of a disease. We applied DEGAS to seven human diseases and obtained statistically significant results that appear to home in on compact pathways enriched with hallmarks of the diseases. In Parkinson's disease, we provide novel evidence for involvement of mRNA splicing, cell proliferation, and the 14-3-3 complex in the disease progression. DEGAS is available as part of the MATISSE software package (http://acgt.cs.tau.ac.il/matisse). Conclusions/Significance: The subnetworks identified by DEGAS can provide a signature of the disease potentially useful for diagnosis, pinpoint possible pathways affected by the disease, and suggest targets for drug intervention.
20.Dec 2010, Nucleic Acids Research. 38, Abstract
While it has been established that microRNAs (miRNAs) play key roles throughout development and are dysregulated in many human pathologies, the specific processes and pathways regulated by individual miRNAs are mostly unknown. Here, we use computational target predictions in order to automatically infer the processes affected by human miRNAs. Our approach improves upon standard statistical tools by addressing specific characteristics of miRNA regulation. Our analysis is based on a novel compendium of experimentally verified miRNA-pathway and miRNA-process associations that we constructed, which can be a useful resource by itself. Our method also predicts novel miRNA-regulated pathways, refines the annotation of miRNAs for which only crude functions are known, and assigns differential functions to miRNAs with closely related sequences. Applying our approach to groups of co-expressed genes allows us to identify miRNAs and genomic miRNA clusters with functional importance in specific stages of early human development. A full list of the predicted mRNA functions is available at http://acgt.cs.tau.ac.il/fame/.
19.A plasma-membrane E-MAP reveals links of the eisosome with sphingolipid metabolism and endosomal trafficking[All authors]
The plasma membrane delimits the cell and controls material and information exchange between itself and the environment. How different plasma-membrane processes are coordinated and how the relative abundance of plasma-membrane lipids and proteins is homeostatically maintained are not yet understood. Here, we used a quantitative genetic interaction map, or E-MAP, to functionally interrogate a set of ~400 genes involved in various aspects of plasma-membrane biology, including endocytosis, signaling, lipid metabolism and eisosome function. From this E-MAP, we derived a set of 57,799 individual interactions between genes functioning in these various processes. Using triplet genetic motif analysis, we identified a new component of the eisosome, Eis1, and linked the poorly characterized gene EMP70 to endocytic and eisosome function. Finally, we implicated Rom2, a GDP/GTP exchange factor for Rho1 and Rho2, in the regulation of sphingolipid metabolism.
18.Dec 2010, Biochemical and Biophysical Research Communications. 393, p. 211-216 Abstract
We have developed and validated a microporous poly(ethylene terephthalate) membrane-based indirect co-culture system for human pluripotent stem cell (hPSC) propagation, which allows real-time conditioning of the culture medium with human fibroblasts while maintaining the complete separation of the two cell types. The propagation and pluripotent characteristics of a human embryonic stem cell (hESC) line and a human induced pluripotent stem cell (hiPSC) line were studied in prolonged culture in this system. We report that hPSCs cultured on membranes by indirect co-culture with fibroblasts were indistinguishable by multiple criteria from hPSCs cultured directly on a fibroblast feeder layer. Thus this co-culture system is a significant advance in hPSC culture methods, providing a facile stem cell expansion system with continuous medium conditioning while preventing mixing of hPSCs and feeder cells. This membrane culture method will enable testing of novel feeder cells and differentiation studies using co-culture with other cell types, and will simplify stepwise changes in culture conditions for staged differentiation protocols.
17.Dec 2010, Molecular Systems Biology. 6
Most of the phenotypes in nature are complex and are determined by many quantitative trait loci (QTLs). In this study we identify gene sets that contribute to one important complex trait: the ability of yeast cells to survive under alkali stress. We carried out an in-lab evolution (ILE) experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. The populations acquired different sets of affecting alleles, showing that evolution can provide alternative solutions to the same challenge. We measured the contribution of each allele to the phenotype. The sum of the effects of the QTLs was larger than the difference between the ancestor phenotype and the evolved strains, suggesting epistatic interactions between the QTLs. In parallel, a clinically isolated strain was used to map natural QTLs affecting growth at high pH. In all, 17 candidate regions were found. Using a predictive algorithm based on the distances in protein-interaction networks, candidate genes were defined and validated by gene disruption. Many of the QTLs found by both methods are not directly implicated in pH homeostasis but have more general, and often regulatory, roles. Molecular Systems Biology 6: 346; published online 16 February 2010; doi:10.1038/msb.2010.1
16.Dec 2010, Nature Protocols. 5, p. 303-322
A major challenge in the analysis of gene expression microarray data is to extract meaningful biological knowledge out of the huge volume of raw data. EXPANDER (EXPression ANalyzer and Displayer) is an integrated software platform for the analysis of gene expression data, which is freely available for academic use. It is designed to support all the stages of microarray data analysis, from raw data normalization to inference of transcriptional regulatory networks. The microarray analysis described in this protocol starts with importing the data into EXPANDER 5.0 and is followed by normalization and filtering. Then, clustering and network-based analyses are performed. The gene groups identified are tested for enrichment in function (based on Gene Ontology), co-regulation (using transcription factor and microRNA target predictions) or co-location. The results of each analysis step can be visualized in a number of ways. The complete protocol can be executed in ~1 h.
15.Discovering Transcriptional Modules by Combined Analysis of Expression Profiles and Regulatory Sequences, Dec 2010, RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS. 6044, p. 578-579
14.Dec 2009, Bioinformatics. 25, p. 1158-1164
Motivation: Microarray-based gene expression studies have great potential but are frequently difficult to interpret due to their overwhelming dimensions. Recent studies have shown that the analysis of expression data can be improved by its integration with protein interaction networks, but the performance of these analyses has been hampered by the uneven quality of the interaction data. Results: We present Co-Expression Zone ANalysis using NEtworks (CEZANNE), a novel confidence-based method for extraction of functionally coherent co-expressed gene sets. CEZANNE uses probabilities for individual interactions, which can be computed by any available method. We propose a probabilistic model and a weighting scheme in which the likelihood of the connectivity of a subnetwork is related to the weight of its minimum cut. Applying CEZANNE to an expression dataset of DNA damage response in Saccharomyces cerevisiae, we recover both known and novel modules and predict novel protein functions. We show that CEZANNE outperforms previous methods for analysis of expression and interaction data.
13.Dec 2009, Nucleic Acids Research. 37, p. 1566-1579
A major goal of systems biology is the characterization of transcription factors and microRNAs (miRNAs) and the transcriptional programs they regulate. We present Allegro, a method for de-novo discovery of cis-regulatory transcriptional programs through joint analysis of genome-wide expression data and promoter or 3′ UTR sequences. The algorithm uses a novel log-likelihood-based, non-parametric model to describe the expression pattern shared by a group of co-regulated genes. We show that Allegro is more accurate and sensitive than existing techniques, and can simultaneously analyze multiple expression datasets with more than 100 conditions. We apply Allegro to datasets from several species and report on the transcriptional modules it uncovers. Our analysis reveals a novel motif over-represented in the promoters of genes highly expressed in murine oocytes, and several new motifs related to fly development. Finally, using stem-cell expression profiles, we identify three miRNA families with pivotal roles in human embryogenesis.
12.Dec 2009, Genome Biology. 10
Recent technological breakthroughs have enabled high-throughput quantitative measurements of hundreds of thousands of genetic interactions among hundreds of genes in Saccharomyces cerevisiae. However, these assays often fail to measure the genetic interactions among up to 40% of the studied gene pairs. Here we present a novel method, which combines genetic interaction data together with diverse genomic data, to quantitatively impute these missing interactions. We also present data on almost 190,000 novel interactions.
11.From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions
Recent technological breakthroughs allow the quantification of hundreds of thousands of genetic interactions (GIs) in Saccharomyces cerevisiae. The interpretation of these data is often difficult, but it can be improved by the joint analysis of GIs along with complementary data types. Here, we describe a novel methodology that integrates genetic and physical interaction data. We use our method to identify a collection of functional modules related to chromosomal biology and to investigate the relations among them. We show how the resulting map of modules provides clues for the elucidation of function both at the level of individual genes and at the level of functional modules.
10.Comprehensive MicroRNA profiling reveals a unique human embryonic stem cell signature dominated by a single seed sequence
Embryonic stem cells are unique among cultured cells in their ability to self-renew and differentiate into a wide diversity of cell types, suggesting that a specific molecular control network underlies these features. Human embryonic stem cells (hESCs) are known to have distinct mRNA expression, global DNA methylation, and chromatin profiles, but the involvement of high-level regulators, such as microRNAs (miRNA), in the hESC-specific molecular network is poorly understood. We report that global miRNA expression profiling of hESCs and a variety of stem cell and differentiated cell types using a novel microarray platform revealed a unique set of miRNAs differentially regulated in hESCs, including numerous miRNAs not previously linked to hESCs. These hESC-associated miRNAs were more likely to be located in large genomic clusters, and less likely to be located in introns of coding genes. hESCs had higher expression of oncogenic miRNAs and lower expression of tumor suppressor miRNAs than the other cell types. Many miRNAs upregulated in hESCs share a common consensus seed sequence, suggesting that there is cooperative regulation of a critical set of target miRNAs. We propose that miRNAs are coordinately controlled in hESCs, and are key regulators of pluripotence and differentiation.
9.Dec 2008, BMC Bioinformatics. 9
Background: Biological signaling pathways that govern cellular physiology form an intricate web of tightly regulated interlocking processes. Data on these regulatory networks are accumulating at an unprecedented pace. The assimilation, visualization and interpretation of these data have become a major challenge in biological research, and once met, will greatly boost our ability to understand cell functioning on a systems level. Results: To cope with this challenge, we are developing the SPIKE knowledge-base of signaling pathways. SPIKE contains three main software components: 1) A database (DB) of biological signaling pathways. Carefully curated information from the literature and data from large public sources constitute distinct tiers of the DB. 2) A visualization package that allows interactive graphic representations of regulatory interactions stored in the DB and superposition of functional genomic and proteomic data on the maps. 3) An algorithmic inference engine that analyzes the networks for novel functional interplays between network components. SPIKE is designed and implemented as a community tool and therefore provides a user-friendly interface that allows registered users to upload data to SPIKE DB. Our vision is that the DB will be populated by a distributed and highly collaborative effort undertaken by multiple groups in the research community, where each group contributes data in its field of expertise. Conclusion: The integrated capabilities of SPIKE make it a powerful platform for the analysis of signaling networks and the integration of knowledge on such networks with omics data.
8.MetaReg: a platform for modeling, analysis and visualization of biological systems using large-scale experimental data
MetaReg (http://acgt.cs.tau.ac.il/metareg/application.html) is a computational tool that models cellular networks and integrates experimental results with such models. MetaReg represents established knowledge about a biological system, available today mostly in informal form in the literature, as probabilistic network models with underlying combinatorial regulatory logic. MetaReg enables contrasting predictions with measurements, model improvements and studying what-if scenarios. By summarizing prior knowledge and providing visual and computational aids, it helps the expert explore and understand her system better.
7.Detecting disease-specific dysregulated pathways via analysis of clinical expression profiles, Dec 2008, RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS. 4955, p. 347-359
We present a method for identifying connected gene subnetworks significantly enriched for genes that are dysregulated in specimens of a disease. These subnetworks provide a signature of the disease potentially useful for diagnosis, pinpoint possible pathways affected by the disease, and suggest targets for drug intervention. Our method uses microarray gene expression profiles derived in clinical case-control studies to identify genes significantly dysregulated in disease specimens, combined with protein interaction data to identify connected sets of genes. Our core algorithm searches for minimal connected subnetworks in which the number of dysregulated genes in each diseased sample exceeds a given threshold. We have applied the method in a study of Huntington's disease caudate nucleus expression profiles and in a meta-analysis of breast cancer studies. In both cases the results were statistically significant and appeared to home in on compact pathways enriched with hallmarks of the diseases.
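The search criterion described above can be illustrated with a brute-force sketch on a toy network. This is not the authors' algorithm: it simply enumerates gene sets by increasing size and returns the first connected set in which every diseased sample has at least `k` dysregulated genes (reading "exceeds a given threshold" as "at least k" is an assumption). All names (`smallest_subnetwork`, `dysregulated`, `k`) and the toy data are hypothetical, and the enumeration is only feasible for very small networks.

```python
from itertools import combinations

def is_connected(genes, edges):
    """True if `genes` induce a connected subgraph of the interaction network."""
    genes = set(genes)
    adj = {g: set() for g in genes}
    for u, v in edges:
        if u in genes and v in genes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal restricted to `genes`
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == genes

def covers(genes, dysregulated, k):
    """True if every sample has at least k dysregulated genes inside `genes`."""
    return all(len(genes & d) >= k for d in dysregulated.values())

def smallest_subnetwork(all_genes, edges, dysregulated, k):
    """Smallest connected gene set meeting the per-sample threshold, or None."""
    for size in range(1, len(all_genes) + 1):
        for cand in combinations(sorted(all_genes), size):
            s = set(cand)
            if covers(s, dysregulated, k) and is_connected(s, edges):
                return s
    return None

# Hypothetical toy data: a path network A-B-C-D and two diseased samples.
genes = {"A", "B", "C", "D"}
edges = [("A", "B"), ("B", "C"), ("C", "D")]
dysregulated = {"sample1": {"A", "C"}, "sample2": {"C", "D"}}
```

With `k = 2` the set must contain A, C and D to satisfy both samples, and B has to be added to keep it connected, which shows how connectivity can pull non-dysregulated genes into the reported subnetwork.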
6.Dec 2008, Nature. 455, p. 401-U55
Stem cells are defined as self-renewing cell populations that can differentiate into multiple distinct cell types. However, hundreds of different human cell lines from embryonic, fetal and adult sources have been called stem cells, even though they range from pluripotent cells - typified by embryonic stem cells, which are capable of virtually unlimited proliferation and differentiation - to adult stem cell lines, which can generate a far more limited repertoire of differentiated cell types. The rapid increase in reports of new sources of stem cells and their anticipated value to regenerative medicine(1,2) has highlighted the need for a general, reproducible method for classification of these cells(3). We report here the creation and analysis of a database of global gene expression profiles (which we call the 'stem cell matrix') that enables the classification of cultured human stem cells in the context of a wide variety of pluripotent, multipotent and differentiated cell types. Using an unsupervised clustering method(4,5) to categorize a collection of ~150 cell samples, we discovered that pluripotent stem cell lines group together, whereas other cell types, including brain-derived neural stem cell lines, are very diverse. Using further bioinformatic analysis(6) we uncovered a protein-protein network (PluriNet) that is shared by the pluripotent cells (embryonic stem cells, embryonal carcinomas and induced pluripotent cells). Analysis of published data showed that the PluriNet seems to be a common characteristic of pluripotent cells, including mouse embryonic stem and induced pluripotent cells and human oocytes. Our results offer a new strategy for classifying stem cells and support the idea that pluripotency and self-renewal are under tight control by specific molecular networks.
5.Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks
The biological interpretation of genetic interactions is a major challenge. Recently, Kelley and Ideker proposed a method for the joint analysis of genetic and physical networks, which explains many of the known genetic interactions as linking different pathways in the physical network. Here, we extend this method and devise novel analytic tools for interpreting genetic interactions in a physical context. Applying these tools to a large-scale Saccharomyces cerevisiae data set, our analysis reveals 140 between-pathway models that explain 3765 genetic interactions, roughly doubling those that were previously explained. Model genes tend to have short mRNA half-lives and many phosphorylation sites, suggesting that their stringent regulation is linked to pathway redundancy. We also identify 'pivot' proteins that have many physical interactions with both pathways in our models, and show that pivots tend to be essential and highly conserved. Our analysis of models and pivots sheds light on the organization of the cellular machinery as well as on the roles of individual proteins.
4.Dec 2007, Molecular Systems Biology. 3
Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community.
3.Dec 2007, BMC Systems Biology. 1
Background: With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. Results: We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. Conclusion: We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in analysis of high-throughput data.
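The two ingredients named in the abstract, pairwise similarity values and connectivity in the network, can be sketched as follows. This is only an illustration of how a candidate module might be scored, not the module-finding algorithm of the paper: Pearson correlation as the similarity measure, the mean pairwise similarity as the score, and all names (`pearson`, `module_score`, the toy profiles) are assumptions made for the example.

```python
import math
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def is_connected(genes, edges):
    """True if `genes` induce a connected subgraph of the network."""
    genes = set(genes)
    adj = {g: set() for g in genes}
    for u, v in edges:
        if u in genes and v in genes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == genes

def module_score(module, profiles, edges):
    """Mean pairwise similarity of a connected module; None if the module
    has fewer than two genes or is not connected in the network."""
    if len(module) < 2 or not is_connected(module, edges):
        return None
    pairs = combinations(sorted(module), 2)
    return mean([pearson(profiles[a], profiles[b]) for a, b in pairs])

# Hypothetical expression profiles and network edges.
profiles = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1]}
edges = [("g1", "g2"), ("g2", "g3")]
```

Here `{"g1", "g2"}` is connected and perfectly correlated, whereas `{"g1", "g3"}` is rejected because the two genes are not linked within the module, regardless of their expression similarity.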
2.Dec 2006, Journal of Computational Biology. 13, p. 336-350
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may greatly vary. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in O(l) time. We implemented the algorithm using suffix arrays; our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using a "traditional" tree and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allows for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
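A minimal sketch of a distance built on the average length of maximum common substrings may make the idea concrete. The abstract does not give the formula, so the normalization used here (a log-length term divided by the average match length, minus a self-comparison correction, then symmetrized) is an assumption, as are the function names. The matching is done naively, so this sketch does not achieve the O(l) running time of the suffix-array implementation mentioned above, and it assumes non-empty sequences.

```python
import math

def longest_match_lengths(a, b):
    """For each position i of `a`, the length of the longest prefix of a[i:]
    that occurs somewhere in `b` (naive substring search)."""
    lengths = []
    for i in range(len(a)):
        k = 0
        # Extend the match while the next longer prefix still occurs in b.
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        lengths.append(k)
    return lengths

def acs(a, b):
    """Average over positions of `a` of the longest match length in `b`."""
    lens = longest_match_lengths(a, b)
    return sum(lens) / len(lens)

def acs_distance(a, b):
    """Symmetrized distance; assumed normalization, see the lead-in."""
    def one_sided(x, y):
        avg = acs(x, y)
        if avg == 0:
            return math.inf  # no shared characters at all
        # Longer average matches mean closer sequences; the second term
        # makes the distance of a sequence to itself exactly zero.
        return math.log(len(y)) / avg - math.log(len(x)) / acs(x, x)
    return (one_sided(a, b) + one_sided(b, a)) / 2
```

Because the measure only needs substring matches, it can be applied to sequences of very different lengths without an alignment step, which is the property the abstract emphasizes.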
1.Information theoretic approaches to whole genome phylogenies, Dec 2005, RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS. 3500, p. 283-295
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes. The core of our method is a new measure of pairwise distances between sequences, whose lengths may greatly vary. This measure is based on information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. The algorithm uses suffix arrays to compute the distance between two sequences of length l in O(l) time. It is fast enough to enable the construction of the phylogenomic tree for hundreds of species, and the phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic truth". To assess our approach, it was implemented together with a number of alternative approaches, including two that were previously published in the literature. Comparing their outcome to ours, using a "traditional" tree and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin.