11. Context-specific effects of sequence elements on subcellular localization of linear and circular RNAs
Long RNAs vary extensively in their post-transcriptional fates, and this variation is attributed in part to short sequence elements. We used massively parallel RNA assays to study how sequences derived from noncoding RNAs influence the subcellular localization and stability of circular and linear RNAs, including spliced and unspliced forms. We find that the effects of sequence elements strongly depend on the host RNA context, with limited overlap between sequences that drive nuclear enrichment of linear and circular RNAs. Binding of specific RNA binding proteins underpins some of these differences—SRSF1 binding leads to nuclear enrichment of circular RNAs; SAFB binding is associated with nuclear enrichment of predominantly unspliced linear RNAs; and IGF2BP1 promotes export of linear spliced RNA molecules. The post-transcriptional fate of long RNAs is thus dictated by combinatorial contributions of specific sequence elements, of splicing, and of the presence of the terminal features unique to linear RNAs.
10. Apr 2022, Molecular Systems Biology. 4, 18, p. e10682
The synthesis of RNA polymerase II (Pol2) products, which include messenger RNAs or long noncoding RNAs, culminates in transcription termination. How the transcriptional termination of a gene impacts the activity of promoters found immediately downstream of it, and which can be subject to potential transcriptional interference, remains largely unknown. We examined in an unbiased manner the features of the intergenic regions between pairs of ‘tandem genes’—closely spaced (< 2 kb) human genes found on the same strand. Intergenic regions separating tandem genes are enriched with guanines and are characterized by binding of several proteins, including AGO1 and AGO2 of the RNA interference pathway. Additionally, we found that Pol2 is particularly enriched in this region, and it is lost upon perturbations affecting splicing or transcriptional elongation. Perturbations of genes involved in Pol2 pausing and R loop biology preferentially affect expression of downstream genes in tandem gene pairs. Overall, we find that features associated with Pol2 pausing and accumulation rather than those associated with avoidance of transcriptional interference are the predominant driving force shaping short tandem intergenic regions.
9. May 2021, The EMBO Journal.
The functions of long RNAs, including mRNAs and long noncoding RNAs (lncRNAs), critically depend on their subcellular localization. The identity of the sequences that dictate subcellular localization and their high-resolution anatomy remain largely unknown. We used a suite of massively parallel RNA assays and libraries containing thousands of sequence variants to pinpoint the functional features within the SIRLOIN element, which dictates nuclear enrichment through hnRNPK recruitment. In addition, we profiled the endogenous SIRLOIN RNA-nucleoprotein complex and identified the nuclear RNA-binding proteins SLTM and SNRNP70 as novel SIRLOIN binders. Taken together, using massively parallel assays, we identified the features that dictate binding of hnRNPK, SLTM, and SNRNP70 to SIRLOIN and found that these factors are jointly required for SIRLOIN activity. Our study thus provides a roadmap for high-throughput dissection of functional sequence elements in long RNAs.
8. Dec 2020, Genome Biology. 1, 22, p. 29
Animal genomes contain thousands of long noncoding RNA (lncRNA) genes, a growing subset of which are thought to be functionally important. This functionality is often mediated by short sequence elements scattered throughout the RNA sequence that correspond to binding sites for small RNAs and RNA binding proteins. Throughout vertebrate evolution, the sequences of lncRNA genes changed extensively, so that it is often impossible to obtain significant alignments between sequences of lncRNAs from evolutionary distant species, even when synteny is evident. This often prohibits identifying conserved lncRNAs that are likely to be functional or prioritizing constrained regions for experimental interrogation.
We introduce here LncLOOM, a novel algorithmic framework for the discovery and evaluation of syntenic combinations of short motifs. LncLOOM is based on a graph representation of the input sequences and uses integer linear programming to efficiently compare dozens of sequences that have thousands of bases each and to evaluate the significance of the recovered motifs. We show that LncLOOM is capable of identifying specific, biologically relevant motifs which are conserved throughout vertebrates and beyond in lncRNAs and 3′UTRs, including novel functional RNA elements in the CHASERR lncRNA that are required for regulation of CHD2 expression.
We expect that LncLOOM will become a broadly used approach for the discovery of functionally relevant elements in the noncoding genome.
7. Sep 2020, EMBO Reports. 11, 21, p. e51264
Mammalian genomes encode thousands of long noncoding RNAs (lncRNAs), yet the biological functions of most of them remain unknown. A particularly rich repertoire of lncRNAs is found in mammalian brain and in the early embryo. We used RNA-seq and computational analysis to prioritize lncRNAs that may regulate commitment of pluripotent cells to a neuronal fate and perturbed their expression prior to neuronal differentiation. Knockdown by RNAi of two highly conserved and well-expressed lncRNAs, Reno1 (2810410L24Rik) and lnc-Nr2f1, decreased the expression of neuronal markers and led to massive changes in gene expression in the differentiated cells. We further show that the Reno1 locus forms increasing spatial contacts during neurogenesis with its adjacent protein-coding gene Bahcc1. Loss of either Reno1 or Bahcc1 leads to an early arrest in neuronal commitment, failure to induce a neuronal gene expression program, and to global reduction in chromatin accessibility at regions that are marked by the H3K4me3 chromatin mark at the onset of differentiation. Reno1 and Bahcc1 thus form a previously uncharacterized circuit required for the early steps of neuronal commitment.
6. Gene architecture and sequence composition underpin selective dependency of long RNAs on components of the nuclear export pathway
The nuclear export pathway transports long RNAs produced in the nucleus to the cytoplasm. The core components of this pathway are thought to be required for export of virtually all polyadenylated RNAs. Here, we depleted different proteins that act in nuclear export in human cells, and quantified the transcriptome-wide consequences on RNA localization. Different genes exhibited substantially variable sensitivities, with depletion of NXF1 and TREX components causing some transcripts to become strongly retained in the nucleus while others were not affected. Specifically, NXF1 is preferentially required for export of single- or few-exon transcripts with long exons or high A/U-content, whereas depletion of TREX complex components preferentially affects spliced and G/C-rich transcripts. Using massively parallel reporter assays we identified short sequence elements that render transcripts dependent on NXF1 for their export, and identified synergistic effects of splicing and NXF1. These results revise the current model of how nuclear export shapes the distribution of RNA within human cells.
5. Feb 2020, Nature Reviews Genetics. 21, 2, p. 102-117
Long non-coding RNAs (lncRNAs) are diverse transcription products emanating from thousands of loci in mammalian genomes. Cis-acting lncRNAs, which constitute a substantial fraction of lncRNAs with an attributed function, regulate gene expression in a manner dependent on the location of their own sites of transcription, at varying distances from their targets in the linear genome. Through various mechanisms, cis-acting lncRNAs have been demonstrated to activate, repress or otherwise modulate the expression of target genes. We discuss the activities that have been ascribed to cis-acting lncRNAs, the evidence and hypotheses regarding their modes of action, and the methodological advances that enable their identification and characterization. The emerging principles highlight lncRNAs as transcriptional units highly adept at contributing to gene regulatory networks and to the generation of fine-tuned spatial and temporal gene expression programmes.
4. Nov 2019, Nature Communications. 10, 1, p. 5092
Chromodomain helicase DNA binding protein 2 (Chd2) is a chromatin remodeller implicated in neurological disease. Here we show that Chaserr, a highly conserved long noncoding RNA transcribed from a region near the transcription start site of Chd2 and on the same strand, acts in concert with the CHD2 protein to maintain proper Chd2 expression levels. Loss of Chaserr in mice leads to early postnatal lethality in homozygous mice, and severe growth retardation in heterozygotes. Mechanistically, loss of Chaserr leads to substantially increased Chd2 mRNA and protein levels, which in turn lead to transcriptional interference by inhibiting promoters found downstream of highly expressed genes. We further show that Chaserr production represses Chd2 expression solely in cis, and that the phenotypic consequences of Chaserr loss are rescued when Chd2 is perturbed as well. Targeting Chaserr is thus a potential strategy for increasing CHD2 levels in haploinsufficient individuals.
3. Oct 2018, Molecular Cell. 72, 3, p. 553-567.E5
In mammals, neurons in the peripheral nervous system (PNS) have regenerative capacity following injury, but it is generally absent in the CNS. This difference is attributed, at least in part, to the intrinsic ability of PNS neurons to activate a unique regenerative transcriptional program following injury. Here, we profiled gene expression following sciatic nerve crush in mice and identified long noncoding RNAs (lncRNAs) that act in the regenerating neurons and which are typically not expressed in other contexts. We show that two of these lncRNAs regulate the extent of neuronal outgrowth. We then focus on one of these, Silc1, and show that it regulates neuroregeneration in cultured cells and in vivo, through cis-acting activation of the transcription factor Sox11.
2. Mar 2018, Nature. 555, 7694, p. 107-111
Long noncoding RNAs (lncRNAs) are emerging as key parts of multiple cellular pathways, but their modes of action and how these are dictated by sequence remain unclear. lncRNAs tend to be enriched in the nuclear fraction, whereas most mRNAs are overtly cytoplasmic, although several studies have found that hundreds of mRNAs in various cell types are retained in the nucleus. It is thus conceivable that some mechanisms that promote nuclear enrichment are shared between lncRNAs and mRNAs. Here, to identify elements in lncRNAs and mRNAs that can force nuclear localization, we screened libraries of short fragments tiled across nuclear RNAs, which were cloned into the untranslated regions of an efficiently exported mRNA. The screen identified a short sequence derived from Alu elements and bound by HNRNPK that increased nuclear accumulation. Binding of HNRNPK to C-rich motifs outside Alu elements is also associated with nuclear enrichment in both lncRNAs and mRNAs, and this mechanism is conserved across species. Our results thus identify a pathway for regulation of RNA accumulation and subcellular localization that has been co-opted to regulate the fate of transcripts with integrated Alu elements.
1. Principles of Long Noncoding RNA Evolution Derived from Direct Comparison of Transcriptomes in 17 Species
The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNAseq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, > 70% of lincRNAs cannot be traced to homologs in species that diverged > 50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5′-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture.