Publications | Schwartz Lab

2024

Timing is everything: When is m6A deposited?

Dierks D. & Schwartz S. (2024) Molecular Cell. 84, 19, p. 3572-3573

txtools: an R package facilitating analysis of RNA modifications, structures, and interactions

Garcia-Campos M. A. & Schwartz S. (2024) Nucleic Acids Research. 52, 8, e42.

We present txtools, an R package that enables the processing, analysis, and visualization of RNA-seq data at nucleotide-level resolution, seamlessly integrating alignments to the genome with transcriptomic representation. txtools' main inputs are BAM files and a transcriptome annotation, and the main output is a table, capturing mismatches, deletions, and the number of reads beginning and ending at each nucleotide in the transcriptomic space. txtools further facilitates downstream visualization and analyses. We showcase, using examples from the epitranscriptomic field, how a few calls to txtools functions can yield insightful and ready-to-publish results. txtools is of broad utility also in the context of structural mapping and RNA:protein interaction mapping. By providing a simple and intuitive framework, we believe that txtools will be a useful and convenient tool and pave the path for future discovery. txtools is available for installation from its GitHub repository at https://github.com/AngelCampos/txtools.
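
As a rough illustration of the kind of per-nucleotide table described above, the following Python sketch counts coverage, read starts, and read ends at each transcript position. It is not the txtools API (txtools is an R package); the function name `position_table`, the column names, and the `(start, end)` read representation are assumptions made only for this example.

```python
from collections import Counter


def position_table(reads, tx_length):
    """Build a per-position summary for one transcript.

    reads: iterable of (start, end) pairs in 1-based, inclusive
           transcript coordinates (a hypothetical input format).
    tx_length: transcript length in nucleotides.
    Returns a list with one dict per position.
    """
    reads = list(reads)  # allow several passes over the input
    starts = Counter(s for s, _ in reads)
    ends = Counter(e for _, e in reads)
    table = []
    for pos in range(1, tx_length + 1):
        # Coverage: number of reads spanning this position.
        coverage = sum(1 for s, e in reads if s <= pos <= e)
        table.append(
            {"pos": pos, "cov": coverage,
             "start_5p": starts[pos], "end_3p": ends[pos]}
        )
    return table
```

A real pipeline would also tally mismatches and deletions per position from the alignments; those columns are omitted here to keep the sketch short.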

No evidence for ac4C within human mRNA upon data reassessment

Georgeson J. & Schwartz S. (2024) Molecular Cell. 84, 8, p. 1601-1610.e2

Cytidine acetylation (ac4C) of RNA is a post-transcriptional modification catalyzed by Nat10. Recently, an approach termed RedaC:T was employed to map ac4C in human mRNA, relying on detection of C>T mutations in WT but not in Nat10-KO cells. RedaC:T suggested widespread ac4C presence. Here, we reanalyze RedaC:T data. We find that mismatch signatures are not reproducible, as C>T mismatches are nearly exclusively present in only one of two biological replicates. Furthermore, all mismatch types, not only C>T, are highly enriched in WT samples, inconsistent with an acetylation signature. We demonstrate that the originally observed enrichment in mutations in one of the WT samples is due to its low complexity, resulting in the technical amplification of all classes of mismatch counts. Removal of duplicate reads abolishes the skewed mismatch patterns. These analyses account for the irreproducible mismatch patterns across samples while failing to find evidence for acetylation of RedaC:T sites.
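
The effect of duplicate reads on mismatch counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: reads are simplified to `(start, end, mismatches)` tuples, and duplicates are defined as reads sharing the same start and end coordinates, which may differ from the deduplication procedure used in the paper. The function name `mismatch_counts` is hypothetical.

```python
from collections import Counter


def mismatch_counts(reads, deduplicate=False):
    """Tally mismatch types across reads.

    reads: list of (start, end, mismatches), where mismatches is a
           tuple of strings such as "C>T" (hypothetical format).
    deduplicate: if True, keep only the first read seen for each
           (start, end) pair before counting.
    """
    if deduplicate:
        seen = set()
        kept = []
        for read in reads:
            key = (read[0], read[1])
            if key not in seen:
                seen.add(key)
                kept.append(read)
        reads = kept
    counts = Counter()
    for _, _, mismatches in reads:
        counts.update(mismatches)
    return counts
```

In a low-complexity library, a single mismatch-bearing fragment amplified many times inflates every mismatch class it carries; counting after deduplication removes that inflation.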

Dissecting the sequence and structural determinants guiding m6A deposition and evolution via inter- and intra-species hybrids

Shachar R., Dierks D., Garcia-Campos M. A., Uzonyi A., Toth U., Rossmanith W. & Schwartz S. (2024) Genome Biology. 25, 48.

Background: N6-methyladenosine (m6A) is the most abundant mRNA modification, and controls mRNA stability. m6A distribution varies considerably between and within species. Yet, it is unclear to what extent this variability is driven by changes in genetic sequences (cis) or cellular environments (trans) and via which mechanisms. Results: Here we dissect the determinants governing RNA methylation via interspecies and intraspecies hybrids in yeast and mammalian systems, coupled with massively parallel reporter assays and m6A-QTL reanalysis. We find that m6A evolution and variability is driven primarily in cis, via two mechanisms: (1) variations altering m6A consensus motifs, and (2) variation impacting mRNA secondary structure. We establish that mutations impacting RNA structure - even when distant from an m6A consensus motif - causally dictate methylation propensity. Finally, we demonstrate that allele-specific differences in m6A levels lead to allele-specific changes in gene expression. Conclusions: Our findings define the determinants governing m6A evolution and diversity and characterize the consequences thereof on gene expression regulation.

Directing RNA-modifying machineries towards endogenous RNAs: opportunities and challenges

Witzenberger M. & Schwartz S. (2024) Trends in Genetics. 40, 4, p. 313-325

Over 170 chemical modifications can be naturally installed on RNA, all of which are catalyzed by dedicated machineries. These modifications can alter RNA sequence, structure, stability, and translation as well as serving as quality control marks that record aspects of RNA processing. The diverse roles played by RNAs within cells have motivated endeavors to exogenously introduce RNA modifications at target sites for diverse purposes ranging from recording RNA:protein interactions to therapeutic applications. Here, we discuss these applications and the approaches that have been employed to engineer RNA-modifying machineries, and highlight persisting challenges and perspectives.

2023

Dissecting the basis for differential substrate specificity of ADAR1 and ADAR2

Zambrano-Mila M. S., Witzenberger M., Rosenwasser Z., Uzonyi A., Nir R., Ben-Aroya S., Levanon E. Y. & Schwartz S. (2023) Nature Communications. 14, 1, 8212.

Millions of adenosines are deaminated throughout the transcriptome by ADAR1 and/or ADAR2 at varying levels, raising the question of what are the determinants guiding substrate specificity and how these differ between the two enzymes. We monitor how secondary structure modulates ADAR2 vs ADAR1 substrate selectivity, on the basis of systematic probing of thousands of synthetic sequences transfected into cell lines expressing exclusively ADAR1 or ADAR2. Both enzymes induce symmetric, strand-specific editing, yet with distinct offsets with respect to structural disruptions: −26 nt for ADAR2 and −35 nt for ADAR1. We unravel the basis for these differences in offsets through mutants, domain-swaps, and ADAR homologs, and find it to be encoded by the differential RNA binding domain (RBD) architecture. Finally, we demonstrate that this offset-enhanced editing can allow an improved design of ADAR2-recruiting therapeutics, with proof-of-concept experiments demonstrating increased on-target and potentially decreased off-target editing.

A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei

Rajan K. S., Madmoni H., Bashan A., Taoka M., Aryal S., Nobe Y., Doniger T., Galili Kostin B., Blumberg A., Cohen-Chalamish S., Schwartz S., Rivalta A., Zimmerman E., Unger R., Isobe T., Yonath A. & Michaeli S. (2023) Nature Communications. 14, 1, 7462.

Trypanosomes are protozoan parasites that cycle between insect and mammalian hosts and are the causative agent of sleeping sickness. Here, we describe the changes of pseudouridine (Ψ) modification on rRNA in the two life stages of the parasite using four different genome-wide approaches. CRISPR-Cas9 knock-outs of all four snoRNAs guiding Ψ on helix 69 (H69) of the large rRNA subunit were lethal. A single knock-out of a snoRNA guiding Ψ530 on H69 altered the composition of the 80S monosome. These changes specifically affected the translation of only a subset of proteins. This study correlates a single site Ψ modification with changes in ribosomal protein stoichiometry, supported by a high-resolution cryo-EM structure. We propose that alteration in rRNA modifications could generate ribosomes preferentially translating state-beneficial proteins.

Comprehensive mapping of exon junction complex binding sites reveals universal EJC deposition in Drosophila

Morillo L., Paternina T., Alasseur Q., Genovesio A., Schwartz S. & Le Hir H. (2023) BMC Biology. 21, 1, 246.

Background: The exon junction complex (EJC) is involved in most steps of the mRNA life cycle, ranging from splicing to nonsense-mediated mRNA decay (NMD). It is assembled by the splicing machinery onto mRNA in a sequence-independent manner. A fundamental open question is whether the EJC is deposited onto all exon-exon junctions or only on a subset of them. Several previous studies have made observations supportive of the latter, yet these have been limited by methodological constraints. Results: In this study, we sought to overcome these limitations via the integration of two different approaches for transcriptome-wide mapping of EJCs. Our results revealed that nearly all, if not all, internal exons consistently harbor an EJC in Drosophila, demonstrating that EJC presence is an inherent consequence of the splicing reaction. Furthermore, our study underscores the limitations of eCLIP methods in fully elucidating the landscape of RBP binding sites. Our findings highlight how highly specific (low false positive) methodologies can lead to erroneous interpretations due to partial sensitivity (high false negatives). Conclusions: This study contributes to our understanding of EJC deposition and its association with pre-mRNA splicing. The universal presence of EJC on internal exons underscores its significance in ensuring proper mRNA processing. Additionally, our observations highlight the need to consider both specificity and sensitivity in RBP mapping methodologies.

The yeast RNA methylation complex consists of conserved yet reconfigured components with m6A-dependent and independent roles

Ensinck I., Maman A., Albihlal W. S., Lassandro M., Salzano G., Sideri T., Howell S. A., Calvani E., Patel H., Bushkin G., Ralser M., Snijders A. P., Skehel M., Casañal A., Schwartz S. & van Werven F. J. (2023) eLife. 12, RP87860.

N6-methyladenosine (m6A), the most abundant mRNA modification, is deposited in mammals/insects/plants by m6A methyltransferase complexes (MTC) comprising a catalytic subunit and at least five additional proteins. The yeast MTC is critical for meiosis and was known to comprise three proteins, of which two were conserved. We uncover three novel MTC components (Kar4/Ygl036w-Vir1/Dyn2). All MTC subunits, except for Dyn2, are essential for m6A deposition and have corresponding mammalian MTC orthologues. Unlike the mammalian bipartite MTC, the yeast MTC is unipartite, yet multifunctional. The mRNA interacting module, comprising Ime4, Mum2, Vir1, and Kar4, exerts the MTC's m6A-independent function, while Slz1 enables the MTC catalytic function in m6A deposition. Both functions are critical for meiotic progression. Kar4 also has a mechanistically separate role from the MTC during mating. The yeast MTC constituents play distinguishable m6A-dependent, MTC-dependent, and MTC-independent functions, highlighting their complexity and paving the path towards dissecting multi-layered MTC functions in mammals.

Exclusion of m6A from splice-site proximal regions by the exon junction complex dictates m6A topologies and mRNA stability

Uzonyi A., Dierks D., Nir R., Kwon O. S., Toth U., Barbosa I., Burel C., Brandis A., Rossmanith W., Le Hir H., Slobodin B. & Schwartz S. (2023) Molecular Cell. 83, 2, p. 237-251.e7

N6-methyladenosine (m6A), a widespread destabilizing mark on mRNA, is non-uniformly distributed across the transcriptome, yet the basis for its selective deposition is unknown. Here, we propose that m6A deposition is not selective. Instead, it is exclusion based: m6A consensus motifs are methylated by default, unless they are within a window of ∼100 nt from a splice junction. A simple model which we extensively validate, relying exclusively on presence of m6A motifs and exon-intron architecture, allows in silico recapitulation of experimentally measured m6A profiles. We provide evidence that exclusion from splice junctions is mediated by the exon junction complex (EJC), potentially via physical occlusion, and that previously observed associations between exon-intron architecture and mRNA decay are mechanistically mediated via m6A. Our findings establish a mechanism coupling nuclear mRNA splicing and packaging with the covalent installation of m6A, in turn controlling cytoplasmic decay.
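
The exclusion-based model described above lends itself to a short sketch. The Python below is a minimal illustration, not the authors' implementation: it assumes the DRACH consensus as the m6A motif, a hard 100 nt exclusion window, and junction positions given as 0-based coordinates on the spliced mRNA. The function name `predict_m6a_sites` is hypothetical.

```python
import re

# Assumed consensus motif: DRACH (D = A/G/U, R = A/G, H = A/C/U),
# written in the DNA alphabet. The lookahead allows overlapping matches.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def predict_m6a_sites(mrna, junctions, window=100):
    """Predict methylated adenosines under an exclusion-only model.

    mrna: spliced mRNA sequence (DNA or RNA alphabet).
    junctions: 0-based positions of exon-exon junctions on the mRNA.
    window: motifs whose central A lies within this many nucleotides
            of any junction are treated as unmethylated (assumed cutoff).
    Returns 0-based positions of the predicted m6A sites.
    """
    seq = mrna.upper().replace("U", "T")
    sites = []
    for match in DRACH.finditer(seq):
        a_pos = match.start() + 2  # the A in the middle of DRACH
        if all(abs(a_pos - j) > window for j in junctions):
            sites.append(a_pos)
    return sites
```

Every motif is called methylated by default, so the only inputs are sequence and exon-intron architecture; a quantitative model would likely replace the hard cutoff with a distance-dependent weight.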

2022

A late-stage assembly checkpoint of the human mitochondrial ribosome large subunit

Rebelo-Guiomar P., Pellegrino S., Dent K. C., Sas-Chen A., Miller-Fleming L., Garone C., Van Haute L., Rogan J. F., Dinan A., Firth A. E., Andrews B., Whitworth A. J., Schwartz S., Warren A. J. & Minczuk M. (2022) Nature Communications. 13, 1, 929.

Many cellular processes, including ribosome biogenesis, are regulated through post-transcriptional RNA modifications. Here, a genome-wide analysis of the human mitochondrial transcriptome shows that 2'-O-methylation is limited to residues of the mitoribosomal large subunit (mtLSU) 16S mt-rRNA, introduced by MRM1, MRM2 and MRM3, with the modifications installed by the latter two proteins being interdependent. MRM2 controls mitochondrial respiration by regulating mitoribosome biogenesis. In its absence, mtLSU particles (visualized by cryo-EM at the resolution of 2.6 Å) present disordered RNA domains, partial occupancy of bL36m and bound MALSU1:L0R8F8:mtACP anti-association module, allowing five mtLSU biogenesis intermediates with different intersubunit interface configurations to be placed along the assembly pathway. However, mitoribosome biogenesis does not depend on the methyltransferase activity of MRM2. Disruption of the MRM2 Drosophila melanogaster orthologue leads to mitochondria-related developmental arrest. This work identifies a key checkpoint during mtLSU assembly, essential to maintain mitochondrial homeostasis.

Antisense pairing and SNORD13 structure guide RNA cytidine acetylation

Thalalla Gamage S., Bortolin-Cavaillé M., Link C., Bryson K., Sas-Chen A., Schwartz S., Cavaillé J. & Meier J. L. (2022) RNA. 28, 12, p. 1582-1596

N4-acetylcytidine (ac⁴C) is an RNA nucleobase found in all domains of life. The establishment of ac⁴C in helix 45 (h45) of human 18S ribosomal RNA (rRNA) requires the combined activity of the acetyltransferase NAT10 and the box C/D snoRNA SNORD13. However, the molecular mechanisms governing RNA-guided nucleobase acetylation in humans remain unexplored. After applying comparative sequence analysis and site-directed mutagenesis to provide evidence that SNORD13 folds into three main RNA helices, we report two assays that enable the study of SNORD13-dependent RNA acetylation in human cells. First, we demonstrate that ectopic expression of SNORD13 rescues h45 in a SNORD13 knockout cell line. Next, we show that mutant snoRNAs can be used in combination with nucleotide resolution ac⁴C sequencing to define structure and sequence elements critical for SNORD13 function. Finally, we develop a second method that reports on the substrate specificity of endogenous NAT10/SNORD13 via mutational analysis of an ectopically expressed pre-rRNA substrate. By combining mutational analysis of these reconstituted systems with nucleotide resolution ac⁴C sequencing, our studies reveal plasticity in the molecular determinants underlying RNA-guided cytidine acetylation that is distinct from deposition of other well-studied rRNA modifications (e.g., pseudouridine). Overall, our studies provide a new approach to reconstitute RNA-guided cytidine acetylation in human cells as well as nucleotide resolution insights into the mechanisms governing this process.

m6A is required for resolving progenitor identity during planarian stem cell differentiation

Dagan Y., Yesharim Y., Bonneau A. R., Frankovits T., Schwartz S., Reddien P. W. & Wurtzel O. (2022) EMBO Journal. 41, 21, e109895.

Regeneration and tissue homeostasis require accurate production of missing cell lineages. Cell production is driven by changes to gene expression, which is shaped by multiple layers of regulation. Here, we find that the ubiquitous mRNA base-modification, m6A, is required for proper cell fate choice and cellular maturation in planarian stem cells (neoblasts). We mapped m6A-enriched regions in 7,600 planarian genes and found that perturbation of the m6A pathway resulted in progressive deterioration of tissues and death. Using single-cell RNA sequencing of >20,000 cells following perturbation of the m6A pathway, we identified an increase in expression of noncanonical histone variants, and that inhibition of the pathway resulted in accumulation of undifferentiated cells throughout the animal in an abnormal transcriptional state. Analysis of >1,000 planarian gene expression datasets revealed that the inhibition of the chromatin modifying complex NuRD had almost indistinguishable consequences, unraveling an unappreciated link between m6A and chromatin modifications. Our findings reveal that m6A is critical for planarian stem cell homeostasis and gene regulation in tissue maintenance and regeneration.

IRF3 inhibits IFN-γ-mediated restriction of intracellular pathogens in macrophages independently of IFNAR

Maciag K., Raychowdhury R., Smith K., Schneider A. M., Coers J., Mumbach M. R., Schwartz S. & Hacohen N. (2022) Journal of Leukocyte Biology. 112, 2, p. 257-271

Macrophages use an array of innate immune sensors to detect intracellular pathogens and to tailor effective antimicrobial responses. In addition, extrinsic activation with the cytokine IFN-γ is often required as well to tip the scales of the host-pathogen balance toward pathogen restriction. However, little is known about how host-pathogen sensing impacts the antimicrobial IFN-γ-activated state. It was observed that in the absence of IRF3, a key downstream component of pathogen sensing pathways, IFN-γ-primed macrophages more efficiently restricted the intracellular bacterium Legionella pneumophila and the intracellular protozoan parasite Trypanosoma cruzi. This effect did not require IFNAR, the receptor for Type I IFNs known to be induced by IRF3, nor the sensing adaptors MyD88/TRIF, MAVS, or STING. This effect also did not involve differential activation of STAT1, the major signaling protein downstream of both Type 1 and Type 2 IFN receptors. IRF3-deficient macrophages displayed a significantly altered IFN-γ-induced gene expression program, with up-regulation of microbial restriction factors such as Nos2. Finally, we found that IFN-γ-primed but not unprimed macrophages largely excluded the activated form of IRF3 from the nucleus following bacterial infection. These data are consistent with a relationship of mutual inhibition between IRF3 and IFN-γ-activated programs, possibly as a component of a partially reversible mechanism for modulating the activity of potent innate immune effectors (such as Nos2) in the context of intracellular infection.

Probing small ribosomal subunit RNA helix 45 acetylation across eukaryotic evolution

Bortolin-Cavaillé M. L., Quillien A., Gamage S. T., Thomas J. M., Sas-Chen A., Sharma S., Plisson-Chastang C., Vandel L., Blader P., Lafontaine D. L., Schwartz S., Meier J. L. & Cavaillé J. (2022) Nucleic Acids Research. 50, 11, p. 6284-6299

NAT10 is an essential enzyme that catalyzes N⁴-acetylcytidine (ac⁴C) in eukaryotic transfer RNA and 18S ribosomal RNA. Recent studies suggested that rRNA acetylation is dependent on SNORD13, a box C/D small nucleolar RNA predicted to base-pair with 18S rRNA via two antisense elements. However, the selectivity of SNORD13-dependent cytidine acetylation and its relationship to NAT10's essential function remain to be defined. Here, we demonstrate that SNORD13 is required for acetylation of a single cytidine of human and zebrafish 18S rRNA. In-depth characterization revealed that SNORD13-dependent ac⁴C is dispensable for human cell growth, ribosome biogenesis, translation and development. This loss of function analysis inspired a cross-evolutionary survey of the eukaryotic rRNA acetylation 'machinery' that led to the characterization of many novel metazoan SNORD13 genes. This includes an atypical SNORD13-like RNA in Drosophila melanogaster which guides ac⁴C to 18S rRNA helix 45 despite lacking one of the two rRNA antisense elements. Finally, we discover that Caenorhabditis elegans 18S rRNA is not acetylated despite the presence of an essential NAT10 homolog. Our findings shed light on the molecular mechanisms underlying SNORD13-mediated rRNA acetylation across eukaryotic evolution and raise new questions regarding the biological and evolutionary relevance of this highly conserved rRNA modification.

A systematic dissection of determinants and consequences of snoRNA-guided pseudouridylation of human mRNA

Nir R., Hoernes T. P., Muramatsu H., Faserl K., Karikó K., Erlacher M. D., Sas-Chen A. & Schwartz S. (2022) Nucleic Acids Research. 50, 9, p. 4900-4916

RNA can be extensively modified posttranscriptionally with >170 covalent modifications, expanding its functional and structural repertoire. Pseudouridine (Ψ), the most abundant modified nucleoside in rRNA and tRNA, has recently been found within mRNA molecules. It remains unclear whether pseudouridylation of mRNA can be snoRNA-guided, bearing important implications for understanding the physiological target spectrum of snoRNAs and for their potential therapeutic exploitation in genetic diseases. Here, using a massively parallel reporter-based strategy, we simultaneously interrogate Ψ levels across hundreds of synthetic constructs with predesigned complementarity against endogenous snoRNAs. Our results demonstrate that snoRNA-mediated pseudouridylation can occur on mRNA targets. However, this is typically achieved at relatively low efficiencies, and is constrained by mRNA localization, snoRNA expression levels and the length of the snoRNA:mRNA complementarity stretches. We exploited these insights for the design of snoRNAs targeting pseudouridylation at premature termination codons, which was previously shown to suppress translational termination. However, in this and follow-up experiments in human cells we observe no evidence for significant levels of readthrough of pseudouridylated stop codons. Our study enhances our understanding of the scope, 'design rules', constraints and consequences of snoRNA-mediated pseudouridylation.

Cloning of DNA oligo pools for in vitro expression

Uzonyi A., Nir R. & Schwartz S. (2022) STAR Protocols. 3, 1, 101103.

Oligo library pools are powerful tools for systematic investigation of genetic and transcriptomic machinery such as promoter function and gene regulation, non-coding RNAs, or RNA modifications. Here, we provide a detailed protocol for cloning DNA oligo pools made up of tens of thousands of different constructs, aiming to preserve the complexity of the pools. This system would be suitable for expression in cell lines and can be followed up by next-generation sequencing analysis.

2021

Decoupling of degradation from deadenylation reshapes poly(A) tail length in yeast meiosis

Wiener D., Antebi Y. & Schwartz S. (2021) Nature Structural & Molecular Biology. 28, 12, p. 1038-1049

Nascent messenger RNA is endowed with a poly(A) tail that is subject to gradual deadenylation and subsequent degradation in the cytoplasm. Deadenylation and degradation rates are typically correlated, rendering it difficult to dissect the determinants governing each of these processes and the mechanistic basis of their coupling. Here we developed an approach that allows systematic, robust and multiplexed quantification of poly(A) tails in Saccharomyces cerevisiae. Our results suggest that mRNA deadenylation and degradation rates are decoupled during meiosis, and that transcript length is a major determinant of deadenylation rates and a key contributor to reshaping of poly(A) tail lengths. Meiosis-specific decoupling also leads to unique positive associations between poly(A) tail length and gene expression. The decoupling is associated with a focal localization pattern of the RNA degradation factor Xrn1, and can be phenocopied by Xrn1 deletion under nonmeiotic conditions. Importantly, the association of transcript length with deadenylation rates is conserved across eukaryotes. Our study uncovers a factor that shapes deadenylation rate and reveals a unique context in which degradation is decoupled from deadenylation.

The ribosome epitranscriptome: inert-or a platform for functional plasticity?

Georgeson J. & Schwartz S. (2021) RNA (Cambridge). 27, 11, p. 1293-1301

A universal property of all rRNAs explored to date is the prevalence of post-transcriptional ("epitranscriptional") modifications, which expand the chemical and topological properties of the four standard nucleosides. Are these modifications an inert, constitutive part of the ribosome? Or could they, in part, also regulate the structure or function of the ribosome? In this review, we summarize emerging evidence that rRNA modifications are more heterogeneous than previously thought, and that they can also vary from one condition to another, such as in the context of a cellular response or a developmental trajectory. We discuss the implications of these results and key open questions on the path toward connecting such heterogeneity with function.

Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing

Begik O., Lucas M. C., Pryszcz L. P., Ramirez J. M., Medina R., Milenkovic I., Cruciani S., Liu H., Vieira H. G. S., Sas-Chen A., Mattick J. S., Schwartz S. & Novoa E. M. (2021) Nature Biotechnology. 39, 10, p. 1278-1291

Nanopore RNA sequencing shows promise as a method for discriminating and identifying different RNA modifications in native RNA. Expanding on the ability of nanopore sequencing to detect N⁶-methyladenosine, we show that other modifications, in particular pseudouridine (Ψ) and 2'-O-methylation (Nm), also result in characteristic base-calling error signatures in the nanopore data. Focusing on Ψ modification sites, we detected known and uncovered previously unreported Ψ sites in mRNAs, non-coding RNAs and rRNAs, including a Pus4-dependent Ψ modification in yeast mitochondrial rRNA. To explore the dynamics of pseudouridylation, we treated yeast cells with oxidative, cold and heat stresses and detected heat-sensitive Ψ-modified sites in small nuclear RNAs, small nucleolar RNAs and mRNAs. Finally, we developed a software, nanoRMS, that estimates per-site modification stoichiometries by identifying single-molecule reads with altered current intensity and trace profiles. This work demonstrates that Nm and Ψ RNA modifications can be detected in cellular RNAs and that their modification stoichiometry can be quantified by nanopore sequencing of native RNA.

Multiplexed profiling facilitates robust m6A quantification at site, gene and sample resolution

Dierks D., Garcia-Campos M. A., Uzonyi A., Safra M., Edelheit S., Rossi A., Sideri T., Varier R. A., Brandis A., Stelzer Y., van Werven F., Scherz-Shouval R. & Schwartz S. (2021) Nature Methods. 18, 9, p. 1060-1067

N6-methyladenosine (m6A) is the most prevalent modification of messenger RNA in mammals. To interrogate its functions and dynamics, there is a critical need to quantify m6A at three levels: site, gene and sample. Current approaches address these needs in a limited manner. Here we develop m6A-seq2, relying on multiplexed m6A-immunoprecipitation of barcoded and pooled samples. m6A-seq2 allows a big increase in throughput while reducing technical variability, requirements of input material and cost. m6A-seq2 is furthermore uniquely capable of providing sample-level relative quantitations of m6A, serving as an orthogonal alternative to mass spectrometry-based approaches. Finally, we develop a computational approach for gene-level quantitation of m6A. We demonstrate that using this metric, roughly 30% of the variability in RNA half life in mouse embryonic stem cells can be explained, establishing m6A as a main driver of RNA stability. m6A-seq2 thus provides an experimental and analytic framework for dissecting m6A-mediated regulation at three different levels.

The germinal center reaction depends on RNA methylation and divergent functions of specific methyl readers

Grenov A. C., Moss L., Edelheit S., Cordiner R., Schmiedel D., Biram A., Hanna J. H., Jensen T. H., Schwartz S. & Shulman Z. (2021) Journal of Experimental Medicine. 218, 10, e20210360.

Long-lasting immunity depends on the generation of protective antibodies through the germinal center (GC) reaction. N6-methyladenosine (m6A) modification of mRNAs by METTL3 activity modulates transcript lifetime primarily through the function of m6A readers; however, the physiological role of this molecular machinery in the GC remains unknown. Here, we show that m6A modifications by METTL3 are required for GC maintenance through the differential functions of m6A readers. Mettl3-deficient GC B cells exhibited reduced cell-cycle progression and decreased expression of proliferation- and oxidative phosphorylation-related genes. The m6A binder, IGF2BP3, was required for stabilization of Myc mRNA and expression of its target genes, whereas the m6A reader, YTHDF2, indirectly regulated the expression of the oxidative phosphorylation gene program. Our findings demonstrate how two independent gene networks that support critical GC functions are modulated by m6A through distinct mRNA binders.

Deciphering the principles of the RNA editing code via large-scale systematic probing

Uzonyi A., Nir R., Shliefer O., Stern-Ginossar N., Antebi Y., Stelzer Y., Levanon E. Y. & Schwartz S. (2021) Molecular Cell. 81, 11, p. 2374-2387.e3

Adenosine-to-inosine editing is catalyzed by ADAR1 at thousands of sites transcriptome-wide. Despite intense interest in ADAR1 from physiological, bioengineering, and therapeutic perspectives, the rules of ADAR1 substrate selection are poorly understood. Here, we used large-scale systematic probing of ∼2,000 synthetic constructs to explore the structure and sequence context determining editability. We uncover two structural layers determining the formation and propagation of A-to-I editing, independent of sequence. First, editing is robustly induced at fixed intervals of 35 bp upstream and 30 bp downstream of structural disruptions. Second, editing is symmetrically introduced on opposite sites on a double-stranded structure. Our findings suggest a recursive model for RNA editing, whereby the structural alteration induced by the editing at one site iteratively gives rise to the formation of an additional editing site at a fixed periodicity, serving as a basis for the propagation of editing along and across both strands of double-stranded RNA structures.

How many tRNAs are out there?

Wiener D. & Schwartz S. (2021) Molecular Cell. 81, 8, p. 1595-1597

Quantitative nucleotide resolution profiling of RNA cytidine acetylation by ac4C-seq

Thalalla Gamage S., Sas-Chen A., Schwartz S. & Meier J. L. (2021) Nature Protocols. 16, p. 2286-2307

A prerequisite to defining the transcriptome-wide functions of RNA modifications is the ability to accurately determine their location. Here, we present N4-acetylcytidine (ac4C) sequencing (ac4C-seq), a protocol for the quantitative single-nucleotide resolution mapping of cytidine acetylation in RNA. This method exploits the kinetically facile chemical reaction of ac4C with sodium cyanoborohydride under acidic conditions to form a reduced nucleobase. RNA is then fragmented, ligated to an adapter at its 3' end and reverse transcribed to introduce a non-cognate nucleotide at reduced ac4C sites. After adapter ligation, library preparation and high-throughput sequencing, a bioinformatic pipeline enables identification of ac4C positions on the basis of the presence of C→T misincorporations in reduced samples but not in controls. Unlike antibody-based approaches, ac4C-seq identifies specific ac4C residues and reports on their level of modification. The ac4C-seq library preparation protocol can be completed in ~4 d for transcriptome-wide sequencing.
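
The final calling step, comparing C→T misincorporation between reduced and control samples, can be illustrated with a simplified Python sketch. It is not the published ac4C-seq pipeline: the input format, the function name `call_ac4c_sites`, and the coverage and rate-difference thresholds are all assumptions chosen for illustration, and no statistical test is applied.

```python
def call_ac4c_sites(treated, control, min_cov=20, min_diff=0.05):
    """Flag cytidines with elevated C->T misincorporation after reduction.

    treated, control: dicts mapping a reference cytidine position to
        (c_count, t_count) read counts (hypothetical input format).
    min_cov: minimum C+T coverage required in both samples (assumed).
    min_diff: minimum excess of the treated over the control
        misincorporation rate (assumed threshold).
    Returns a list of (position, treated_rate, control_rate).
    """
    calls = []
    for pos in sorted(treated):
        if pos not in control:
            continue
        t_c, t_t = treated[pos]
        c_c, c_t = control[pos]
        if t_c + t_t < min_cov or c_c + c_t < min_cov:
            continue  # too little coverage to estimate a rate
        treated_rate = t_t / (t_c + t_t)
        control_rate = c_t / (c_c + c_t)
        if treated_rate - control_rate >= min_diff:
            calls.append((pos, treated_rate, control_rate))
    return calls
```

Because the misincorporation rate scales with the fraction of acetylated molecules, the treated rate at a called site also serves as a rough readout of modification level, which is the quantitative aspect the protocol emphasizes.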

mito-Ψ-Seq: A High-Throughput Method for Systematic Mapping of Pseudouridine Within Mitochondrial RNA

Sas-Chen A., Nir R. & Schwartz S. (2021) Mitochondrial Gene Expression: Methods and Protocols. Rorbach J. & Minczuk M. (eds.). p. 103-115

RNA modifications are present in most cellular RNAs and are formed posttranscriptionally by enzymatic machineries that involve hundreds of enzymes and cofactors. RNA modifications impact the life cycle of the RNA, its stability, folding, cellular localization, as well as interactions with RNA and protein partners. RNA modifications are important for mitochondrial function and are required for proper processing and function of mitochondrial (mt) tRNA and rRNA. Underscoring their importance, several mitochondrial diseases are caused by defects in mt-RNA modifications, stemming from mutations in mtDNA at or near mt-RNA modification sites or in nuclear-encoded mt-RNA modifying enzymes. A highly abundant RNA modification, involved in mitochondrial physiology and pathology is pseudouridylation (Ψ), which is catalyzed by enzymes of the Pseudouridine Synthase (PUS) family. Although some Ψ sites in mt-rRNA and mt-tRNA have been identified, little is known about the functional role of these modifications. Furthermore, it is unknown which enzyme facilitates the modification of each site and it is likely that many yet undiscovered mt-RNA modifications exist, as is evidenced by recent work showing some Ψ sites on mRNA. Here, we present mito-Ψ-Seq, a high-throughput method for semiquantitative mapping of Ψ in mt-RNA.

2020

NOP10 predicts lung cancer prognosis and its associated small nucleolar RNAs drive proliferation and migration

Cui C., Liu Y., Gerloff D., Rohde C., Pauli C., Koehn M., Misiak D., Oellerich T., Schwartz S., Schmidt L., Wiewrodt R., Marra A., Hillejan L., Bartel F., Wickenhauser C., Huettelmaier S., Goellner S., Zhou F., Edemir B. & Mueller-Tidow C. (2020) Oncogene. 40, 5, p. 909-921

Non-small cell lung cancer (NSCLC) is the leading cause of cancer death worldwide underlining the urgent need for new biomarkers and therapeutic targets for this disease. Long noncoding RNAs are critical players in NSCLC but the role of small RNA species is not well understood. In the present study, we investigated the role of H/ACA box small nucleolar RNAs (snoRNAs) and snoRNA-bound ribonucleoproteins (snoRNPs) in the tumorigenesis of NSCLC. H/ACA box snoRNPs including the NOP10 core protein were highly expressed in NSCLC. High levels of either NOP10 mRNA or protein were associated with poor prognosis in NSCLC patients. Loss of NOP10 and subsequent reduction of H/ACA box snoRNAs and rRNA pseudouridylation inhibited lung cancer cell growth, colony formation, migration, and invasion. A focused CRISPR/Cas9 snoRNA knockout screen revealed that genomic deletion of SNORA65, SNORA7A, and SNORA7B reduced proliferation of lung cancer cells. In line, high levels of SNORA65, SNORA7A, and SNORA7B were observed in primary lung cancer specimens with associated changes in rRNA pseudouridylation. Knockdown of either SNORA65 or SNORA7A/B inhibited growth and colony formation of NSCLC cell lines. Our data indicate that specific H/ACA box snoRNAs and snoRNA-associated proteins such as NOP10 have an oncogenic role in NSCLC providing new potential biomarkers and therapeutic targets for the disease.

The epitranscriptome beyond m6A

Wiener D. & Schwartz S. (2020) Nature Reviews Genetics. 2, p. 119-131

Following its transcription, RNA can be modified by >170 chemically distinct types of modifications - the epitranscriptome. In recent years, there have been substantial efforts to uncover and characterize the modifications present on mRNA, motivated by the potential of such modifications to regulate mRNA fate and by discoveries and advances in our understanding of N6-methyladenosine (m6A). Here, we review our knowledge regarding the detection, distribution, abundance, biogenesis, functions and possible mechanisms of action of six of these modifications - pseudouridine (Ψ), 5-methylcytidine (m5C), N1-methyladenosine (m1A), N4-acetylcytidine (ac4C), ribose methylations (Nm) and N7-methylguanosine (m7G). We discuss the technical and analytical aspects that have led to inconsistent conclusions and controversies regarding the abundance and distribution of some of these modifications. We further highlight shared commonalities and important ways in which these modifications differ with respect to m6A, based on which we speculate on their origin and their ability to acquire functions over evolutionary timescales.

Context-dependent functional compensation between Ythdf m6A reader proteins

Lasman L., Krupalnik V., Viukov S., Mor N., Aguilera-Castrejon A., Schneir D., Bayerl J., Mizrahi O., Peles S., Tawil S., Sathe S., Nachshon A., Shani T., Zerbib M., Kilimnik I., Aigner S., Shankar A., Mueller J. R., Schwartz S., Stern-Ginossar N., Yeo G. W., Geula S., Novershtern N. & Hanna J. H. (2020) Genes and Development. 34, 19-20, p. 1373-1391

The N6-methyladenosine (m6A) modification is the most prevalent post-transcriptional mRNA modification, regulating mRNA decay and splicing. It plays a major role during normal development, differentiation, and disease progression. The modification is regulated by a set of writer, eraser, and reader proteins. The YTH domain family of proteins consists of three homologous m6A-binding proteins, Ythdf1, Ythdf2, and Ythdf3, which were suggested to have different cellular functions. However, their sequence similarity and their tendency to bind the same targets suggest that they may have overlapping roles. We systematically knocked out (KO) the Mettl3 writer, each of the Ythdf readers, and the three readers together (triple-KO). We then estimated the effect in vivo in mouse gametogenesis, postnatal viability, and in vitro in mouse embryonic stem cells (mESCs). In gametogenesis, Mettl3-KO severity is increased as the deletion occurs earlier in the process, and Ythdf2 has a dominant role that cannot be compensated by Ythdf1 or Ythdf3, due to differences in readers' expression pattern across different cell types, both in quantity and in spatial location. Knocking out the three readers together and systematically testing viable offspring genotypes revealed a redundancy in the readers' role during early development that is Ythdf1/2/3 gene dosage-dependent. Finally, in mESCs there is compensation between the three Ythdf reader proteins, since the resistance to differentiate and the significant effect on mRNA decay occur only in the triple-KO cells and not in the single KOs. Thus, we suggest a new model for the Ythdf readers' function, in which there is profound dosage-dependent redundancy when all three readers are equivalently coexpressed in the same cell types.

Dynamic RNA acetylation revealed by quantitative cross-evolutionary mapping

Sas-Chen A., Thomas J. M., Matzov D., Taoka M., Nance K. D., Nir R., Bryson K. M., Shachar R., Liman G. L. S., Burkhart B. W., Gamage S. T., Nobe Y., Briney C. A., Levy M. J., Fuchs R. T., Robb G. B., Hartmann J., Sharma S., Lin Q., Florens L., Washburn M. P., Isobe T., Santangelo T. J., Shalev-Benami M., Meier J. L. & Schwartz S. (2020) Nature. 583, 7817, p. 638-643

N4-acetylcytidine (ac4C) is an ancient and highly conserved RNA modification that is present on tRNA and rRNA and has recently been investigated in eukaryotic mRNA(1-3). However, the distribution, dynamics and functions of cytidine acetylation have yet to be fully elucidated. Here we report ac4C-seq, a chemical genomic method for the transcriptome-wide quantitative mapping of ac4C at single-nucleotide resolution. In human and yeast mRNAs, ac4C sites are not detected but can be induced, at a conserved sequence motif, via the ectopic overexpression of eukaryotic acetyltransferase complexes. By contrast, cross-evolutionary profiling revealed unprecedented levels of ac4C across hundreds of residues in rRNA, tRNA, non-coding RNA and mRNA from hyperthermophilic archaea. Ac4C is markedly induced in response to increases in temperature, and acetyltransferase-deficient archaeal strains exhibit temperature-dependent growth defects. Visualization of wild-type and acetyltransferase-deficient archaeal ribosomes by cryo-electron microscopy provided structural insights into the temperature-dependent distribution of ac4C and its potential thermoadaptive role. Our studies quantitatively define the ac4C landscape, providing a technical and conceptual foundation for elucidating the role of this modification in biology and disease(4-6).

2019

Accurate detection of m6A RNA modifications in native RNA sequences

Liu H., Begik O., Lucas M. C., Ramirez J. M., Mason C. E., Wiener D., Schwartz S., Mattick J. S., Smith M. A. & Novoa E. M. (2019) Nature Communications. 10, 1, 4079.

The epitranscriptomics field has undergone an enormous expansion in the last few years; however, a major limitation is the lack of generic methods to map RNA modifications transcriptome-wide. Here, we show that using direct RNA sequencing, N6-methyladenosine (m6A) RNA modifications can be detected with high accuracy, in the form of systematic errors and decreased base-calling qualities. Specifically, we find that our algorithm, trained with m6A-modified and unmodified synthetic sequences, can predict m6A RNA modifications with ~90% accuracy. We then extend our findings to yeast data sets, finding that our method can identify m6A RNA modifications in vivo with an accuracy of 87%. Moreover, we further validate our method by showing that these 'errors' are typically not observed in yeast ime4-knockout strains, which lack m6A modifications. Our results open avenues to investigate the biological roles of RNA modifications in their native RNA context.

Germline NPM1 mutations lead to altered rRNA 2'-O-methylation and cause dyskeratosis congenita

Nachmani D., Bothmer A. H., Grisendi S., Mele A., Bothmer D., Lee J. D., Monteleone E., Cheng K., Zhang Y., Bester A. C., Guzzetti A., Mitchell C. A., Mendez L. M., Pozdnyakova O., Sportoletti P., Martelli M., Vulliamy T. J., Safra M., Schwartz S., Luzzatto L., Bluteau O., Soulier J., Darnell R. B., Falini B., Dokal I., Ito K., Clohessy J. G. & Pandolfi P. P. (2019) Nature Genetics. 51, 10, p. 1518-1529

RNA modifications are emerging as key determinants of gene expression. However, compelling genetic demonstrations of their relevance to human disease are lacking. Here, we link ribosomal RNA 2'-O-methylation (2'-O-Me) to the etiology of dyskeratosis congenita. We identify nucleophosmin (NPM1) as an essential regulator of 2'-O-Me on rRNA by directly binding C/D box small nucleolar RNAs, thereby modulating translation. We demonstrate the importance of 2'-O-Me-regulated translation for cellular growth, differentiation and hematopoietic stem cell maintenance, and show that Npm1 inactivation in adult hematopoietic stem cells results in bone marrow failure. We identify NPM1 germline mutations in patients with dyskeratosis congenita presenting with bone marrow failure and demonstrate that they are deficient in small nucleolar RNA binding. Mice harboring a dyskeratosis congenita germline Npm1 mutation recapitulate both hematological and nonhematological features of dyskeratosis congenita. Thus, our findings indicate that impaired 2'-O-Me can be etiological to human disease.

Deciphering the "m6A Code" via Antibody-Independent Quantitative Profiling

Garcia-Campos M. A., Edelheit S., Toth U., Safra M., Shachar R., Viukov S., Winkler R., Nir R., Lasman L., Brandis A., Hanna J. H., Rossmanith W. & Schwartz S. (2019) Cell. 178, 3, p. 731-747

N6-methyladenosine (m6A) is the most abundant modification on mRNA and is implicated in critical roles in development, physiology, and disease. A major limitation has been the inability to quantify m6A stoichiometry and the lack of antibody-independent methodologies for interrogating m6A. Here, we develop MAZTER-seq for systematic quantitative profiling of m6A at single-nucleotide resolution at 16%-25% of expressed sites, building on differential cleavage by an RNase. MAZTER-seq permits validation and de novo discovery of m6A sites, calibration of the performance of antibody-based approaches, and quantitative tracking of m6A dynamics in yeast gametogenesis and mammalian differentiation. We discover that m6A stoichiometry is "hard coded" in cis via a simple and predictable code, accounting for 33%-46% of the variability in methylation levels and allowing accurate prediction of m6A loss and acquisition events across evolution. MAZTER-seq allows quantitative investigation of m6A regulation in subcellular fractions, diverse cell types, and disease states.

Misincorporation signatures for detecting modifications in mRNA: Not as simple as it sounds

Sas-Chen A. & Schwartz S. (2019) Methods. 156, p. 53-59

Post-transcriptional modification on mRNA has become a field of intense interest in recent years, and next generation sequencing based technologies are constantly emerging to detect an increasing number of modifications at a transcriptome-wide level. Some of these approaches are based on identification of misincorporation events induced by reverse transcriptase at modified sites. Although conceptually trivial, sensitive and specific identification of such events is a challenge prone to a surprising number of artifacts, which can lead to substantially inflated estimates of the abundance of diverse modifications. Here we discuss the sources of some of these artifacts and delineate approaches to overcome them.

m⁶A modification controls the innate immune response to infection by targeting type I interferons

Winkler R., Gillis E., Lasman L., Safra M., Geula S., Soyris C., Nachshon A., Tai-Schmiedel J., Friedman N., Le-Trilling V. T. K., Trilling M., Mandelboim M., Hanna J. H., Schwartz S. & Stern-Ginossar N. (2019) Nature Immunology. 20, 2, p. 173-182

N⁶-methyladenosine (m⁶A) is the most common mRNA modification. Recent studies have revealed that depletion of m⁶A machinery leads to alterations in the propagation of diverse viruses. These effects were proposed to be mediated through dysregulated methylation of viral RNA. Here we show that following viral infection or stimulation of cells with an inactivated virus, deletion of the m⁶A writer METTL3 or reader YTHDF2 led to an increase in the induction of interferon-stimulated genes. Consequently, propagation of different viruses was suppressed in an interferon-signaling-dependent manner. Significantly, the mRNA of IFNB, the gene encoding the main cytokine that drives the type I interferon response, was m⁶A modified and was stabilized following repression of METTL3 or YTHDF2. Furthermore, we show that m⁶A-mediated regulation of interferon genes was conserved in mice. Together, our findings uncover the role m⁶A serves as a negative regulator of interferon response by dictating the fast turnover of interferon mRNAs and consequently facilitating viral propagation.

2018

Variants in PUS7 Cause Intellectual Disability with Speech Delay, Microcephaly, Short Stature, and Aggressive Behavior

De Brouwer A. P. M., Abou Jamra R., Koertel N., Soyris C., Polla D. L., Safra M., Zisso A., Powell C. A., Rebelo-Guiomar P., Dinges N., Morin V., Stock M., Hussain M., Shahzad M., Riazuddin S., Ahmed Z. M., Pfundt R., Schwarz F., de Boer L., Reis A., Grozeva D., Raymond F. L., Riazuddin S., Koolen D. A., Minczuk M., Roignant J., van Bokhoven H. & Schwartz S. (2018) American Journal of Human Genetics. 103, 6, p. 1045-1052

We describe six persons from three families with three homozygous protein truncating variants in PUS7: c.89_90del (p.Thr30Lysfs*20), c.1348C>T (p.Arg450*), and a deletion of the penultimate exon 15. All these individuals have intellectual disability with speech delay, short stature, microcephaly, and aggressive behavior. PUS7 encodes the RNA-independent pseudouridylate synthase 7. Pseudouridylation is the most abundant post-transcriptional modification in RNA, which is primarily thought to stabilize secondary structures of RNA. We show that the disease-related variants lead to abolishment of PUS7 activity on both tRNA and mRNA substrates. Moreover, pus7 knockout in Drosophila melanogaster results in a number of behavioral defects, including increased activity, disorientation, and aggressiveness supporting that neurological defects are caused by PUS7 variants. Our findings demonstrate that RNA pseudouridylation by PUS7 is essential for proper neuronal development and function.

m1A within cytoplasmic mRNAs at single nucleotide resolution: a reconciled transcriptome-wide map

Schwartz S. (2018) RNA: A Publication of the RNA Society. 24, 11, p. 1427-1436

Following synthesis, RNA can be modified with over 100 chemically distinct modifications. Recently, two studies, one by our group, developed conceptually similar approaches to globally map N1-methyladenosine (m1A) at single nucleotide resolution. Surprisingly, the studies diverged quite substantially in their estimates of the abundance, whereabouts, and stoichiometry of m1A within internal sites in cytosolic mRNAs: One study reported it to be a very rare modification, present at very low stoichiometries, and invariably catalyzed by TRMT6/61A. The other found it to be present at >470 sites, often at high levels, and suggested that the vast majority were highly unlikely to be TRMT6/61A substrates. Here we reanalyze the data from the latter study, and demonstrate that the vast majority of the detected sites originate from duplications, mis-annotations, mismapping, SNPs, sequencing errors, and a set of sites from the very first transcribed base that appear to originate from nontemplated incorporations by reverse transcriptase. Only 53 of the sites detected in the latter study likely reflect bona-fide internal modifications of cytoplasmically encoded mRNA molecules, nearly all of which are likely TRMT6/TRMT61A substrates and typically modified at low to undetectable levels. The experimental data sets from both studies thus consistently demonstrate that within cytosolic mRNAs, m1A is a rare internal modification where it is typically catalyzed at very low stoichiometries via a single complex. Our findings offer a clear and consistent view on the abundance and whereabouts of m1A, and lay out directions for future studies.

Positioning Europe for the EPITRANSCRIPTOMICS challenge

Jantsch M., Quattrone A., O'Connell M., Helm M., Frye M., Macias-Gonzales M., Ohman M., Ameres S., Willems L., Fuks F., Oulas A., Vanacova S., Nielsen H., Bousquet-Antonelli C., Motorin Y., Roignant J., Balatsos N., Dinnyes A., Baranov P., Kelly V., Lamm A., Rechavi G., Pelizzola M., Liepins J., Holodnuka Kholodnyuk I., Zammit V., Ayers D., Drablos F., Dahl J. A., Bujnicki J., Jeronimo C., Almeida R., Neagu M., Costache M., Bankovic J., Banovic B., Kyselovic J., Valor L. M., Selbert S., Pir P., Demircan T., Cowling V., Schäfer M., Rossmanith W., Lafontaine D., David A., Carre C., Lyko F., Schaffrath R. & Schwartz S. (2018) RNA Biology. 15, 6, p. 829-831

The genetic alphabet consists of the four letters: C, A, G, and T in DNA and C, A, G, and U in RNA. Triplets of these four letters jointly encode 20 different amino acids out of which proteins of all organisms are built. This system is universal and is found in all kingdoms of life. However, bases in DNA and RNA can be chemically modified. In DNA, around 10 different modifications are known, and those have been studied intensively over the past 20 years. Scientific studies on DNA modifications and proteins that recognize them gave rise to the large field of epigenetic and epigenomic research. The outcome of this intense research field is the discovery that development, ageing, and stem-cell dependent regeneration but also several diseases including cancer are largely controlled by the epigenetic state of cells. Consequently, this research has already led to the first FDA approved drugs that exploit the gained knowledge to combat disease. In recent years, the ~150 modifications found in RNA have come to the focus of intense research. Here we provide a perspective on necessary and expected developments in the fast expanding area of RNA modifications, termed epitranscriptomics.

2017

The m1A landscape on cytosolic and mitochondrial mRNA at single-base resolution

Safra M., Sas-Chen A., Nir R., Winkler R., Nachshon A., Bar-Yaacov D., Erlacher M., Rossmanith W., Stern-Ginossar N. & Schwartz S. (2017) Nature. 551, 7679, p. 251-255

Modifications on mRNA offer the potential of regulating mRNA fate post-transcriptionally. Recent studies suggested the widespread presence of N1-methyladenosine (m1A), which disrupts Watson-Crick base pairing, at internal sites of mRNAs(1,2). These studies lacked the resolution of identifying individual modified bases, and did not identify specific sequence motifs undergoing the modification or an enzymatic machinery catalysing them, rendering it challenging to validate and functionally characterize putative sites. Here we develop an approach that allows the transcriptome-wide mapping of m1A at single-nucleotide resolution. Within the cytosol, m1A is present in a low number of mRNAs, typically at low stoichiometries, and almost invariably in tRNA T-loop-like structures, where it is introduced by the TRMT6/TRMT61A complex. We identify a single m1A site in the mitochondrial ND5 mRNA, catalysed by TRMT10C, with methylation levels that are highly tissue specific and tightly developmentally controlled. m1A leads to translational repression, probably through a mechanism involving ribosomal scanning or translation. Our findings suggest that m1A on mRNA, probably because of its disruptive impact on base pairing, leads to translational repression, and is generally avoided by cells, while revealing one case in mitochondria where tight spatiotemporal control over m1A levels was adopted as a potential means of post-transcriptional regulation.

RNA editing in bacteria recodes multiple proteins and regulates an evolutionarily conserved toxin-antitoxin system

Bar-Yaacov D., Mordret E., Towers R., Biniashvili T., Soyris C., Schwartz S., Dahan O. & Pilpel Y. (2017) Genome Research. 27, 10, p. 1696-1703

Adenosine (A) to inosine (I) RNA editing is widespread in eukaryotes. In prokaryotes, however, A-to-I RNA editing was only reported to occur in tRNAs but not in protein-coding genes. By comparing DNA and RNA sequences of Escherichia coli, we show for the first time that A-to-I editing occurs also in prokaryotic mRNAs and has the potential to affect the translated proteins and cell physiology. We found 15 novel A-to-I editing events, of which 12 occurred within known protein-coding genes where they always recode a tyrosine (TAC) into a cysteine (TGC) codon. Furthermore, we identified the tRNA-specific adenosine deaminase (tadA) as the editing enzyme of all these editing sites, thus making it the first identified RNA editing enzyme that modifies both tRNAs and mRNAs. Interestingly, several of the editing targets are self-killing toxins that belong to evolutionarily conserved toxin-antitoxin pairs. We focused on hokB, a toxin that confers antibiotic tolerance by growth inhibition, as it demonstrated the highest level of such mRNA editing. We identified a correlated mutation pattern between the edited and a DNA hard-coded Cys residue positions in the toxin and demonstrated that RNA editing occurs in hokB in two additional bacterial species. Thus, not only the toxin is evolutionarily conserved but also the editing itself within the toxin is. Finally, we found that RNA editing in hokB increases as a function of cell density and enhances its toxicity. Our work thus demonstrates the occurrence, regulation, and functional consequences of RNA editing in bacteria.

Corrigendum: TRUB1 is the predominant pseudouridine synthase acting on mammalian mRNA via a predictable and conserved code

Safra M., Nir R., Farouq D., Vainberg Slutskin I. & Schwartz S. (2017) Genome Research. 27, 8, p. 1460

AML1-ETO requires enhanced C/D box snoRNA/RNP formation to induce self-renewal and leukaemia

Zhou F., Liu Y., Rohde C., Pauli C., Gerloff D., Koehn M., Misiak D., Baeumer N., Cui C., Goellner S., Oellerich T., Serve H., Garcia-Cuellar M., Slany R., Maciejewski J. P., Przychodzen B., Seliger B., Klein H., Bartenhagen C., Berdel W. E., Dugas M., Taketo M. M., Farouq D., Schwartz S., Regev A., Hebert J., Sauvageau G., Pabst C., Huettelmaier S. & Mueller-Tidow C. (2017) Nature Cell Biology. 19, 7, p. 844-855

Leukaemogenesis requires enhanced self-renewal, which is induced by oncogenes. The underlying molecular mechanisms remain incompletely understood. Here, we identified C/D box snoRNAs and rRNA 2'-O-methylation as critical determinants of leukaemic stem cell activity. Leukaemogenesis by AML1-ETO required expression of the groucho-related amino-terminal enhancer of split (AES). AES functioned by inducing snoRNA/RNP formation via interaction with the RNA helicase DDX21. Similarly, global loss of C/D box snoRNAs with concomitant loss of rRNA 2'-O-methylation resulted in decreased leukaemia self-renewal potential. Genomic deletion of either C/D box snoRNA SNORD14D or SNORD35A suppressed clonogenic potential of leukaemia cells in vitro and delayed leukaemogenesis in vivo. We further showed that AML1-ETO9a, MYC and MLL-AF9 all enhanced snoRNA formation. Expression levels of C/D box snoRNAs in AML patients correlated closely with in vivo frequency of leukaemic stem cells. Collectively, these findings indicate that induction of C/D box snoRNA/RNP function constitutes an important pathway in leukaemogenesis.

TRUB1 is the predominant pseudouridine synthase acting on mammalian mRNA via a predictable and conserved code

Safra M., Nir R., Farouq D., Slutzkin I. V. & Schwartz S. (2017) Genome Research. 27, 3, p. 393-406

Following synthesis, RNA can be modified with over 100 chemically distinct modifications, which can potentially regulate RNA expression post-transcriptionally. Pseudouridine (Ψ) was recently established to be widespread and dynamically regulated on yeast mRNA, but less is known about Ψ presence, regulation, and biogenesis in mammalian mRNA. Here, we sought to characterize the Ψ landscape on mammalian mRNA, to identify the main Ψ-synthases (PUSs) catalyzing Ψ formation, and to understand the factors governing their specificity toward selected targets. We first developed a framework allowing analysis, evaluation, and integration of Ψ mappings, which we applied to >2.5 billion reads from 30 human samples. These maps, complemented with genetic perturbations, allowed us to uncover TRUB1 and PUS7 as the two key PUSs acting on mammalian mRNA and to computationally model the sequence and structural elements governing the specificity of TRUB1, achieving near-perfect prediction of its substrates (AUC = 0.974). We then validated and extended these maps and the inferred specificity of TRUB1 using massively parallel reporter assays in which we monitored Ψ levels at thousands of synthetically designed sequence variants comprising either the sequences surrounding pseudouridylation targets or systematically designed mutants perturbing RNA sequence and structure. Our findings provide an extensive and high-quality characterization of the transcriptome-wide distribution of pseudouridine in human and the factors governing it and provide an important resource for the community, paving the path toward functional and mechanistic dissection of this emerging layer of post-transcriptional regulation.

Next-generation sequencing technologies for detection of modified nucleotides in RNAs

Schwartz S. & Motorin Y. (2017) RNA Biology. 14, 9, p. 1124-1137

Our ability to map and quantify RNA modifications at a genome-wide scale has revolutionized our understanding of the pervasiveness and dynamic regulation of diverse RNA modifications. Recent efforts in the field have demonstrated the presence of modified residues in almost any type of cellular RNA. Next-generation sequencing (NGS) technologies are the primary choice for transcriptome-wide RNA modification mapping. Here we provide an overview of approaches for RNA modification detection based on their RT-signature, specific chemicals, antibody-dependent (Ab) enrichment, or combinations thereof. We further discuss sources of artifacts in genome-wide modification maps, and experimental and computational considerations to overcome them. The future in this field is tightly linked to the development of new specific chemical reagents, highly specific Ab against RNA modifications and use of single-molecule RNA sequencing techniques.

2016

A network-based analysis of colon cancer splicing changes reveals a tumorigenesis-favoring regulatory pathway emanating from ELK1

Hollander D., Donyo M., Atias N., Mekahel K., Melamed Z., Yannai S., Lev-Maor G., Shilo A., Schwartz S., Barshack I., Sharan R. & Ast G. (2016) Genome Research. 26, 4, p. 541-553

Splicing aberrations are prominent drivers of cancer, yet the regulatory pathways controlling them are mostly unknown. Here we develop a method that integrates physical interaction, gene expression, and alternative splicing data to construct the largest map of transcriptomic and proteomic interactions leading to cancerous splicing aberrations defined to date, and identify driver pathways therein. We apply our method to colon adenocarcinoma and non-small-cell lung carcinoma. By focusing on colon cancer, we reveal a novel tumor-favoring regulatory pathway involving the induction of the transcription factor MYC by the transcription factor ELK1, as well as the subsequent induction of the alternative splicing factor PTBP1 by both. We show that PTBP1 promotes specific RAC1, NUMB, and PKM splicing isoforms that are major triggers of colon tumorigenesis. By testing the pathway's activity in patient tumor samples, we find ELK1, MYC, and PTBP1 to be overexpressed in conjunction with oncogenic KRAS mutations, and show that these mutations increase ELK1 levels via the RAS-MAPK pathway. We thus illuminate, for the first time, a full regulatory pathway connecting prevalent cancerous mutations to functional tumor-inducing splicing aberrations. Our results demonstrate our method is applicable to different cancers to reveal regulatory pathways promoting splicing aberrations.

Cracking the epitranscriptome

Schwartz S. (2016) RNA. 22, 2, p. 169-174

Over 100 distinct chemical modifications can be catalyzed on RNA post-synthesis, potentially serving as a post-transcriptional regulatory layer of gene expression. This review focuses on recent advances, knowledge gaps, and challenges pertaining to N6-methyladenosine (m6A), an abundant modification of mRNA for which substantial progress has been made in recent years. The discussed aspects are also very relevant for a wide range of additional modifications on mRNA collectively coined the epitranscriptome.

2015

Quantitative visualization of alternative exon expression from RNA-seq data

Katz Y., Wang E. T., Silterra J., Schwartz S., Wong B., Thorvaldsdottir H., Robinson J. T., Mesirov J. P., Airoldi E. M. & Burge C. B. (2015) Bioinformatics. 31, 14, p. 2400-2402

Motivation: Analysis of RNA sequencing (RNA-Seq) data revealed that the vast majority of human genes express multiple mRNA isoforms, produced by alternative pre-mRNA splicing and other mechanisms, and that most alternative isoforms vary in expression between human tissues. As RNA-Seq datasets grow in size, it remains challenging to visualize isoform expression across multiple samples. Results: To help address this problem, we present Sashimi plots, a quantitative visualization of aligned RNA-Seq reads that enables quantitative comparison of exon usage across samples or experimental conditions. Sashimi plots can be made using the Broad Integrated Genome Viewer or with a stand-alone command line program.

Dynamic profiling of the protein life cycle in response to pathogens

Jovanovic M., Rooney M. S., Mertins P., Przybylski D., Chevrier N., Satija R., Rodriguez E. H., Fields A. P., Schwartz S., Raychowdhury R., Mumbach M. R., Eisenhaure T., Rabani M., Gennert D., Lu D., Delorey T., Weissman J. S., Carr S. A., Hacohen N. & Regev A. (2015) Science. 347, 6226, 1259038.

Protein expression is regulated by the production and degradation of messenger RNAs (mRNAs) and proteins, but their specific relationships remain unknown. We combine measurements of protein production and degradation and mRNA dynamics so as to build a quantitative genomic model of the differential regulation of gene expression in lipopolysaccharide-stimulated mouse dendritic cells. Changes in mRNA abundance play a dominant role in determining most dynamic fold changes in protein levels. Conversely, the preexisting proteome of proteins performing basic cellular functions is remodeled primarily through changes in protein production or degradation, accounting for more than half of the absolute change in protein molecules in the cell. Thus, the proteome is regulated by transcriptional induction for newly activated cellular functions and by protein life-cycle changes for remodeling of preexisting functions.

2014

Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA

Schwartz S., Bernstein D. A., Mumbach M. R., Jovanovic M., Herbst R. H., Leon-Ricardo B. X., Engreitz J. M., Guttman M., Satija R., Lander E. S., Fink G. & Regev A. (2014) Cell. 159, 1, p. 148-162

Pseudouridine is the most abundant RNA modification, yet except for a few well-studied cases, little is known about the modified positions and their function(s). Here, we develop Ψ-seq for transcriptome-wide quantitative mapping of pseudouridine. We validate Ψ-seq with spike-ins and de novo identification of previously reported positions and discover hundreds of unique sites in human and yeast mRNAs and snoRNAs. Perturbing pseudouridine synthases (PUS) uncovers which pseudouridine synthase modifies each site and their target sequence features. mRNA pseudouridinylation depends on both site-specific and snoRNA-guided pseudouridine synthases. Upon heat shock in yeast, Pus7p-mediated pseudouridylation is induced at >200 sites, and PUS7 deletion decreases the levels of otherwise pseudouridylated mRNA, suggesting a role in enhancing transcript stability. rRNA pseudouridine stoichiometries are conserved but reduced in cells from dyskeratosis congenita patients, where the PUS DKC1 is mutated. Our work identifies an enhanced, transcriptome-wide scope for pseudouridine and methods to dissect its underlying mechanisms and function.

Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5' sites

Schwartz S., Mumbach M. R., Jovanovic M., Wang T., Maciag K., Bushkin G. G., Mertins P., Ter-Ovanesyan D., Habib N., Cacchiarelli D., Sanjana N. E., Freinkman E., Pacold M. E., Satija R., Mikkelsen T. S., Hacohen N., Zhang F., Carr S. A., Lander E. S. & Regev A. (2014) Cell Reports. 8, 1, p. 284-296

N6-methyladenosine (m6A) is a common modification of mRNA with potential roles in fine-tuning the RNA life cycle. Here, we identify a dense network of proteins interacting with METTL3, a component of the methyltransferase complex, and show that three of them (WTAP, METTL14, and KIAA1429) are required for methylation. Monitoring m6A levels upon WTAP depletion allowed the definition of accurate and near single-nucleotide resolution methylation maps and their classification into WTAP-dependent and -independent sites. WTAP-dependent sites are located at internal positions in transcripts, topologically static across a variety of systems we surveyed, and inversely correlated with mRNA stability, consistent with a role in establishing "basal" degradation rates. WTAP-independent sites form at the first transcribed base as part of the cap structure and are present at thousands of sites, forming a previously unappreciated layer of transcriptome complexity. Our data shed light on the proteomic and transcriptional underpinnings of this RNA modification.

Single-cell RNA-seq reveals dynamic paracrine control of cellular variation

Shalek A. K., Satija R., Shuga J., Trombetta J. J., Gennert D., Lu D., Chen P., Gertner R. S., Gaublomme J. T., Yosef N., Schwartz S., Fowler B., Weaver S., Wang J., Wang X., Ding R., Raychowdhury R., Friedman N., Hacohen N., Park H., May A. P. & Regev A. (2014) Nature. 510, 7505, p. 363-369

High-throughput single-cell transcriptomics offers an unbiased approach for understanding the extent, basis and function of gene expression variation between seemingly identical cells. Here we sequence single-cell RNA-seq libraries prepared from over 1,700 primary mouse bone-marrow-derived dendritic cells spanning several experimental conditions. We find substantial variation between identically stimulated dendritic cells, in both the fraction of cells detectably expressing a given messenger RNA and the transcript's level within expressing cells. Distinct gene modules are characterized by different temporal heterogeneity profiles. In particular, a 'core' module of antiviral genes is expressed very early by a few 'precocious' cells in response to uniform stimulation with a pathogenic component, but is later activated in all cells. By stimulating cells individually in sealed microfluidic chambers, analysing dendritic cells from knockout mice, and modulating secretion and extracellular signalling, we show that this response is coordinated by interferon-mediated paracrine signalling from these precocious cells. Notably, preventing cell-to-cell communication also substantially reduces variability between cells in the expression of an early-induced 'peaked' inflammatory module, suggesting that paracrine signalling additionally represses part of the inflammatory program. Our study highlights the importance of cell-to-cell communication in controlling cellular heterogeneity and reveals general strategies that multicellular populations can use to establish complex dynamic responses.

2013

High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis

Schwartz S., Agarwala S. D., Mumbach M. R., Jovanovic M., Mertins P., Shishkin A., Tabach Y., Mikkelsen T. S., Satija R., Ruvkun G., Carr S. A., Lander E. S., Fink G. R. & Regev A. (2013) Cell. 155, 6, p. 1409-1421

N⁶-methyladenosine (m⁶A) is the most ubiquitous mRNA base modification, but little is known about its precise location, temporal dynamics, and regulation. Here, we generated genomic maps of m⁶A sites in meiotic yeast transcripts at nearly single-nucleotide resolution, identifying 1,308 putatively methylated sites within 1,183 transcripts. We validated eight out of eight methylation sites in different genes with direct genetic analysis, demonstrated that methylated sites are significantly conserved in a related species, and built a model that predicts methylated sites directly from sequence. Sites vary in their methylation profiles along a dense meiotic time course and are regulated both locally, via predictable methylatability of each site, and globally, through the core meiotic circuitry. The methyltransferase complex components localize to the yeast nucleolus, and this localization is essential for mRNA methylation. Our data illuminate a conserved, dynamically regulated methylation program in yeast meiosis and provide an important resource for studying the function of this epitranscriptomic modification.

Transcriptome-Wide Mapping of 5-methylcytidine RNA Modifications in Bacteria, Archaea, and Yeast Reveals m⁵C within Archaeal mRNAs

Edelheit S., Schwartz S., Mumbach M. R., Wurtzel O. & Sorek R. (2013) PLoS Genetics. 9, 6, e1003602.

The presence of 5-methylcytidine (m⁵C) in tRNA and rRNA molecules of a wide variety of organisms was first observed more than 40 years ago. However, detection of this modification was limited to specific, abundant, RNA species, due to the usage of low-throughput methods. To obtain a high resolution, systematic, and comprehensive transcriptome-wide overview of m⁵C across the three domains of life, we used bisulfite treatment on total RNA from both gram positive (B. subtilis) and gram negative (E. coli) bacteria, an archaeon (S. solfataricus) and a eukaryote (S. cerevisiae), followed by massively parallel sequencing. We were able to recover most previously documented m⁵C sites on rRNA in the four organisms, and identified several novel sites in yeast and archaeal rRNAs. Our analyses also allowed quantification of methylated m⁵C positions in 64 tRNAs in yeast and archaea, revealing stoichiometric differences between the methylation patterns of these organisms. Molecules of tRNAs in which m⁵C was absent were also discovered. Intriguingly, we detected m⁵C sites within archaeal mRNAs, and identified a consensus motif of AUCGANGU that directs methylation in S. solfataricus. Our results, which were validated using m⁵C-specific RNA immunoprecipitation, provide the first evidence for mRNA modifications in archaea, suggesting that this mode of post-transcriptional regulation extends beyond the eukaryotic domain.

Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells

Shalek A. K., Satija R., Adiconis X., Gertner R. S., Gaublomme J. T., Raychowdhury R., Schwartz S., Yosef N., Malboeuf C., Lu D., Trombetta J. J., Gennert D., Gnirke A., Goren A., Hacohen N., Levin J. Z., Park H. & Regev A. (2013) Nature. 498, 7453, p. 236-240

Recent molecular studies have shown that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels and phenotypic output(1-5), with important functional consequences(4,5). Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs(1,2) or proteins(5,6) simultaneously, because genomic profiling methods(3) could not be applied to single cells until very recently(7-10). Here we use single-cell RNA sequencing to investigate heterogeneity in the response of mouse bone-marrow-derived dendritic cells (BMDCs) to lipopolysaccharide. We find extensive, and previously unobserved, bimodal variation in messenger RNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit, involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.

2012

Topology of the human and mouse m⁶A RNA methylomes revealed by m⁶A-seq

Dominissini D., Moshitch-Moshkovitz S., Schwartz S., Salmon-Divon M., Ungar L., Osenberg S., Cesarkas K., Jacob-Hirsch J., Amariglio N., Kupiec M., Sorek R. & Rechavi G. (2012) Nature. 485, 7397, p. 201-206

An extensive repertoire of modifications is known to underlie the versatile coding, structural and catalytic functions of RNA, but it remains largely uncharted territory. Although biochemical studies indicate that N⁶-methyladenosine (m⁶A) is the most prevalent internal modification in messenger RNA, an in-depth study of its distribution and functions has been impeded by a lack of robust analytical methods. Here we present the human and mouse m⁶A modification landscape in a transcriptome-wide manner, using a novel approach, m⁶A-seq, based on antibody-mediated capture and massively parallel sequencing. We identify over 12,000 m⁶A sites characterized by a typical consensus in the transcripts of more than 7,000 human genes. Sites preferentially appear in two distinct landmarks, around stop codons and within long internal exons, and are highly conserved between human and mouse. Although most sites are well preserved across normal and cancerous tissues and in response to various stimuli, a subset of stimulus-dependent, dynamically modulated sites is identified. Silencing the m⁶A methyltransferase significantly affects gene expression and alternative splicing patterns, resulting in modulation of the p53 (also known as TP53) signalling pathway and apoptosis. Our findings therefore suggest that RNA decoration by m⁶A has a fundamental role in regulation of gene expression.

Differential GC Content between Exons and Introns Establishes Distinct Strategies of Splice-Site Recognition

Amit M., Donyo M., Hollander D., Goren A., Kim E., Gelfman S., Lev-Maor G., Burstein D., Schwartz S., Postolsky B., Pupko T. & Ast G. (2012) Cell Reports. 1, 5, p. 543-556

During evolution, segments of homeothermic genomes underwent a GC content increase. Our analyses reveal that two exon-intron architectures have evolved from an ancestral state of low GC content exons flanked by short introns with a lower GC content. One group underwent a GC content elevation that abolished the differential exon-intron GC content, with introns remaining short. The other group retained the overall low GC content as well as the differential exon-intron GC content, and is associated with longer introns. We show that differential exon-intron GC content regulates exon inclusion level in this group, in which disease-associated mutations often lead to exon skipping. This group's exons also display higher nucleosome occupancy compared to flanking introns and exons of the other group, thus "marking" them for spliceosomal recognition. Collectively, our results reveal that differential exon-intron GC content is a previously unidentified determinant of exon selection and argue that the two GC content architectures reflect the two mechanisms by which splicing signals are recognized: exon definition and intron definition.

Transcriptome-wide discovery of circular RNAs in Archaea

Danan M., Schwartz S., Edelheit S. & Sorek R. (2012) Nucleic Acids Research. 40, 7, p. 3131-3142

Circular RNA forms had been described in all domains of life. Such RNAs were shown to have diverse biological functions, including roles in the life cycle of viral and viroid genomes, and in maturation of permuted tRNA genes. Despite their potentially important biological roles, discovery of circular RNAs has so far been mostly serendipitous. We have developed circRNA-seq, a combined experimental/computational approach that enriches for circular RNAs and allows profiling their prevalence in a whole-genome, unbiased manner. Application of this approach to the archaeon Sulfolobus solfataricus P2 revealed multiple circular transcripts, a subset of which was further validated independently. The identified circular RNAs included expected forms, such as excised tRNA introns and rRNA processing intermediates, but were also enriched with non-coding RNAs, including C/D box RNAs and RNase P, as well as circular RNAs of unknown function. Many of the identified circles were conserved in Sulfolobus acidocaldarius, further supporting their functional significance. Our results suggest that circular RNAs, and particularly circular non-coding RNAs, are more prevalent in archaea than previously recognized, and might have yet unidentified biological roles. Our study establishes a specific and sensitive approach for identification of circular RNAs using RNA-seq, and can readily be applied to other organisms.

Changes in exon-intron structure during vertebrate evolution affect the splicing pattern of exons

Gelfman S., Burstein D., Penn O., Savchenko A., Amit M., Schwartz S., Pupko T. & Ast G. (2012) Genome Research. 22, 1, p. 35-50

Exon-intron architecture is one of the major features directing the splicing machinery to the short exons that are located within long flanking introns. However, the evolutionary dynamics of exon-intron architecture and its impact on splicing are largely unknown. Using a comparative genomic approach, we analyzed 17 vertebrate genomes and reconstructed the ancestral motifs of both 3' and 5' splice sites, as well as the ancestral length of exons and introns. Our analyses suggest that vertebrate introns increased in length from the shortest ancestral introns to the longest primate introns. An evolutionary analysis of splice sites revealed that weak splice sites act as a restrictive force keeping introns short. In contrast, strong splice sites allow recognition of exons flanked by long introns. Reconstruction of the ancestral state suggests these phenomena were not prevalent in the vertebrate ancestor, but appeared during vertebrate evolution. By calculating evolutionary rate shifts in exons, we identified cis-acting regulatory sequences that became fixed during the transition from early vertebrates to mammals. Experimental validations performed on a selection of these hexamers confirmed their regulatory function. We additionally revealed many features of exons that can discriminate alternative from constitutive exons. These features were integrated into a machine-learning approach to predict whether an exon is alternative. Our algorithm obtains very high predictive power (AUC of 0.91), and using these predictions we have identified and successfully validated novel alternatively spliced exons. Overall, we provide novel insights regarding the evolutionary constraints acting upon exons and their recognition by the splicing machinery.

2011

Detection and removal of biases in the analysis of next-generation sequencing reads

Schwartz S., Ram O. & Ast G. (2011) PLoS ONE. 6, 1, e16685.

Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, misleadingly increasing the credibility of results due to them. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.

2010

Position-dependent alternative splicing activity revealed by global profiling of alternative splicing events regulated by PTB

Llorian M., Schwartz S., Clark T. A., Hollander D., Tan L., Spellman R., Gordon A., Schweitzer A. C., de la Grange P., Ast G. & Smith C. W. J. (2010) Nature Structural & Molecular Biology. 17, 9, p. 1114-1123

To gain global insights into the role of the well-known repressive splicing regulator PTB, we analyzed the consequences of PTB knockdown in HeLa cells using high-density oligonucleotide splice-sensitive microarrays. The major class of identified PTB-regulated splicing event was PTB-repressed cassette exons, but there was also a substantial number of PTB-activated splicing events. PTB-repressed and PTB-activated exons showed a distinct arrangement of motifs with pyrimidine-rich motif enrichment within and upstream of repressed exons but downstream of activated exons. The N-terminal half of PTB was sufficient to activate splicing when recruited downstream of a PTB-activated exon. Moreover, insertion of an upstream pyrimidine tract was sufficient to convert a PTB-activated exon to a PTB-repressed exon. Our results show that PTB, an archetypal splicing repressor, has variable splicing activity that predictably depends upon its binding location with respect to target exons.

Chromatin density and splicing destiny: On the cross-talk between chromatin structure and splicing

Schwartz S. & Ast G. (2010) EMBO Journal. 29, 10, p. 1629-1636

How are short exonic sequences recognized within the vast intronic oceans in which they reside? Despite decades of research, this remains one of the most fundamental, yet enigmatic, questions in the field of pre-mRNA splicing research. For many years, studies aiming to shed light on this process were focused at the RNA level, characterizing the manner by which splicing factors and auxiliary proteins interact with splicing signals, thereby enabling, facilitating and regulating splicing. However, we increasingly understand that splicing is not an isolated process; rather it occurs co-transcriptionally and is presumably also regulated by transcription-related processes. In fact, studies by our group and others over the past year suggest that DNA structure in terms of nucleosome positioning and specific histone modifications, which have a well established role in transcription, may also have a role in splicing. In this review we discuss evidence for the coupling between transcription and splicing, focusing on recent findings suggesting a link between chromatin structure and splicing, and highlighting challenges this emerging field is facing.

Large-scale discovery of insertion hotspots and preferential integration sites of human transposed elements

Levy A., Schwartz S. & Ast G. (2010) Nucleic Acids Research. 38, 5, p. 1515-1530

Throughout evolution, eukaryotic genomes have been invaded by transposable elements (TEs). Little is known about the factors leading to genomic proliferation of TEs, their preferred integration sites and the molecular mechanisms underlying their insertion. We analyzed hundreds of thousands of nested TEs in the human genome, i.e. insertions of TEs into existing ones. We first discovered that most TEs insert within specific 'hotspots' along the targeted TE. In particular, retrotransposed Alu elements contain a non-canonical single nucleotide hotspot for insertion of other Alu sequences. We next devised a method for identification of integration sequence motifs of inserted TEs that are conserved within the targeted TEs. This method revealed novel sequence motifs characterizing insertions of various important TE families: Alu, hAT, ERV1 and MaLR. Finally, we performed a global assessment to determine the extent to which young TEs tend to nest within older transposed elements and identified a 4-fold higher tendency of TEs to insert into existing TEs than to insert within non-TE intergenic regions. Our analysis demonstrates that TEs are highly biased to insert within certain TEs, in specific orientations and within specific targeted TE positions. TE nesting events also reveal new characteristics of the molecular mechanisms underlying transposition.

2009

The Pivotal Roles of TIA Proteins in 5' Splice-Site Selection of Alu Exons and Across Evolution

Gal-Mark N., Schwartz S., Ram O., Eyras E. & Ast G. (2009) PLoS Genetics. 5, 11, 1000717.

More than 5% of alternatively spliced internal exons in the human genome are derived from Alu elements in a process termed exonization. Alus are comprised of two homologous arms separated by an internal polypyrimidine tract (PPT). In most exonizations, splice sites are selected from within the same arm. We hypothesized that the internal PPT may prevent selection of a splice site further downstream. Here, we demonstrate that this PPT enhanced the selection of an upstream 5' splice site (5'ss), even in the presence of a stronger 5'ss downstream. Deletion of this PPT shifted selection to the stronger downstream 5'ss. This enhancing effect depended on the strength of the downstream 5'ss, on the efficiency of base-pairing to U1 snRNA, and on the length of the PPT. This effect of the PPT was mediated by the binding of TIA proteins and was dependent on the distance between the PPT and the upstream 5'ss. A wide-scale evolutionary analysis of introns across 22 eukaryotes revealed an enrichment in PPTs within ~20 nt downstream of the 5'ss. For most metazoans, the strength of the 5'ss inversely correlated with the presence of a downstream PPT, indicative of the functional role of the PPT. Finally, we found that the proteins that mediate this effect, TIA and U1C, and in particular their functional domains, are highly conserved across evolution. Overall, these findings expand our understanding of the role of TIA1/TIAR proteins in enhancing recognition of exons, in general, and Alu exons, in particular.

Chromatin organization marks exon-intron structure

Schwartz S., Meshorer E. & Ast G. (2009) Nature Structural & Molecular Biology. 16, 9, p. 990-U117

An increasing body of evidence indicates that transcription and splicing are coupled, and it is accepted that chromatin organization regulates transcription. Little is known about the cross-talk between chromatin structure and exon-intron architecture. By analysis of genome-wide nucleosome-positioning data sets from humans, flies and worms, we found that exons show increased nucleosome-occupancy levels with respect to introns, a finding that we link to differential GC content and nucleosome-disfavoring elements between exons and introns. Analysis of genome-wide chromatin immunoprecipitation data in humans and mice revealed four specific post-translational histone modifications enriched in exons. Our findings indicate that previously described enrichment of H3K36me3 modifications in exons reflects a more fundamental phenomenon, namely increased nucleosome occupancy along exons. Our results suggest an RNA polymerase II-mediated cross-talk between chromatin structure and exon-intron architecture, implying that exon selection may be modulated by chromatin structure.

Alu exonization events reveal features required for precise recognition of exons by the splicing machinery

Schwartz S., Gal-Mark N., Kfir N., Ram O., Kim E. & Ast G. (2009) PLoS Computational Biology. 5, 3,

Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5' splice sites (5'ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon-intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing.

SROOGLE: Webserver for integrative, user-friendly visualization of splicing signals

Schwartz S., Hall E. & Ast G. (2009) Nucleic Acids Research. 37, SUPPL. 2, p. W189-W192

Exons are typically only 140 nt in length and are surrounded by intronic oceans that are thousands of nucleotides long. Four core splicing signals, aided by splicing-regulatory sequences (SRSs), direct the splicing machinery to the exon/intron junctions. Many different algorithms have been developed to identify and score the four splicing signals and thousands of putative SRSs have been identified, both computationally and experimentally. Here we describe SROOGLE, a webserver that makes splicing signal sequence and scoring data available to the biologist in an integrated, visual, easily interpretable, and user-friendly format. SROOGLE's input consists of the sequence of an exon and flanking introns. The graphic browser output displays the four core splicing signals with scores based on nine different algorithms and highlights sequences belonging to 13 different groups of SRSs. The interface also offers the ability to examine the effect of point mutations at any given position, as well as a range of additional metrics and statistical measures regarding each potential signal. SROOGLE is available at http://sroogle.tau.ac.il, and may also be downloaded as a desktop version.

2008

Multifactorial interplay controls the splicing profile of Alu-derived exons

Ram O., Schwartz S. & Ast G. (2008) Molecular and Cellular Biology. 28, 10, p. 3513-3525

Exonization of Alu elements creates primate-specific genomic diversity. Here we combine bioinformatic and experimental methodologies to reconstruct the molecular changes leading to exon selection. Our analyses revealed an intricate network involved in Alu exonization. A typical Alu element contains multiple sites with the potential to serve as 5' splice sites (5'ss). First, we demonstrated the role of 5'ss strength in controlling exonization events. Second, we found that a cryptic 5'ss enhances the selection of a more upstream site and demonstrate that this is mediated by binding of U1 snRNA to the cryptic splice site, challenging the traditional role attributed to U1 snRNA of binding the 5'ss only. Third, we used a simple algorithm to identify specific sequences that determine splice site selection within specific Alu exons. Finally, by inserting identical exons within different sequences, we demonstrated the importance of flanking genomic sequences in determining whether an Alu exon will undergo exonization. Overall, our results demonstrate the complex interplay between at least four interacting layers that affect Alu exonization. These results shed light on the mechanism through which Alu elements enrich the primate transcriptome and allow a better understanding of the exonization process in general.

Alternative splicing of Alu exons: two arms are better than one

Gal-Mark N., Schwartz S. & Ast G. (2008) Nucleic Acids Research. 36, 6, p. 2012-2023

Alus, primate-specific retroelements, are the most abundant repetitive elements in the human genome. They are composed of two related but distinct monomers, left and right arms. Intronic Alu elements may acquire mutations that generate functional splice sites, a process called exonization. Most exonizations occur in right arms of antisense Alu elements, and are alternatively spliced. Here we show that without the left arm, exonization of the right arm shifts from alternative to constitutive splicing. This eliminates the evolutionarily conserved isoform and may thus be selected against. We further show that insertion of the left arm downstream of a constitutively spliced non-Alu exon shifts splicing from constitutive to alternative. Although the two arms are highly similar, the left arm is characterized by weaker splicing signals and lower exonic splicing regulatory (ESR) densities. Mutations that improve these potential splice signals activate exonization and shift splicing from the right to the left arm. Collaboration between two or more putative splice signals renders the intronic left arm with a pseudo-exon function. Thus, the dimeric form of the Alu element fortuitously provides it with an evolutionary advantage, allowing enrichment of the primate transcriptome without compromising its original repertoire.

Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes

Schwartz S. H., Silva J., Burstein D., Pupko T., Eyras E. & Ast G. (2008) Genome Research. 18, 1, p. 88-103

Introns are among the hallmarks of eukaryotic genes. Splicing of introns is directed by three main splicing signals: the 5' splice site (5'ss), the branch site (BS), and the polypyrimidine tract/3' splice site (PPT-3'ss). To study the evolution of these splicing signals, we have conducted a systematic comparative analysis of these signals in over 1.2 million introns from 22 eukaryotes. Our analyses suggest that all these signals have dramatically evolved: The PPT is weak among most fungi, intermediate in plants and protozoans, and strongest in metazoans. Within metazoans it shows a gradual strengthening from Caenorhabditis elegans to human. The 5'ss and the BS were found to be degenerate among most organisms, but highly conserved among some fungi. A maximum parsimony-based algorithm for reconstructing ancestral position-specific scoring matrices suggested that the ancestral 5'ss and BS were degenerate, as in metazoans. To shed light on the evolutionary variation in splicing signals, we have analyzed the evolutionary changes in the factors that bind these signals. Our analysis reveals coevolution of splicing signals and their corresponding splicing factors: The strength of the PPT is correlated to changes in key residues in its corresponding splicing factor U2AF2; limited correlation was found between changes in the 5'ss and U1 snRNA that binds it; but not between the BS and U2 snRNA. Thus, although the basic ability of eukaryotes to splice introns has remained conserved throughout evolution, the splicing signals and their corresponding splicing factors have considerably evolved, uniquely shaping the splicing mechanisms of different organisms.