Bioinformatics Unit
The Bioinformatics Unit is devoted to advancing scientific understanding of living systems through computation. The unit promotes and supports the adoption, use, and development of bioinformatics tools for advancing biological research. We organize and teach courses and workshops for all our services, and also provide individual training. Since bioinformatics is such a broad field, our efforts are focused on the topics most needed by Weizmann researchers.
The Bioinformatics Unit is supported by the Wertheimer Center for Computational Biology.
Data Analysis & Programming
Support is available for complete analyses, and for teaching researchers/students how to analyze their results independently. We encourage researchers to consult us when planning a project.
Click on the analysis titles below for detailed information.
The advent of deep sequencing platforms has opened exciting new avenues for life science researchers. We provide comprehensive support across a wide range of next-generation sequencing (NGS) and third-generation long-read sequencing (PacBio, Nanopore) technologies, covering every stage from experimental design and raw data processing to advanced downstream bioinformatics and integrative multi-omics interpretation.
Bulk Transcriptome Analysis
We support multiple transcriptomic profiling protocols, including RNA-Seq (short and long reads), MARS-seq, SCRB-Seq, Ribo-Seq, CLIP-Seq, and small RNA-Seq. Our analysis pipeline includes sequencing quality control and adapter trimming using Cutadapt, FastQC, and MultiQC, followed by alignment or pseudo-alignment to reference genomes and transcriptomes with STAR, Bowtie2, or RSEM. Gene and transcript quantification is performed, followed by differential expression analysis with DESeq2. Clustering analysis is then performed to identify groups of co-expressed genes or biologically similar samples, providing insight into underlying expression patterns and sample relationships, along with pathway enrichment (see below).
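As a toy illustration of one normalization concept that appears in transcriptome quantification workflows, the sketch below converts raw gene counts to TPM (transcripts per million). The function name and the gene counts are hypothetical examples, not output from our actual pipeline.

```python
# Illustrative sketch: raw counts -> TPM normalization.
# Gene names, counts, and lengths are hypothetical examples.

def counts_to_tpm(counts, lengths_kb):
    """counts: {gene: raw read count}; lengths_kb: {gene: transcript length in kb}."""
    # Rate = reads per kilobase of transcript (length normalization)
    rates = {g: counts[g] / lengths_kb[g] for g in counts}
    total = sum(rates.values())
    # Scale so all values sum to one million (depth normalization)
    return {g: rates[g] / total * 1e6 for g in rates}

tpm = counts_to_tpm({"geneA": 500, "geneB": 1500}, {"geneA": 1.0, "geneB": 3.0})
```

Note that tools such as RSEM estimate expression at the transcript level with more sophisticated models; this sketch only shows the arithmetic behind the TPM unit.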
Epigenomics Analysis
Our epigenomics services support a range of assays for studying chromatin states and DNA modifications, including ChIP-Seq, ATAC-Seq, and CUT&RUN for chromatin accessibility, histone modification, and transcription factor binding profiling. We also provide DNA methylation profiling through Whole-Genome Bisulfite Sequencing (WGBS) and Reduced Representation Bisulfite Sequencing (RRBS). The pipeline includes peak calling with MACS2 or SEACR, differential binding or methylation analysis using DiffBind and Bismark, motif discovery to identify regulatory patterns and epigenetic signatures, and signal visualization with deepTools.
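To illustrate the basic quantity behind bisulfite-based methylation profiling, the sketch below computes the per-cytosine methylation level (often called the beta value) from methylated and unmethylated read calls. The function name and counts are hypothetical examples.

```python
# Illustrative sketch: per-cytosine methylation level from
# bisulfite sequencing call counts. Values are hypothetical.

def methylation_level(n_meth, n_unmeth):
    """Fraction of methylated calls at a cytosine (the 'beta value')."""
    total = n_meth + n_unmeth
    return n_meth / total if total else float("nan")

# A hypothetical CpG site covered by 10 reads, 8 supporting methylation
beta = methylation_level(8, 2)
```

Tools such as Bismark report exactly these counts per cytosine; downstream differential methylation analysis compares such levels between sample groups with appropriate statistics.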
Metagenomics and Microbial Analysis
We offer comprehensive microbial community profiling and comparative genomics, including bacterial 16S rRNA analysis for detecting and quantifying microbial community composition and estimating relative taxon abundance. This involves Operational Taxonomic Unit (OTU) clustering and taxonomic assignment using reference databases such as SILVA, Greengenes, or RDP, followed by alpha and beta diversity analysis in QIIME2. Fungal ITS amplicon sequencing is performed for identifying fungal diversity using the DADA2 workflow and the UNITE reference database. We also conduct dual RNA-Seq for host–pathogen transcriptomics and metagenomics for functional profiling of microbial communities, including read assembly with SPAdes, gene prediction and annotation with KEGG, COG, Pfam, and InterProScan, and taxonomic profiling with Kraken2, MetaPhlAn, or Kaiju. De novo genome or transcriptome assembly is supported for short-read (Illumina, Element AVITI) or long-read (ONT, PacBio) platforms. Our analyses include diversity metrics, abundance estimation, functional annotation, and strain-level reconstruction.
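The alpha and beta diversity metrics mentioned above can be illustrated with two classic formulas: the Shannon index (within-sample diversity) and Bray-Curtis dissimilarity (between-sample difference). This is a minimal sketch of the underlying arithmetic, not the QIIME2 implementation; the taxon counts are hypothetical.

```python
# Illustrative sketch of two standard diversity metrics.
# Taxon abundance vectors are hypothetical examples.
import math

def shannon(counts):
    """Shannon diversity index (natural log) from taxon counts."""
    total = sum(counts)
    ps = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in ps)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

# Two equally abundant taxa give maximal evenness: H = ln(2)
h = shannon([10, 10])
# Completely disjoint communities give dissimilarity 1.0
bc = bray_curtis([1, 0], [0, 1])
```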
High-Throughput Multi-Omics Integration
We specialize in integrating transcriptomics, proteomics, metabolomics, and peptidomics data to achieve holistic, systems-level insights using methods such as Multi-Omics Factor Analysis (MOFA). This includes statistical and multivariate analyses, cross-platform normalization and correlation, multi-factorial data integration, and interactive data visualization with pathway-level interpretation.
Pathway and Network Analysis
We perform pathway enrichment, gene set analysis, and biological network inference to identify key molecular drivers, regulatory modules, and potential therapeutic targets. Analyses are conducted using tools such as QIAGEN Ingenuity Pathway Analysis (IPA), Gene Set Enrichment Analysis (GSEA), BioCyc Pathway/Genome Database Collection, Reactome, KEGG, and GO databases, among others. Biological network construction and visualization are performed using Cytoscape and STRING, while co-expression network analysis is carried out with WGCNA or BioNERO.
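The over-representation tests behind many pathway enrichment tools reduce to a one-sided hypergeometric p-value. The sketch below shows that calculation for a single hypothetical pathway; real tools (GSEA, IPA, and others) use more elaborate statistics and correct for testing many pathways.

```python
# Illustrative sketch: one-sided hypergeometric over-representation test.
# The gene-set sizes below are hypothetical examples.
from math import comb

def enrichment_pval(k, n, K, N):
    """P(X >= k) when drawing n genes from a background of N,
    of which K belong to the pathway of interest."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / comb(N, n)

# Hypothetical extreme case: all 5 hits fall inside a 5-gene pathway (N = 10)
p = enrichment_pval(5, 5, 5, 10)
```

In practice the resulting p-values are adjusted for multiple testing (e.g. Benjamini-Hochberg) before pathways are reported as enriched.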
Single-Cell and Spatial Transcriptomics
We provide in-depth analysis of single-cell and spatial omics data, integrating multiple data modalities to achieve comprehensive cellular insights. Supported protocols include 10x Genomics scRNA-Seq, scATAC-Seq, CITE-Seq, immune repertoire profiling (BCR/TCR clonotypes), and spatial transcriptomics (e.g., Visium HD). Our analytical workflow begins with CellRanger pipelines, followed by in-depth downstream analysis using Seurat, Signac, Scanpy, and more. These frameworks enable high-resolution single-cell and multi-omics integration, including cell-level quality control and filtering with EmptyDrops, CellBender, and SoupX; batch correction and integrative analysis across datasets; dimensionality reduction using PCA, UMAP, and t-SNE; clustering; marker gene detection; and cell-type annotation (SingleR, label transfer, and more). We perform trajectory, diffusion map, and pseudotime analysis using Monocle and destiny, and investigate cell–cell communication and ligand–receptor interactions with CellChat, CellPhoneDB, and NicheNet. For chromatin accessibility studies, we conduct ATAC-Seq peak calling, motif analysis, and linkage of gene expression with open chromatin peaks.
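Cell-level quality control typically starts with simple per-cell thresholds before the model-based tools mentioned above are applied. The sketch below shows that threshold idea; the cutoff values and field names are hypothetical examples and must be tuned per dataset.

```python
# Illustrative sketch: threshold-based single-cell QC filtering.
# Thresholds and the example cells are hypothetical.

def pass_qc(n_genes, n_counts, pct_mito,
            min_genes=200, min_counts=500, max_mito=0.2):
    """Keep cells with enough detected genes/UMIs and a low
    mitochondrial read fraction (a common proxy for cell damage)."""
    return n_genes >= min_genes and n_counts >= min_counts and pct_mito <= max_mito

cells = [
    {"n_genes": 1500, "n_counts": 8000, "pct_mito": 0.05},  # plausible healthy cell
    {"n_genes": 60,   "n_counts": 300,  "pct_mito": 0.45},  # likely debris/dying cell
]
kept = [c for c in cells if pass_qc(**c)]
```

Frameworks such as Seurat and Scanpy expose the same metrics per cell; in practice thresholds are chosen after inspecting their distributions rather than fixed in advance.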
Genetic Variation Analysis
We detect and annotate single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), structural variants, and copy number variations using best-practice tools such as GATK, bcftools, and FreeBayes. Results are presented with biologically meaningful interpretation and clear visualization.
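Variant callers such as GATK emit results in VCF format; the sketch below parses the eight mandatory columns of a single VCF data line and classifies the record as a SNP or indel. The example record is hypothetical.

```python
# Illustrative sketch: parsing the mandatory columns of one VCF data line.
# The record below is a hypothetical example.

def parse_vcf_record(line):
    """Return the first eight fixed VCF columns as a dict."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    alts = alt.split(",")
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alts,
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        # A SNP: reference and every alternate allele are single bases
        "is_snp": len(ref) == 1 and all(len(a) == 1 for a in alts),
    }

rec = parse_vcf_record("chr1\t12345\trs99\tA\tG\t50\tPASS\tDP=100")
```

Real-world parsing should use a dedicated library (e.g. pysam or cyvcf2) that also handles headers, genotypes, and multi-sample columns.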
Sequencing Data Processing and Submission
We provide end-to-end sequence data management, including demultiplexing and processing from platforms such as Illumina, Element Biosciences AVITI, and Oxford Nanopore. Our services include standardized quality control reports, metadata preparation, and assistance with public data submission to repositories such as GEO, SRA, and ENA.
CRISPR: We advise on and design experiments for various CRISPR systems, including SpCas9, SaCas9, Cas12a, Cas13 (RNA cleavage), base editing, prime editing, and library screens. New methods are incorporated as they arise. Our design is genome-based, aiming to maximize cleavage while minimizing off-targets. We also design repair templates as needed, whether short or long single-stranded DNA, plasmids, or AAV. We design gene knock-outs, simple and complex knock-ins, reporter genes, and conditional alleles. We also design genotyping strategies and help interpret sequencing results from the offspring. We design CRISPR experiments for mice, cell lines, plants, worms, diatoms, and any species with a sequenced genome. If no genome is available, limited design is still possible, but analyzing the results will be more complex.
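Two building blocks of genome-based guide design are locating candidate protospacers next to a PAM and scoring candidate off-target sites by sequence similarity. The sketch below shows both ideas in their simplest form (forward strand only, Hamming distance as a crude similarity proxy); function names and sequences are hypothetical, and real design tools use genome-wide search and empirically trained scoring models.

```python
# Illustrative sketch: SpCas9 guide candidates and a crude off-target metric.
# Forward strand only; sequences are hypothetical examples.
import re

def find_spcas9_protospacers(seq):
    """Return (position, 20-nt protospacer) for each NGG PAM on the
    forward strand. The lookahead allows overlapping candidates."""
    return [(m.start(), m.group(1))
            for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq)]

def mismatch_count(guide, site):
    """Hamming distance between two equal-length sequences,
    a very crude proxy for off-target cleavage risk."""
    return sum(a != b for a, b in zip(guide, site))

hits = find_spcas9_protospacers("A" * 20 + "TGG")  # one candidate, PAM = TGG
```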
Classical bioinformatics problems, including:
Analysis of Sanger sequencing
Cloning design: helping researchers design expression plasmids, place markers and mutations, and choose expression systems.
Primer and probe design: we design primers for mRNA, genomic DNA, qPCR, and genotyping (including redesign of primers to match the specifications of the Genotyping Unit in the Veterinary Services department). We also design probes for smFISH (single-molecule FISH) analysis, choosing the optimal target region of the sequence.
Database searches: comprehensive searches of the relevant genomic DNA, mRNA, and protein databases, in whatever species is needed.
Multiple alignments: comparison of sequences both intra- and inter-species.
Prediction of post-translational modifications: phosphorylation, GPI-anchor, myristoylation, prenylation, signal peptides, glycosylation sites, proteolytic cleavage and prediction of subcellular localization of proteins.
Phylogenetic analysis: including finding and deciding which sequences/species to use, which alignment algorithm, and which tree algorithm (Neighbor-Joining, Maximum Likelihood, Maximum Parsimony, etc.), as well as assistance in proper display of trees (iTOL).
Promoter analysis: Defining transcription start sites, finding binding sites for known factors, defining new binding sites, for individual genes or groups of genes.
Motif/domain finding and definition: help defining motifs and domains in both DNA (promoters, as expanded on above) and protein sequences.
Protein secondary structure prediction, including transmembrane domain prediction.
Antigen design for antibody development and MHC epitope prediction.
Single nucleotide polymorphism (SNP) analysis: identifying genetic variations at single base-pair resolution to reveal how genetic traits, disease susceptibility, and drug responses are influenced by individual variation.
Genome annotation and cross genome comparison.
Splice variant analysis: showing how different protein variants can be produced from a single gene through different combinations of exons (a process called alternative splicing), using RNA and DNA data with computational tools.
siRNA and miRNA analysis: we predict target sites, suggest candidate sequences, and design primers to test the predictions.
Depositing sequences into GenBank.
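Several of the classical tasks above (primer design, probe selection) rest on a few simple sequence calculations. The sketch below shows three of them: GC content, a rough melting-temperature estimate via the Wallace rule, and reverse complementation. These are illustrative helpers, not the unit's design software, which accounts for many more factors (secondary structure, specificity, assay constraints).

```python
# Illustrative sketch: basic oligo calculations used in primer design.
# Function names and example sequences are hypothetical.

def gc_fraction(oligo):
    """Fraction of G/C bases in an oligo."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)

def wallace_tm(oligo):
    """Wallace-rule melting temperature estimate, 2*(A+T) + 4*(G+C) degrees C.
    Reasonable only for short oligos (under ~14 nt)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (needed for the reverse primer)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper()))
```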
Bespoke analysis: Unlike off-the-shelf solutions, bespoke analysis is built from the ground up to answer the specific questions and meet the unique objectives of your project.
Experimental design – Planning an experiment in advance while accounting for adequate sample size, statistical power and possible confounding effects.
Statistical analysis – Formal statistical analysis for any type of experimental data (e.g. gene expression, metabolomics, proteomics, behavioral, clinical, etc.)
Example methods: linear models, mixed-effects models, survival analysis, path analysis (structural equation modeling), etc.
Classical machine learning and dimensionality reduction – PCA, UMAP, logistic regression, tree models, random forest, etc.
Presentation and data visualization – Publication-level figures for presenting your results as clearly as possible.
Teaching/guidance – All analysis steps are fully explained and documented in scripts, so students are welcome to learn how to run analyses independently.
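As a toy illustration of the linear models listed among the statistical methods above, the sketch below fits a simple one-predictor regression with the closed-form least-squares solution. The function name and data are hypothetical examples; in practice we use established statistical packages, which also provide standard errors, p-values, and diagnostics.

```python
# Illustrative sketch: closed-form simple linear regression (OLS).
# Data points are hypothetical examples.

def ols_fit(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)              # variance term of x
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariance term
    slope = sxy / sxx
    return slope, my - slope * mx

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1
slope, intercept = ols_fit([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
```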
We collaborate on research projects applying Artificial Intelligence across scientific disciplines, with expertise in Machine Learning, Deep Learning, and model-based AI techniques. We specialize in Computer Vision for Bio-medical Imaging, analyzing MRI, CT, and Ultrasound datasets, with a particular focus on improving diagnostic accuracy for different diseases and medical conditions, and quantifying organ morphologies and changes. We also analyze Bio-imaging datasets, such as Atomic Force Microscopy (AFM) data and Optical Microscopy images of cytoskeleton, cells, and nuclei, to quantify cellular components and their interactions, and to study cellular mechanisms. Our work also involves Natural Language Processing for DNA and peptide analyses and Machine Learning for spectroscopic data, including Raman spectroscopy and Mass Spectrometry (MS). Methodologically, we use Python to design Deep Learning architectures with the TensorFlow and PyTorch frameworks, build Machine Learning pipelines (with a specialty in the scikit-learn library), perform advanced numerical analyses, and apply traditional computer vision algorithms (with a specialty in the OpenCV library) to extract insights from data.
Our areas of specialization include: Classification, Regression, Segmentation, Object Detection, Object Tracking, Feature Selection, Feature Engineering, Unsupervised Machine Learning, and Model Interpretability for understanding model decisions.
We provide direct collaborations and consulting to help researchers integrate AI-driven solutions into their workflows.
We are developing and maintaining advanced software systems that support laboratory management, scientific data organization, and the interface between researchers and core facility services across the institute. Our development work focuses on creating customized applications that address the specific operational and research needs of the laboratories.
UTAP2 is a user-friendly transcriptome and epigenome analysis pipeline designed by our unit members and published as "An enhanced user-friendly transcriptome and epigenome analysis pipeline", BMC Bioinformatics 2025, 26:79. This widely used web-based application was developed to enable Weizmann researchers to process raw sequencing data, perform differential expression analysis, and visualize results through an intuitive interface without the need for programming expertise. The platform is also used for teaching purposes and can be publicly installed at other institutions. For assistance or inquiries, please contact utap@weizmann.ac.il.
Developed and maintained for more than a decade, BioImg continues to serve as the Institute’s central platform for managing and organizing microscopy data. BioImg provides structured storage, easy access, and efficient data sharing across research groups. The redeveloped BioImg system, released in 2025, features an improved interface, enhanced user experience, and integration with StorWIS, the Institute’s scalable storage infrastructure. It automates the synchronization of microscopy data, sends notifications upon completion, and allows users to manage, annotate, search, and share data directly through a web-based interface.
The NGS-pipeline automates post-sequencing analysis, empowering users to independently process BCL files. Upon starting a sequencing run, the user uploads a sample sheet, triggering a pipeline that automatically generates FASTQ files, demultiplexes samples, and performs QC analysis. Upon completion, users receive an email notification with links to the QC reports and data download.
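The demultiplexing step mentioned above can be pictured as routing each read to a sample bin by its barcode. The sketch below shows the exact-match version of that idea; barcode sequences and sample names are hypothetical, and production demultiplexers (e.g. bcl2fastq/BCL Convert) additionally tolerate barcode mismatches and handle dual indices.

```python
# Illustrative sketch: exact-match read demultiplexing by barcode.
# Barcodes, sample names, and reads are hypothetical examples.

def demultiplex(reads, sample_sheet):
    """reads: iterable of (barcode, read) pairs;
    sample_sheet: {barcode: sample_name}.
    Reads with unknown barcodes go to 'Undetermined'."""
    bins = {sample: [] for sample in sample_sheet.values()}
    bins["Undetermined"] = []
    for barcode, read in reads:
        bins[sample_sheet.get(barcode, "Undetermined")].append(read)
    return bins

bins = demultiplex(
    [("ACGT", "read1"), ("TTTT", "read2"), ("ACGT", "read3")],
    {"ACGT": "sampleA", "GGCC": "sampleB"},
)
```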
We developed a Laboratory Information Management System (LIMS) for the Structural Proteomics Unit (SPU). This tailor-made system efficiently manages large volumes of data by tracking samples, experiments, instruments, and workflows. The SPU LIMS includes three interconnected workflows - Protein Expression, Protein Purification, and Crystallization - covering every step from gene expression to protein structure determination. The system provides data tracking and integration, significantly improving laboratory efficiency.
Beyond these flagship systems, the Unit also develops and supports many specialized tools for additional facilities, such as DNA-Seq for ordering and browsing the results of whole genome sequencing, and Transfer WEXAC-StorWIS, a service enabling seamless transfer and secure storage of data for WEXAC users.
Together, these systems provide the infrastructure that enables efficient laboratory operation, easy access to research data, data integrity, and sustainable data management across the Institute.
