How do transcription factors detect their binding sites in the vast genome?
Transcription factors (TFs) regulate gene expression by binding DNA sequence motifs recognized by their DNA-binding domains (DBDs). Such motifs are short and highly abundant in genomes. The ability of TFs to bind a specific subset of motif-containing sites rapidly upon activation is a fundamental yet poorly understood aspect of gene expression. We define the principles driving this TF-target search process using a combination of approaches, including genomic TF binding profiling, massive variant libraries, single-molecule live-cell microscopy and theory. The questions that drive us are:
1. At the level of TFs: what is the relative role of the DBD, compared with the remaining TF regions, in defining in vivo binding locations?
2. At the level of DNA regulatory sequences: what is the contribution of sequences flanking the short DBD-recognized sequences to the binding of TFs in vivo?
3. What is the role of combinatorics, i.e. the interactions between multiple TFs, in defining TF binding specificity?
4. How do TFs diverge during evolution, and what can we learn from their evolution about the design of present-day networks and the mechanism of TF binding specificity?
To answer these questions, we adapted existing technologies to allow rapid measurement of the genomic binding and expression profiles of dozens of TFs in parallel. A CRISPR-based setup for generating multiple deletions or TF variants allows genome-scale characterization of multiple TF variants in different genetic backgrounds. We have further established a new technology for screening TF-target binding for millions of promoter or TF variants within living cells. Our data require computational analysis, including the application of machine learning and deep learning tools, which we develop.
DNA-binding proteins are highly disordered: why?
TFs, as well as other DNA- and RNA-binding proteins (RBPs), are enriched in low-complexity regions predicted to lack a stable 3D structure. In certain TFs and RBPs, these so-called intrinsically disordered regions (IDRs) often span hundreds of amino acids and represent a major fraction of the protein sequence. The functional role of these IDRs and their contribution to protein function remain largely unknown. In a recent study, we unexpectedly found that the IDRs of two model TFs play a key role in TF binding specificity, being both necessary and sufficient for recognizing most TF-target promoters in vivo. This IDR-based specificity was conserved among distant species and achieved through the cumulative contribution of multiple weak and partially redundant determinants distributed throughout these long (~600 residues) IDRs. To understand this mechanism, we are currently asking:
1. How general is the role of IDRs in guiding TFs, or other DNA-binding proteins, to their in vivo binding sites? What distinguishes TFs that rely on their IDRs for achieving specificity from those that do not? What distinguishes IDR-based promoter recognition from other specificity mechanisms?
2. What is the contribution of IDRs within TFs to other transcription-related processes, including TF-target search dynamics, transcription activation and nucleosome dynamics?
3. What is the role of IDRs found in RBPs? Are these regions important for RNA binding specificity? Do they contribute to the consequences of this binding?
4. What is the molecular basis and sequence grammar guiding IDR-based promoter binding?
The experimental setups used to address these questions include, in addition to those described above, a sequencing-based method for profiling RBP binding to mRNAs and single-molecule analysis of TF dynamics within living cells.
Epigenetics and the role of histone exchange in genome regulation
Eukaryotic DNA is packaged into nucleosomes, in which the DNA is wrapped tightly around histone octamers. All processes that use DNA as their substrate depend on the precise binding locations of nucleosomes, on their epigenetic modifications and on their binding stability. We are interested in the principles of this chromatin-based (epigenetic) regulation, which serves as a platform for communication between different DNA-related processes. We are currently most interested in the role and regulation of histone exchange, namely the replacement of DNA-bound histones with their freely available counterparts.