Publications

2025

Quantifying transcription factor specificity with advanced DNA universal microarrays featuring long and modified binding sites
Bayer Y., O'Hagan M. P., Miodownik I. & Afek A. (2025) Nucleic Acids Research. 53, 22, gkaf1382. Abstract
Transcription factor (TF)-DNA binding specificity, shaped by both sequence and epigenetic modifications, is central to gene regulation. Universal protein-binding microarrays (uPBMs), based on compact de Bruijn sequence designs, have emerged as powerful tools to characterize the specificity of hundreds of TFs. However, conventional uPBM binding measurements are limited to direct measurement of short (≤8 bp) motifs composed of the four canonical bases and cannot resolve the effects of extended sequence context or modifications. To address these limitations, we developed two enhanced platforms: Ex-uPBM, based on extended higher-order de Bruijn sequences, and Mod-uPBM, based on de Bruijn sequences that incorporate modified bases. Applying Ex-uPBM to known TFs allowed direct measurements for motifs up to 10 bp long and revealed specificity for flanking regions, which is unattainable with standard uPBM. By applying Mod-uPBM, we measured the effect of 5-methylcytosine (5mC) in all possible contexts, summarized in a full energetic position weight matrix (PWM). This PWM not only reproduced known TF binding specificity but also revealed context-specific energetic effects of 5mC in the full consensus motif at single-nucleotide resolution. Together, our platforms provide a robust and scalable strategy for TF binding quantification that better captures the sequence and modification complexity of genomic DNA.
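A de Bruijn sequence of order k contains every k-mer over its alphabet exactly once, read cyclically. This is what lets a compact uPBM design cover all sequences of a given length. Below is a minimal sketch of the standard recursive (Lyndon-word) construction in Python. It is a generic textbook algorithm, not the authors' array-design code. The `linearize` helper and the use of `M` as a fifth symbol for 5mC are hypothetical illustrations of the Ex-uPBM and Mod-uPBM ideas.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Cyclic de Bruijn sequence over `alphabet` with order k (Lyndon-word construction)."""
    n = len(alphabet)
    a = [0] * (n * k)
    out: list[int] = []

    def extend(t: int, p: int) -> None:
        if t > k:
            # Emit the Lyndon word a[1..p] only when its length p divides k
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                extend(t + 1, t)

    extend(1, 1)
    return "".join(alphabet[i] for i in out)


def linearize(cyclic: str, k: int) -> str:
    """Append the first k-1 symbols so every k-mer also appears in the linear string."""
    return cyclic + cyclic[: k - 1]


if __name__ == "__main__":
    seq = linearize(de_bruijn("ACGT", 3), 3)
    kmers = {seq[i:i + 3] for i in range(len(seq) - 2)}
    print(len(seq), len(kmers))  # 66 64
    # Hypothetical 5-letter alphabet, with 'M' standing in for 5-methylcytosine
    print(len(de_bruijn("ACGTM", 2)))  # 25
```

Raising k, or adding a symbol for a modified base, increases the sequence length as (alphabet size)^k. Real array designs must also split the sequence into probes of fixed length, which this sketch does not attempt.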
DNA methylation shapes transcription factor binding beyond canonical CpG contexts
Miodownik I., Solozabal R., O'Hagan M. P., Albeck S., Peleg Y., Takac M. & Afek A. (2025) Proceedings of the National Academy of Sciences - PNAS. 122, 51, e252081412. Abstract
Cytosine methylation is a key epigenetic modification that regulates transcription factor (TF) binding and gene expression. While most current understanding of methylation-sensitive TF binding derives from studies focused exclusively on fully methylated CpG sites, alternative forms, such as non-CpG methylation and hemimethylation, are increasingly recognized as widespread and functionally important, particularly in embryonic stem cells and neurons. However, the direct impact of these alternative methylation contexts on TF-DNA interactions remains poorly defined, largely because current binding assays introduce methylation enzymatically, which precludes strand-specific and position-resolved measurements. Here, we systematically profile the methylation sensitivity of 18 human TFs spanning 11 structural families using chemically synthesized DNA libraries containing position-specific 5-methylcytosines (5mC) in CpG, non-CpG, and hemimethylated contexts, measured via high-throughput protein-binding microarrays. Our results reveal extensive TF sensitivity to methylation state, position, and strand orientation, including strong binding of several TFs to non-CpG and hemimethylated sites. The presence of 5mC can dramatically alter TF-DNA interactions, either transforming low-affinity sites into high-affinity ones by enabling new contacts or silencing otherwise favorable motifs through steric hindrance. Genomic analyses further show that the methylation-sensitive sequences identified in vitro are represented within enhancers and regulatory elements, exhibiting distinct methylation patterns across cell types. Together, our findings uncover a previously hidden layer of methylation-dependent TF-DNA recognition, broadening the understanding of epigenetics in transcriptional regulation.
Chemical Engineering of Transcription Factors Uncovered Cell-Permeable μMax Modulators
Harel O., Nadal-Bufi F., Nithun R. V., Yao Y. M., Afek A., Vendrell M. & Jbara M. (2025) Journal of the American Chemical Society. 147, 46, p. 42647-42658 Abstract
Transcription factor engineering has emerged as a powerful strategy for generating novel proteins for fundamental research and biomedical applications. Although various analogs have been developed, they remain largely constrained to native sequences and structures. The generation of advanced analogs bearing noncanonical modifications with enhanced functional properties remains limited. Here we combined rational design with total synthesis to engineer novel abiotic transcription factors with enhanced stability and cell permeability. Using solid-phase synthesis and native chemical ligation, we created a library of 30 Max-derived transcription factor analogs incorporating novel modifications, such as sequence mutations and aromatic staples at strategic sites. Through DNA-binding analysis and cellular uptake studies, we identified the μMax20 analog, which contains two mutations (Lys31 and Lys57 to hArg) and exhibits potent DNA binding to the canonical enhancer box (E-box) as well as intrinsic cell permeability. Notably, further site-specific modifications of μMax20 with aromatic staples yielded improved analogs with enhanced stability and remarkable cellular delivery at nanomolar concentrations. Our lead μMax20 analog suppressed Myc-driven gene expression, as demonstrated by reporter gene assays and antiproliferative activity against Myc-dependent cancer cells. Altogether, these results highlight how combining chemical protein synthesis with late-stage modifications can be leveraged to enhance protein function and engineer novel bioactive modulators.
DNA mutagenesis driven by transcription factor competition with mismatch repair
Zhu W., Zhang Y., Sahay H. et al. (2025) Cell. 188, 20, p. 5735-5747.e15 Abstract
Despite the remarkable fidelity of eukaryotic DNA replication, nucleotide misincorporation errors occur in every replication cycle, generating mutations that drive genetic diseases and genome evolution. Here, we show that transcription factor (TF) proteins, key players in gene regulation, can increase mutagenesis from replication errors by directly competing with the recognition of DNA mismatches by MutSα, the primary initiator of eukaryotic mismatch repair (MMR). We demonstrate this TF-induced mutagenesis mechanism using a yeast genetic assay that quantifies the accumulation of mutations in TF binding sites. Analyses of human cancer mutations recapitulate the trends observed in yeast, with mutations arising from MYC-bound mismatches being enriched in MMR-proficient cells. These findings implicate TF-MMR competition as a critical determinant of somatic hypermutation at TF binding sites in cancer. Furthermore, our results provide a molecular mechanism for the higher-than-expected rate of rare genetic variants at TF binding sites, with important implications for regulatory DNA evolution.

2024

Deciphering the dynamic code: DNA recognition by transcription factors in the ever-changing genome
Yao Y. M., Miodownik I., O'Hagan M. P., Jbara M. & Afek A. (2024) Transcription. 15, 3-5, p. 114-138 Abstract
Transcription factors (TFs) intricately navigate the vast genomic landscape to locate and bind specific DNA sequences for the regulation of gene expression programs. These interactions occur within a dynamic cellular environment, where both DNA and TF proteins experience continual chemical and structural perturbations, including epigenetic modifications, DNA damage, mechanical stress, and post-translational modifications (PTMs). While many of these factors impact TF-DNA binding interactions, understanding their effects remains challenging and incomplete. This review explores the existing literature on these dynamic changes and their potential impact on TF-DNA interactions.
Site-Specific Acetylation of the Transcription Factor Protein Max Modulates Its DNA Binding Activity
Nithun R. V., Yao Y. M., Harel O., Habiballah S., Afek A. & Jbara M. (2024) ACS Central Science. 10, 6, p. 1295-1303 Abstract
Chemical protein synthesis provides a powerful means to prepare novel modified proteins with precision down to the atomic level, enabling an unprecedented opportunity to understand fundamental biological processes. Of particular interest is the process of gene expression, orchestrated through the interactions between transcription factors (TFs) and DNA. Here, we combined chemical protein synthesis and high-throughput screening technology to decipher the role of post-translational modifications (PTMs), e.g., Lys-acetylation on the DNA binding activity of Max TF. We synthesized a focused library of singly, doubly, and triply modified Max variants including site-specifically acetylated and fluorescently tagged analogs. The resulting synthetic analogs were employed to decipher the molecular role of Lys-acetylation on the DNA binding activity and sequence specificity of Max. We provide evidence that the acetylation sites at Lys-31 and Lys-57 significantly inhibit the DNA binding activity of Max. Furthermore, by utilizing high-throughput binding measurements, we assessed the binding activities of the modified Max variants across diverse DNA sequences. Our results indicate that acetylation marks can alter the binding specificities of Max toward certain sequences flanking its consensus binding sites. Our work provides insight into the hidden molecular code of PTM-TFs and DNA interactions, paving the way to interpret gene expression regulation programs.

2023

Deciphering the Role of the Ser-Phosphorylation Pattern on the DNA-Binding Activity of Max Transcription Factor Using Chemical Protein Synthesis
Nithun R. V., Yao Y. M., Lin X., Habiballah S., Afek A. & Jbara M. (2023) Angewandte Chemie - International Edition. 62, 47, e202310913. Abstract
The chemical synthesis of site-specifically modified transcription factors (TFs) is a powerful method to investigate how post-translational modifications (PTMs) influence TF-DNA interactions and impact gene expression. Among these TFs, Max plays a pivotal role in controlling the expression of 15% of the genome. The activity of Max is regulated by PTMs; Ser-phosphorylation at the N-terminus is considered one of the key regulatory mechanisms. In this study, we developed a practical synthetic strategy to prepare homogeneous full-length Max for the first time, to explore the impact of Max phosphorylation. We prepared a focused library of eight Max variants, with distinct modification patterns, including mono-phosphorylated and doubly phosphorylated analogues at Ser2/Ser11 as well as fluorescently labeled variants through native chemical ligation. Through comprehensive DNA binding analyses, we discovered that the phosphorylation position plays a crucial role in the DNA-binding activity of Max. Furthermore, in vitro high-throughput analysis using DNA microarrays revealed that the N-terminus phosphorylation pattern does not interfere with the DNA sequence specificity of Max. Our work provides insights into the regulatory role of Max's phosphorylation on the DNA interactions and sequence specificity, shedding light on how PTMs influence TF function.
Short tandem repeats bind transcription factors to tune eukaryotic gene expression
Horton C. A., Alexandari A. M., Hayes M. G. et al. (2023) Science (New York, N.Y.). 381, 6664, eadd1250. Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
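A toy single-occupancy partition function can illustrate the intuition that many weakly preferred microstates can together yield more binding than a single better site. This is a minimal Python sketch under our own assumptions: at most one TF per probe, energies in units of kT, and a hypothetical chemical potential `mu_kt` standing in for TF concentration. It is not the model used in the study.

```python
import math


def occupancy(site_energies_kt: list[float], mu_kt: float) -> float:
    """Probability that a DNA probe carries one bound TF (toy single-occupancy model).

    site_energies_kt: binding free energy of each overlapping site, in kT
                      (more negative means tighter binding).
    mu_kt: chemical potential of the free TF in kT; a hypothetical stand-in
           for TF concentration.
    """
    # Boltzmann weight of each singly bound microstate relative to the unbound state
    weights = [math.exp(mu_kt - e) for e in site_energies_kt]
    z_bound = sum(weights)
    # Partition function = 1 (unbound state) + sum over singly bound microstates
    return z_bound / (1.0 + z_bound)


if __name__ == "__main__":
    mu = -6.0
    one_better_site = occupancy([-4.0], mu)
    # Hypothetical repeat offering 30 weakly preferred microstates
    many_weak_sites = occupancy([-2.0] * 30, mu)
    print(round(one_better_site, 3), round(many_weak_sites, 3))  # 0.119 0.355
```

Because bound microstates add in the partition function, occupancy grows with the number of weak sites even when each one contributes little. This is the qualitative effect the abstract attributes to STRs.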

UV irradiation remodels the specificity landscape of transcription factors
Mielko Z., Zhang Y., Sahay H. et al. (2023) Proceedings of the National Academy of Sciences of the United States of America. 120, 11, e221742212. Abstract
Somatic mutations are highly enriched at transcription factor (TF) binding sites, with the strongest trend being observed for ultraviolet light (UV)-induced mutations in melanomas. One of the main mechanisms proposed for this hypermutation pattern is the inefficient repair of UV lesions within TF-binding sites, caused by competition between TFs bound to these lesions and the DNA repair proteins that must recognize the lesions to initiate repair. However, TF binding to UV-irradiated DNA is poorly characterized, and it is unclear whether TFs maintain specificity for their DNA sites after UV exposure. We developed UV-Bind, a high-throughput approach to investigate the impact of UV irradiation on protein-DNA binding specificity. We applied UV-Bind to ten TFs from eight structural families, and found that UV lesions significantly altered the DNA-binding preferences of all the TFs tested. The main effect was a decrease in binding specificity, but the precise effects and their magnitude differ across factors. Importantly, we found that despite the overall reduction in DNA-binding specificity in the presence of UV lesions, TFs can still compete with repair proteins for lesion recognition, in a manner consistent with their specificity for UV-irradiated DNA. In addition, for a subset of TFs, we identified a surprising but reproducible effect at certain nonconsensus DNA sequences, where UV irradiation leads to a large increase in the level of TF binding. These changes in DNA-binding specificity after UV irradiation, at both consensus and nonconsensus sites, have important implications for the regulatory and mutagenic roles of TFs in the cell.

2021

Inferring primase-DNA specific recognition using a data driven approach
Soffer A., Eisdorfer S. A., Ifrach M., Ilic S., Afek A., Schussheim H., Vilenchik D. & Akabayov B. (2021) Nucleic Acids Research. 49, 20, p. 11447-11458 Abstract
DNA-protein interactions play essential roles in all living cells. Understanding how features embedded in the DNA sequence affect specific interactions with proteins is both challenging and important, since it may contribute to finding the means to regulate metabolic pathways involving DNA-protein interactions. Using a massive experimental benchmark dataset of binding scores for DNA sequences and a machine learning workflow, we describe the binding to DNA of T7 primase, as a model system for specific DNA-protein interactions. Effective binding of T7 primase to its specific DNA recognition sequences triggers the formation of RNA primers that serve as Okazaki fragment start sites during DNA replication.

2020

DNA mismatches reveal conformational penalties in protein-DNA recognition
Afek A., Shi H., Rangadurai A. et al. (2020) Nature. 587, 7833, p. 291-296 Abstract
Transcription factors recognize specific genomic sequences to regulate complex gene-expression programs. Although it is well-established that transcription factors bind to specific DNA sequences using a combination of base readout and shape recognition, some fundamental aspects of protein-DNA binding remain poorly understood [1,2]. Many DNA-binding proteins induce changes in the structure of the DNA outside the intrinsic B-DNA envelope. However, how the energetic cost that is associated with distorting the DNA contributes to recognition has proven difficult to study, because the distorted DNA exists in low abundance in the unbound ensemble [3-9]. Here we use a high-throughput assay that we term SaMBA (saturation mismatch-binding assay) to investigate the role of DNA conformational penalties in transcription factor-DNA recognition. In SaMBA, mismatched base pairs are introduced to pre-induce structural distortions in the DNA that are much larger than those induced by changes in the Watson-Crick sequence. Notably, approximately 10% of mismatches increased transcription factor binding, and for each of the 22 transcription factors that were examined, at least one mismatch was found that increased the binding affinity. Mismatches also converted non-specific sites into high-affinity sites, and high-affinity sites into super sites that exhibit stronger affinity than any known canonical binding site. Determination of high-resolution X-ray structures, combined with nuclear magnetic resonance measurements and structural analyses, showed that many of the DNA mismatches that increase binding induce distortions that are similar to those induced by protein binding, thus prepaying some of the energetic cost incurred from deforming the DNA. Our work indicates that conformational penalties are a major determinant of protein-DNA recognition, and reveals mechanisms by which mismatches can recruit transcription factors and thus modulate replication and repair activities in the cell [10,11].

Control of transcription initiation by biased thermal fluctuations on repetitive genomic sequences
Imashimizu M., Tokunaga Y., Afek A., Takahashi H., Shimamoto N. & Lukatsky D. B. (2020) Biomolecules. 10, 9, p. 1-22 1299. Abstract
In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on the sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences, when enriched with thymine bases, contain the signal for inducing abortive transcription, whereas certain repetitive sequence elements embedded in promoter regions constitute the signal for inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we suggest that repetitive sequence elements within the promoter DNA modulate the nonlocal base pair stability of its double-stranded form. This stability profoundly influences the reaction coordinates of the productive initiation via pausing.

2019

DNA sequence recognition by DNA primase using high-throughput primase profiling
Ilic S., Cohen S., Afek A., Gordan R., Lukatsky D. B. & Akabayov B. (2019) Journal of Visualized Experiments (JoVE). 2019, 152, e59737. Abstract
DNA primase synthesizes short RNA primers that initiate DNA synthesis of Okazaki fragments on the lagging strand by DNA polymerase during DNA replication. The binding of prokaryotic DnaG-like primases to DNA occurs at a specific trinucleotide recognition sequence. It is a pivotal step in the formation of Okazaki fragments. Conventional biochemical tools that are used to determine the DNA recognition sequence of DNA primase provide only limited information. Using a high-throughput microarray-based binding assay and subsequent biochemical analyses, it has been shown that 1) the specific binding context (flanking sequences of the recognition site) influences the binding strength of the DNA primase to its template DNA, and 2) stronger binding of primase to the DNA yields longer RNA primers, indicating higher processivity of the enzyme. This method, which combines protein-binding microarrays (PBMs) with a primase activity assay, is designated high-throughput primase profiling (HTPP), and it allows characterization of specific sequence recognition by DNA primase in unprecedented time and scalability.
Unexpected implications of STAT3 acetylation revealed by genetic encoding of acetyl-lysine
Belo Y., Mielko Z., Nudelman H., Afek A., Ben-David O., Shahar A., Zarivach R., Gordan R. & Arbely E. (2019) Biochimica et Biophysica Acta - General Subjects. 1863, 9, p. 1343-1350 Abstract
The signal transducer and activator of transcription 3 (STAT3) protein is activated by phosphorylation of a specific tyrosine residue (Tyr705) in response to various extracellular signals. STAT3 activity was also found to be regulated by acetylation of Lys685. However, the molecular mechanism by which Lys685 acetylation affects the transcriptional activity of STAT3 remains elusive. By genetically encoding the co-translational incorporation of acetyl-lysine into position Lys685 and co-expression of STAT3 with the Elk receptor tyrosine kinase, we were able to characterize site-specifically acetylated, and simultaneously acetylated and phosphorylated STAT3. We measured the effect of acetylation on the crystal structure, and DNA binding affinity and specificity of Tyr705-phosphorylated and non-phosphorylated STAT3. In addition, we monitored the deacetylation of acetylated Lys685 by reconstituting the mammalian enzymatic deacetylation reaction in live bacteria. Surprisingly, we found that acetylation, per se, had no effect on the crystal structure, and DNA binding affinity or specificity of STAT3, implying that the previously observed acetylation-dependent transcriptional activity of STAT3 involves an additional cellular component. In addition, we discovered that Tyr705-phosphorylation protects Lys685 from deacetylation in bacteria, providing a new possible explanation for the observed correlation between STAT3 activity and Lys685 acetylation.
QBiC-Pred: Quantitative predictions of transcription factor binding changes due to sequence variants
Martin V., Zhao J., Afek A., Mielko Z. & Gordân R. (2019) Nucleic Acids Research. 47, W1, p. W127-W135 Abstract
Non-coding genetic variants/mutations can play functional roles in the cell by disrupting regulatory interactions between transcription factors (TFs) and their genomic target sites. For most human TFs, a myriad of DNA-binding models are available and could be used to predict the effects of DNA mutations on TF binding. However, information on the quality of these models is scarce, making it hard to evaluate the statistical significance of predicted binding changes. Here, we present QBiC-Pred, a web server for predicting quantitative TF binding changes due to nucleotide variants. QBiC-Pred uses regression models of TF binding specificity trained on high-throughput in vitro data. The training is done using ordinary least squares (OLS), and we leverage distributional results associated with OLS estimation to compute, for each predicted change in TF binding, a P-value reflecting our confidence in the predicted effect. We show that OLS models are accurate in predicting the effects of mutations on TF binding in vitro and in vivo, outperforming widely used PWM models as well as recently developed deep learning models of specificity. QBiC-Pred takes as input mutation datasets in several formats, and it allows post-processing of the results through a user-friendly web interface. QBiC-Pred is freely available at http://qbic.genome.duke.edu.
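The key statistical idea can be sketched in plain Python: an OLS model of binding whose parameter covariance turns each predicted variant effect into a P-value. Several choices here are our simplifying assumptions and are not QBiC-Pred's feature set or code: the per-position base-indicator features, the synthetic training data, and the normal approximation to the t-distribution. The helper names `encode`, `fit_ols`, and `variant_effect` are hypothetical.

```python
import math
import random
from itertools import product

BASES = "ACGT"


def encode(seq: str) -> list[float]:
    """Intercept plus C/G/T indicators at each position (A is the baseline)."""
    x = [1.0]
    for base in seq:
        x.extend(1.0 if base == b else 0.0 for b in "CGT")
    return x


def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_ols(seqs, y):
    """Return OLS coefficients, (X^T X)^-1 and the residual variance estimate."""
    X = [encode(s) for s in seqs]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    return beta, xtx_inv, sigma2


def variant_effect(model, ref: str, alt: str) -> tuple[float, float]:
    """Predicted binding change (alt - ref) and a two-sided P-value (normal approx.)."""
    beta, xtx_inv, sigma2 = model
    d = [a - r for a, r in zip(encode(alt), encode(ref))]
    delta = sum(b * v for b, v in zip(beta, d))
    # Var(d . beta_hat) = sigma^2 * d^T (X^T X)^-1 d
    var = sigma2 * sum(d[i] * xtx_inv[i][j] * d[j]
                       for i in range(len(d)) for j in range(len(d)))
    z = delta / math.sqrt(var)
    return delta, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    rng = random.Random(0)
    effect = {"A": 0.0, "C": 0.2, "G": 1.0, "T": -0.5}  # hypothetical per-base effects
    seqs = ["".join(p) for p in product(BASES, repeat=3)]
    y = [5.0 + sum(effect[b] for b in s) + rng.gauss(0, 0.1) for s in seqs]
    model = fit_ols(seqs, y)
    print(variant_effect(model, "AAA", "AGA"))  # change near +1.0 with a tiny P-value
```

The same variance formula applies to any linear feature set, for example k-mer counts. In that case, a single substitution changes several features at once and `d` has more than one nonzero entry.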

2018

Toward deciphering the mechanistic role of variations in the Rep1 repeat site in the transcription regulation of SNCA gene
Afek A., Tagliafierro L., Glenn O., Lukatsky D., Gordan R. & Chiba-Falek O. (2018) Neurogenetics. 19, 3, p. 135-144 Abstract
Short structural variants (variants other than single nucleotide polymorphisms) are hypothesized to contribute to many complex diseases, possibly by modulating gene expression. However, the molecular mechanisms by which noncoding short structural variants exert their effects on gene regulation have not been discovered. Here, we study simple sequence repeats (SSRs), a common class of short structural variants. Previously, we showed that repetitive sequences can directly influence the binding of transcription factors to their proximate recognition sites, a mechanism we termed non-consensus binding. In this study, we focus on the SSR termed Rep1, which was associated with Parkinson's disease (PD) and has been implicated in the cis-regulation of the PD-risk SNCA gene. We show that Rep1 acts via the non-consensus binding mechanism to affect the binding of transcription factors from the GATA and ELK families to their specific sites located right next to the Rep1 repeat. Next, we performed an expression analysis to further our understanding regarding the GATA and ELK family members that are potentially relevant for SNCA transcriptional regulation in health and disease. Our analysis indicates a potential role for GATA2, consistent with previous reports. Our study proposes non-consensus transcription factor binding as a potential mechanism through which noncoding repeat variants could exert their pathogenic effects by regulating gene expression.
DNA Sequence Context Controls the Binding and Processivity of the T7 DNA Primase
Afek A., Ilic S., Horton J., Lukatsky D. B., Gordan R. & Akabayov B. (2018) iScience. 2, p. 141-147 Abstract
Primases are key enzymes involved in DNA replication. They act on single-stranded DNA and catalyze the synthesis of short RNA primers used by DNA polymerases. Here, we investigate the DNA binding and activity of the bacteriophage T7 primase using a new workflow called high-throughput primase profiling (HTPP). Using a unique combination of high-throughput binding assays and biochemical analyses, HTPP reveals a complex landscape of binding specificity and functional activity for the T7 primase, determined by sequences flanking the primase recognition site. We identified specific features, such as G/T-rich flanks, which increase primase-DNA binding up to 10-fold and, surprisingly, also increase the length of newly formed RNA (up to 3-fold). To our knowledge, variability in primer length has not been reported for this primase. We expect that applying HTPP to additional enzymes will reveal new insights into the effects of DNA sequence composition on the DNA recognition and functional activity of primases.

2016

Control of transcriptional pausing by biased thermal fluctuations on repetitive genomic sequences
Imashimizu M., Afek A., Takahashi H., Lubkowska L. & Lukatsky D. B. (2016) Proceedings of the National Academy of Sciences of the United States of America. 113, 47, p. E7409-E7417 Abstract
In the process of transcription elongation, RNA polymerase (RNAP) pauses at highly nonrandom positions across genomic DNA, broadly regulating transcription; however, molecular mechanisms responsible for the recognition of such pausing positions remain poorly understood. Here, using a combination of statistical mechanical modeling and high-throughput sequencing and biochemical data, we evaluate the effect of thermal fluctuations on the regulation of RNAP pausing. We demonstrate that diffusive backtracking of RNAP, which is biased by repetitive DNA sequence elements, causes transcriptional pausing. This effect stems from the increased microscopic heterogeneity of an elongation complex, and thus is entropy-dominated. This report shows a linkage between repetitive sequence elements encoded in the genome and regulation of RNAP pausing driven by thermal fluctuations.

2015

Nonconsensus Protein Binding to Repetitive DNA Sequence Elements Significantly Affects Eukaryotic Genomes
Afek A., Cohen H., Barber-Zucker S., Gordân R. & Lukatsky D. B. (2015) PLoS Computational Biology. 11, 8, e1004429. Abstract
Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with a wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C. elegans and D. melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding.
In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.

2014

Protein-DNA binding in the absence of specific base-pair recognition
Afek A., Schipper J. L., Horton J., Gordân R. & Lukatsky D. B. (2014) Proceedings of the National Academy of Sciences of the United States of America. 111, 48, p. 17140-17145 Abstract
Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)-DNA binding. Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF-DNA binding preferences. We used high-throughput protein-DNA binding assays to measure the binding levels and free energies of binding for several human TFs to tens of thousands of short DNA sequences with varying repeat symmetries. Based on statistical mechanics modeling, we identify a new protein-DNA binding mechanism induced by DNA sequence symmetry in the absence of specific base-pair recognition, and experimentally demonstrate that this mechanism indeed governs protein-DNA binding preferences.

2013

Positive and negative design for nonconsensus protein-DNA binding affinity in the vicinity of functional binding sites
Afek A. & Lukatsky D. (2013) Biophysical Journal. 105, 7, p. 1653-1660 Abstract
Recent experiments provide an unprecedented view of protein-DNA binding in yeast and human genomes at single-nucleotide resolution. These measurements, performed over large cell populations, show quite generally that sequence-specific transcription regulators with well-defined protein-DNA consensus motifs bind only a fraction among all consensus motifs present in the genome. Alternatively, proteins in vivo often bind DNA regions lacking known consensus sequences. The rules determining whether a consensus motif is functional remain incompletely understood. Here we predict that genomic background surrounding specific protein-DNA binding motifs statistically modulates the binding of sequence-specific transcription regulators to these motifs. In particular, we show that nonconsensus protein-DNA binding in yeast is statistically enhanced, on average, around functional Reb1 motifs that are bound as compared to nonfunctional Reb1 motifs that are unbound. The landscape of nonconsensus protein-DNA binding around functional CTCF motifs in human demonstrates a more complex behavior. In particular, human genomic regions characterized by the highest CTCF occupancy show a statistically reduced level of nonconsensus protein-DNA binding. Our findings suggest that nonconsensus protein-DNA binding is fine-tuned around functional binding sites using a variety of design strategies.
Genome-wide organization of eukaryotic preinitiation complex is influenced by nonconsensus protein-DNA binding
Afek A. & Lukatsky D. (2013) Biophysical Journal. 104, 5, p. 1107-1115 Abstract
Genome-wide binding preferences of the key components of eukaryotic preinitiation complex (PIC) have been recently measured at high resolution in Saccharomyces cerevisiae by Rhee and Pugh. However, the rules determining the PIC binding specificity remain poorly understood. In this study, we show that nonconsensus protein-DNA binding significantly influences PIC binding preferences. We estimate that such nonconsensus binding contributes statistically at least 2-3 kcal/mol (on average) of additional attractive free energy per protein per core-promoter region. The predicted attractive effect is particularly strong at repeated poly(dA:dT) and poly(dC:dG) tracts. Overall, the computed free-energy landscape of nonconsensus protein-DNA binding shows strong correlation with the measured genome-wide PIC occupancy. Remarkably, statistical PIC preferences of binding to both TFIID-dominated and SAGA-dominated genes correlate with the nonconsensus free-energy landscape, yet these two groups of genes are distinguishable based on the average free-energy profiles. We suggest that the predicted nonconsensus binding mechanism provides a genome-wide background for specific promoter elements, such as transcription-factor binding sites, TATA-like elements, and specific binding of the PIC components to nucleosomes. We also show that nonconsensus binding has genome-wide influence on transcriptional frequency.

2012

Nonspecific protein-DNA binding is widespread in the yeast genome
Afek A. & Lukatsky D. B. (2012) Biophysical Journal. 102, 8, p. 1881-1888 Abstract
Recent genome-wide measurements of binding preferences of ∼200 transcription regulators in the vicinity of transcription start sites in yeast have provided a unique insight into the cis-regulatory code of a eukaryotic genome. Here, we show that nonspecific transcription factor (TF)-DNA binding significantly influences binding preferences of the majority of transcription regulators in promoter regions of the yeast genome. We show that promoters of SAGA-dominated and TFIID-dominated genes can be statistically distinguished based on the landscape of nonspecific protein-DNA binding free energy. In particular, we predict that promoters of SAGA-dominated genes possess wider regions of reduced free energy compared to promoters of TFIID-dominated genes. We also show that specific and nonspecific TF-DNA binding are functionally linked and cooperatively influence gene expression in yeast. Our results suggest that nonspecific TF-DNA binding is intrinsically encoded into the yeast genome, and it may play a more important role in transcriptional regulation than previously thought.

2011

Nonspecific transcription-factor-DNA binding influences nucleosome occupancy in yeast
Afek A., Sela I., Musa-Lempel N. & Lukatsky D. B. (2011) Biophysical Journal. 101, 10, p. 2465-2475 Abstract
Quantitative understanding of the principles regulating nucleosome occupancy on a genome-wide level is a central issue in eukaryotic genomics. Here, we address this question using budding yeast, Saccharomyces cerevisiae, as a model organism. We perform a genome-wide computational analysis of the nonspecific transcription factor (TF)-DNA binding free-energy landscape and compare this landscape with experimentally determined nucleosome-binding preferences. We show that DNA regions with enhanced nonspecific TF-DNA binding are statistically significantly depleted of nucleosomes. We suggest therefore that the competition between TFs and histones for nonspecific binding to genomic sequences might be an important mechanism influencing nucleosome-binding preferences in vivo. We also predict that poly(dA:dT) and poly(dC:dG) tracts represent genomic elements with the strongest propensity for nonspecific TF-DNA binding, thus allowing TFs to outcompete nucleosomes at these elements. Our results suggest that nonspecific TF-DNA binding might provide a barrier for statistical positioning of nucleosomes throughout the yeast genome. We predict that the strength of this barrier increases with the concentration of DNA binding proteins in a cell. We discuss the connection of the proposed mechanism with the recently discovered pathway of active nucleosome reconstitution.
Sequence correlations shape protein promiscuity
Lukatsky D., Afek A. & Shakhnovich E. I. (2011) Journal of Chemical Physics. 135, 6, 065104. Abstract
We predict analytically that diagonal correlations of amino acid positions within protein sequences statistically enhance protein propensity for nonspecific binding. We use the term promiscuity to describe such nonspecific binding. Diagonal correlations represent statistically significant repeats of sequence patterns where amino acids of the same type are clustered together. The predicted effect is qualitatively robust with respect to the form of the microscopic interaction potentials and the average amino acid composition. Our analytical results provide an explanation for the enhanced diagonal correlations observed in hubs of eukaryotic organismal proteomes.
Multi-scale sequence correlations increase proteome structural disorder and promiscuity
Afek A., Shakhnovich E. I. & Lukatsky D. B. (2011) Journal of Molecular Biology. 409, 3, p. 439-449 Abstract
Numerous experiments demonstrate a high level of promiscuity and structural disorder in organismal proteomes. Here, we ask what makes a protein promiscuous, that is, prone to nonspecific interactions, and structurally disordered. We predict that multi-scale correlations of amino acid positions within protein sequences statistically enhance the propensity for promiscuous intra- and inter-protein binding. We show that sequence correlations between amino acids of the same type are statistically enhanced in structurally disordered proteins and in hubs of organismal proteomes. We also show that structurally disordered proteins possess a significantly higher degree of sequence order than structurally ordered proteins. We develop an analytical theory for this effect and predict the robustness of our conclusions with respect to the amino acid composition and the form of the microscopic potential between the interacting sequences. Our findings have implications for understanding molecular mechanisms of protein aggregation diseases induced by the extension of sequence repeats.