E. coli transcription network

Transcriptional regulation networks in cells orchestrate gene expression. In this network the 'nodes' are operons, and each 'edge' is directed from an operon that encodes a transcription factor (TF) to an operon that it directly regulates (an operon is one or more genes transcribed on the same mRNA). We asked whether one can decompose such networks into basic building blocks. To accomplish this, we generalized the concept of motifs, widely used in analyzing sequences, to the level of networks. We define 'network motifs' as patterns of interconnections that recur in many different parts of a network, at frequencies much higher than in randomized networks that preserve the number of incoming and outgoing edges for each node. We developed algorithms for detecting network motifs and applied them to one of the best-characterized regulation networks, that of transcriptional interactions in Escherichia coli. We find that much of the network is composed of repeated appearances of three highly significant motifs. Each network motif has a specific function in determining gene expression, such as generating temporal expression programs and governing the responses to fluctuating external signals. The motifs also allow an easily interpretable view of the entire known transcriptional network of the organism. This work is available in pdf form.

The transcriptional database contains 577 interactions between 116 TFs and 419 operons. It is based on an existing database (RegulonDB). We enhanced RegulonDB by an extensive literature search, adding 35 new TFs, including alternative sigma factors, and over a hundred new interactions from the literature. The dataset consists of established interactions in which a TF directly binds a regulatory site. This dataset is available in flat file form (version 1.0, as published in the paper). The latest version, with several additional interactions and a few corrections, is also available (version 1.1).
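
The comparison against randomized networks described above can be illustrated with a small sketch. It is not the detection algorithm used in this work. It counts one illustrative three-node pattern, a feed-forward loop (X regulates Y, and both X and Y regulate Z), in a directed edge list, and compares that count with counts in randomized networks produced by repeated edge swaps, which keep every node's number of incoming and outgoing edges fixed. The toy edge list, the function names (count_ffl, randomize, degrees), and the numbers of swaps and of randomized networks are all assumptions made for illustration. The flat file itself is not read here because its layout is not described in this text.

```python
import random
from collections import Counter


def count_ffl(edges):
    """Count feed-forward loops: X->Y, Y->Z and X->Z, with X, Y, Z distinct."""
    succ = {}
    for u, v in set(edges):
        if u != v:  # ignore self-edges (autoregulation) for this pattern
            succ.setdefault(u, set()).add(v)
    count = 0
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                if z != x and z in targets:
                    count += 1
    return count


def randomize(edges, n_swaps, rng):
    """Attempt n_swaps edge swaps: (a->b, c->d) becomes (a->d, c->b).

    Each accepted swap leaves every node's in-degree and out-degree unchanged.
    Swaps that would create a duplicate edge or a new self-edge are skipped.
    """
    edges = sorted(set(edges))
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Return (out-degree, in-degree) counters for an edge list."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)


# Hypothetical toy network, not taken from the E. coli dataset.
toy_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("G", "A"), ("G", "D"),
]

rng = random.Random(0)  # fixed seed so the run is repeatable
observed = count_ffl(toy_edges)
random_counts = [count_ffl(randomize(toy_edges, 100, rng)) for _ in range(200)]
mean_random = sum(random_counts) / len(random_counts)

print("feed-forward loops in toy network:", observed)
print("mean over randomized networks:", mean_random)
```

A pattern would be a candidate motif in this sense when its observed count is well above the counts seen in the randomized ensemble. On a network as small as the toy example the comparison carries no statistical weight; it only shows the mechanics.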