BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


Classification, Fourier- and wavelet-transform of E. coli promoter and terminator sequences

G.I. Kutuzova (1), R.V. Polozov (2), V.Ju. Makeev (1), G.K. Frank (1), N.G. Esipova (1) and V.G. Tumanyan (1)

(1) Engelhardt Institute of Molecular Biology RAS, Moscow, Russia
(2) Institute of Theoretical and Experimental Biophysics RAS, Puschino, Moscow region, Russia


Classification and analysis of promoter and terminator nucleotide sequences are necessary for recognizing them in the genome and for understanding the mechanisms of transcription regulation. In this work we concentrated on primary structure analysis of a collected set of E. coli promoters (290 sequences) and terminators (128 sequences).

We classified the promoter and terminator sequences using agglomerative hierarchical cluster analysis and a Kohonen neural network, an unsupervised learning algorithm. The matrix Fourier transform and the wavelet transform were applied to the analysis of upstream E. coli promoter regions.
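As a rough illustration of a wavelet decomposition of a sequence, the sketch below applies a Haar transform to a purine indicator signal. Both choices are assumptions made for this example: the abstract does not state which wavelet or which numerical encoding of the sequence was used, and the function names are hypothetical.

```python
def purine_indicator(seq):
    """Encode a DNA sequence as 1.0 for purines (A, G) and 0.0 otherwise.

    Assumption: this binary encoding is chosen for illustration only.
    """
    return [1.0 if base in "AG" else 0.0 for base in seq.upper()]


def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail).

    A trailing unpaired sample (odd length) is dropped.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_decompose(signal):
    """Repeat haar_step on the approximation until fewer than two samples remain.

    Returns the final approximation and the detail coefficients,
    ordered from the finest scale to the coarsest.
    """
    approx = list(signal)
    details = []
    while len(approx) >= 2:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Large detail coefficients at a given level mark positions where the purine content changes at that scale, which is the kind of localized feature a Fourier spectrum averages away.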

The focus of our investigation was on revealing sequence-dependent characteristics that correlate with promoter and terminator function. RNA polymerase recognizes a specific configuration of the promoter that is dictated by the sequence but does not require complete conservation of the nucleotides. We therefore looked for additional sequence constraints that determine promoter strength.

We observed nonrandomness and periodicities in the nucleotide, dinucleotide and trinucleotide distributions of E. coli upstream promoter sequences.
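One common way to look for such periodicities is to take the discrete Fourier transform of an indicator signal that marks where a given nucleotide (or short word) occurs. The sketch below is a minimal version under that assumption; it is not the matrix Fourier transform used in this work, whose details the abstract does not give, and all function names are hypothetical.

```python
import cmath


def indicator(seq, word):
    """Return 1.0 at every position where `word` starts in `seq`, else 0.0."""
    width = len(word)
    return [1.0 if seq[i:i + width] == word else 0.0
            for i in range(len(seq) - width + 1)]


def power_spectrum(signal):
    """Return (period, power) pairs from a plain DFT of the mean-centred signal."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum


def dominant_period(seq, word):
    """Period (in bases) with the largest spectral power for occurrences of `word`."""
    return max(power_spectrum(indicator(seq, word)), key=lambda item: item[1])[0]
```

For the artificial repeat `"AACC" * 10`, `dominant_period(seq, "A")` returns `4.0`. Real promoter regions are short and noisy, so in practice a peak would need to be compared against spectra of shuffled sequences before being called nonrandom.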

The Euclidean metric was used as a measure of the proximity of promoter and terminator sequences in terms of purine-pyrimidine asymmetry, relative frequencies of dinucleotide occurrences, periodicities of nucleotide, dinucleotide and trinucleotide occurrences, and distributions of mini-kinks (which occur predominantly in the pyrimidine-purine YR dimers (CA:TG, TA, CG) and in AG:CT steps).
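To make the dinucleotide case concrete, the following sketch turns each sequence into a 16-component vector of relative dinucleotide frequencies and compares two such vectors with the Euclidean distance. The use of overlapping dinucleotides and the normalisation by the total count are assumptions of this example, and the function names are hypothetical.

```python
import math
from itertools import product

# All 16 dinucleotides in a fixed order: AA, AC, AG, ..., TT.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Relative frequencies of overlapping dinucleotides in `seq`.

    Pairs containing characters other than A, C, G, T are skipped.
    """
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

The same `euclidean` function applies unchanged to any of the other feature vectors listed above, for example a vector of spectral powers at chosen periods.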

Good clustering was observed when sequence characteristics such as purine-pyrimidine asymmetry, relative frequencies of dinucleotide occurrences or periodicities in the nucleotide sequences were studied, rather than the E. coli promoter and terminator sequences per se.
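Clustering on derived feature vectors of this kind can be sketched with a small agglomerative procedure. The version below assumes average linkage and a fixed target number of clusters; the abstract specifies neither the linkage rule nor a stopping criterion, so both are illustrative choices, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(a, b, points):
    """Mean pairwise distance between the members of clusters `a` and `b`."""
    total = sum(euclidean(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    Each cluster is returned as a sorted list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

This brute-force search is cubic or worse in the number of sequences, which is acceptable for a few hundred promoters and terminators but would need a distance cache for larger sets.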

The classification obtained with the Kohonen neural network method is very similar to the one obtained with cluster analysis.

On the basis of the methods applied, an unequivocal classification was obtained. The results were interpreted in terms of geometrical characteristics and functionally significant properties of DNA, such as curvature, bendability and superhelicity, and in terms of characteristics of promoter activity, such as the binding constants for RNA polymerase and the kinetic constants of open polymerase-promoter complex formation.

