BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


Application of evolutionary algorithms to protein folding prediction

A. Piccolboni and G. Mauri

piccolbo@dotto.usr.dsi.unimi.it


The prediction of tertiary structure form primary structure is a formidable as well as a fundamental task in molecular biology. The problem can be seen as the optimization of the energy function of a protein, under the assumption that an accurate enough approximation of this function is available. Indeed, according to the so-called Anfinsen hypothesis [1], native conformations correspond, at a first approximation, to global minima of this function.
Evolutionary algorithms (EAs) [6, 7, 8, 2] can be seen optimization methods based on an evolutionary metaphor that showed effective in solving difficult problems. Distinctive features of EAs are:

o a set of candidate solutions is considered at each time step instead of a single one (population)

o candidate solutions are combined to form new ones (mating operator)

o solutions can be randomly slightly modified (mutation operator)

According to [5], the best performances can be obtained when hyperplanes in the solution space with under average energy exist (building blocks). It is quite natural to think of secondary structure elements as candidates to act as building blocks, but a suitable representation has to be devised to let the algorithm exploit these building blocks. All evolutionary approaches to folding prediction so far [13, 10, 11, 3, 9] are based on an internal coordinate representation. It is straightforward to show that relevant structural features can not be described as hyperplanes under this approach.
We describe a different approach to folding prediction with EAs. The solution space is the set of peptide distance matrices. Distance matrices can represent all relevant structures in a concise and transparent way, so that there's a one to one mapping between structures of interest and low dimensional hyperplanes in matrix space. Besides distances, two dihedral angles per peptide are allowed to vary, in order to obtain a more accurate modeling of the backbone. It should be noted that this representation is rich enough to describe tertiary structure but doesn't provide important features such as side chain placement. As an energy function, an improved version of the potential described in [4] was used. This function has four main terms:

o single residue energy accounts for intra-peptide interactions;

o H-bonding term is the result of the search for an optimal pattern for H-bon ds;

o non-bonded inter-residue interactions term;

o disulfide bonds term;

This potential is of course a rough approximation, but has been shown to reproduce the most important secondary structures and is very fast to evaluate. Moreover, it depends only on the backbone conformation and peptide orientations,which is exactly the information available in the representation used by the EA.

The EA can be further specified with the definition of genetic operators. The recombination operators are two. One is the uniform crossover, i.e. the variables defining the new individual are equal to corresponding variables of one or the other parent in a random independent fashion. The resulting matrix is not guaranteed to be embeddable (i.e. to correspond to a set of points), so a repair algorithm is necessary (see [12]). The second operator is a convex combination of elementwise squared matrices. This operator guarantees embeddability in a higher dimensional space [12]. The mutation operator is a classic creep operator that randomly choices a variable and modifies it with the addition of a small gaussianly distributed amount.

Preliminary results obtained by running the algorithm on a few small structures will be discussed.

[1]C. B. Anfinsen, E. Haber, M. Sela, and F. H. White Jr. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. In Proceedings of the National Academy of Sciences of the U. S. A. , volume 47, pages 1309-1314, 1961.
[2]Th. B"ack and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation,, 1(1):1-23, 1993.
[3]T. Dandekar and P. Argos. Folding the main chain of small proteins with the genetic algorithm. Journal of Molecular Biology, 236:844-861, 1994.
[4]Paul R. Gerber. A reduced force field for protein folding studies. In P. K. Weiner, W. F. Gunsteren and A. J. Wilkinson, editors, Computer Simulation of Biomolecular Systems, volume 2, Leiden, 1993. ESCOM.
[5]D. E. Goldberg, editor. Genetic Algorithm in search, optimization and machine learning. Addison-Wesley, 1990.
[6]John H. Holland, editor. Adaptation in Natural and Artifcial Systems. MIT Press, 1975.
[7]Kenneth De Jong. An analysis of the behaviour of a class of genetic adaptive systems. PhD thesis, University of Michigan, 1975.
[8]A. J. Owens, Lawrence J. Fogel and M.J. Walsh. Artificial intelligence through simulated evolution. Wiley, New York, 1966.
[9]A. Patton, W. Punch, and E. Goodman. A standard ga approach to native protein con formation prediction. In L. Eshelman, editor, Proc. Sixth Int. Conf. Gen. Algo., volume 574. Morgan Kaufmann, 1995.
[10]Steffen Schulze-Kremer. Genetic algorithms for protein tertiary structure prediction. In P. B. Brazdil, editor, Machine Learning: ECML-93. Springer-Verlag, 1993.
[11]Steffen Schulze-Kremer. Genetic algorithms and protein folding. http://www.techfak.unibielefeld.de/bcd/Curric/ProtEn/contents.html, June 1995.
[12]T. F. Havel, I. D. Kuntz, and G. M. Crippen. The theory and practice of distance geometry. Bull. math. Bio, (182):281-294, 1983.
[13]R. Unger and J. Moult. Genetic algorithms for protein folding simulations. Journal of Molecular Biology, 231:75-81, 1993.


Back to the Abstract Index.