We introduce an energy function for contact maps of proteins. In addition to the standard term, that takes into account pairwise interactions between amino acids, our potential contains a new hydrophobic energy term. Parameters of the energy function were obtained from a statistical analysis of the contact maps of known structures. The quality of our energy function was tested extensively in a variety of ways. In particular, fold recognition experiments revealed that for a fixed sequence the native map is identified correctly in an overwhelming majority of the cases tested. The successfully identified proteins include some cases that are known to pose difficulties for such tests (BPTI, spectrin and cro-protein). In addition, many known pairs of homologous structures were corectly identified, even when the two sequences had relatively low sequence homology. In addition we introduced a dynamic Monte Carlo procedure in the space of contact maps. Topological and polymeric constraints were taken into account by restrictive dynamic rules. Various aspects of protein dynamics, including high-temperature melting and refolding, were simulated. Perspectives of application of the energy function and the method for structure checking and fold prediction are discussed.
We present a new approach for clustering, based on the physical properties of an inhomogeneous ferromagnetic model. We do not assume any structure of the underlying distribution of the data. A Potts spin is assigned to each data point and short range interactions between neighboring points are introduced. Spin-spin correlations measured (by Monte Carlo procedure) in a super-paramagnetic regime in which aligned domains appear, serve to partition the data points into clusters. Oue method outperforms other algorithms for toy problems as well as for real data.
Simulations performed using a recently introduced deterministic topological model do not agree with some very recent results concerning the evolution of a single perturbed cluster. We analyze the source of the discrepancy and introduce a topological model that is in very good agreement with experiments and simulations available up to now.
We summarize some recent developments in approximate descriptions of soap froth evolution in 2d. The questions addressed concern temporal correlations in the scaling state, characterization of transient behavior and evolution from very special initial states. We observed that for these delicate issues mean field theory fails; moreover, we found that widely accepted topological simulation methods also disagree significantly with experiments. We identified the source of these discrepancies in the manner the models select the topological rearrangement that follows a bubble's disappearance. Properly modified topological simulations do yield agreement with experiments. Our analysis allows identification of the manner in which different aspects of the microscopic dynamics affect the long time behavior of the system.
The dynamic behavior of cluster algorithms is analyzed in the classical mean field limit. Rigorous analytical results below $T_c$ establish that the dynamic exponent has the value $z_{sw}=1$ for the Swendsen - Wang algorithm and $z_{uw}=0$ for the Wolff algorithm. An efficient Monte Carlo implementation is introduced, adapted for using these algorithms for fully connected graphs. Extensive simulations both above and below $T_c$ demonstrate scaling and evaluate the finite-size scaling function by means of a rather impressive collapse of the data.
We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures it is completely ordered; all spins are aligned. At very high temperatures the system does not exhibit any ordering and in an intermediate regime clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method.
We present a general definition of damage spreading in a pair of models. Using this general framework, one can define damage spreading in an objective manner, that does not depend on the particular dynamic procedure that is being used. The formalism can be used for any spin-model or cellular automaton, with sequential or parallel update rules. At this point we present its application to the Domany-Kinzel cellular automaton in one dimension, this being the simplest model in which damage spreading has been found and studied extensively. We show that the active phase of this model consists of three sub-phases, characterized by different damage-spreading properties.
We present two new results regarding damage spreading in ferromagnetic Ising models. First, we show that a damage spreading transition can occur in an Ising chain that evolves in contact with a thermal reservoir. Damage heals at low temperature and spreads at high $T$. The dynamic rules for the system's evolution for which such a transition is observed are as legitimate as the conventional rules (Glauber, Metropolis, heat bath). Our second result is that such transitions are not always in the directed percolation universality class.
We investigate the long time behavior of the survivors' area in the scaling state of two dimensional soap froth. We relate this problem to the recently studied temporal decay of the fraction of Potts spins that have never been flipped till time $t$. The results of our topological simulations are consistent with the value $\theta=1$ for the scaling exponent of the survivors' areas, in agreement with a recently obtained analytical result. We find, however, that the relaxation time needed to get into the scaling regime depends on the degree of randomness in the topological rearrangements and becomes very large in the deterministic limit.
We present an efficient algorithm to recover the three dimensional structure of a protein from its contact map representation. First we show that when a physically realizable map is used as target, our method generates a structure whose contact map is essentially similar to the target. Furthermore, the reconstructed and original structures are similar up to the resolution of the contact map representation. Next we use non-physical target maps, obtained by corrupting a physical one; in this case our method essentially recovers the underlying physical map and structure. Hence our algorithm will help to fold proteins, using dynamics in the space of contact maps. Finally we investigate the manner in which the quality of the recovered structure degrades when the number of contacts is reduced.
Ostwald ripening is the last stage of the evolution of a system with two coexisting phases. It is a relatively simple non-equilibrium phenomenon with several interesting features. For example, as the system coarsens it goes through a scaling state, one wich looks the same (up to an overall length scale, which grows) at all times. The dynamics of the problem can be mapped, in two dimensions, onto an evolving Coulomb system. In this work we present a brief summary of a novel theoretical approach to this problem, based on an analytic derivation (using a mean field approach) of an effective two-body interaction between droplets of the minority phase. The resulting interacting many-body dynamics is solved by a very efficient numerical algorithm, allowing us to follow the evolution of more than $10^6$ droplets on a simple workstation. The results are in excellent agreement with recent experiments.
We present a systematic quasi-mean field model of the Ostwald ripening process in two dimensions. Our approach yields a set of dynamic equations for the temporal evolution of the minority phase droplets' radii. The equations contain only pairwise interactions between the droplets; these interactions are evaluated in a mean- field type manner. We proceed to solve numerically the dynamic equations for systems of tens of thousands of interacting droplets. The numerical results are compared with the experimental data obtained by Krichevsky and Stavans for the relatively large volume fraction $\varphi=0.13$. We found good agreement with experiment even for various correlation functions.
The physical aspects of a recently introduced method for data clustering are considered in detail. This method is based on an inhomogeneous Potts model; no assumption concerning the underlying distribution of the data is made. A Potts spin is assigned to each data point and short range interactions between neighboring points are introduced. Spin-spin correlations, (measured by Monte Carlo) serve to partition the data points into clusters. In this paper we examine the effects of varying different details of the method such as the definition of neighbors, the type of interaction and the number of Potts states $q$. In addition, we present and solve a granular mean field Potts model relevant to the clustering method. The model consists of strongly coupled groups of spins coupled to noise spins which are themselves weakly coupled. The phase diagram is computed by solving analytically the model in various limits. Our main result is that in the range of parameters of interest the existence of the super-paramagnetic phase is independent of the ordering process of the noise spins. Next we use the known properties of regular and inhomogeneous Potts models in finite dimensions to discuss the performance of the clustering method. In particular, the spatial resolution of the clustering method is argued to be connected to the correlation length of spin fluctuations. The behavior of the method, as more and more data points are sampled, is also investigated.
We demonstrate that pairwise contact potentials alone cannot be used to predict the native fold of a protein. Ideally one would hope that a {\it universal} energy function exists, for which the native folds of {\it all} proteins are the respective ground states. Here we pose a much more restricted question: Is it possible to find a set of contact parameters for which the energy of the native contact map of a {\it single protein} (crambin) is lower than that of all possible physically realizable decoy maps? The set of maps we used were derived by energy minimization (not by threading). We seek such a set of parameters by {\it perceptron learning}, a procedure which is guaranteed to find such a set if it exists. We found that it is impossible to fine tune contact parameters that will assign all alternative conformations higher energy than that of the native map. This finding proves that there is no pairwise contact potential that can be used to fold any given protein. Inclusion of additional energy terms, such as hydrophobic (solvation), hydrogen bond or multi-body interactions may help to attain foldability within specific structural families.
We evaluate by Monte Carlo simulations various singular thermodynamic quantities $X$, for an ensemble of quenched random Ising and Ashkin-Teller models. The measurements are taken at $T_c$ and we study how the distributions $P(X)$ (and, in particular, their relative squared width, $R_X$) over the ensemble depend on the system size $l$. The Ashkin-Teller model was studied in the regime where bond randomness is irrelevant and we found weak self averaging; $R_X\sim l^{\frac{\alpha}{\nu}} \rightarrow 0$, where $\alpha<0$ and $\nu$ are the exponents (of the pure model fixed point) governing the transition. For the site dilute Ising model on a cubic lattice, known to be governed by a random fixed point, we find that $R_X$ tends to a universal constant independent of the amount of dilution (no self averaging). However this constant is different for canonical and grand canonical disorder. We identify the pseudo-critical temperatures $T_c(i,l)$ of each sample $i$, as the temperature at which the susceptibility reaches its maximal value. The distribution of these $T_c(i,l)$ over the ensemble was investigated; we found that its variance scales as $\left(\delta T_c(l)\right)^2 \sim l^{-\frac{2}{\nu}}$. These results are in agreement with the recent predictions of Aharony and Harris. Our previously proposed finite size scaling ansatz for disordered systems was tested and found to hold. When we fit the data obtained for many samples of different sizes by a sample-independent form, the resulting scaling function was found to be universal and to behave similarly to pure systems. We did observe that to describe deviations from this universal function we do need sample-dependent scaling functions. These deviations are, however, relatively small and this led us to an interesting side result: sample-to-sample fluctuations of $\chi^{max}$, the susceptibility measured at $T_c(i,l)$, %the maximum for each sample, are smaller by a factor of 70 than those of $\chi(T_c)$. This indicates that to obtain a fixed statistical error it may be more computationally efficient to measure $\chi^{max}$.
We simulated site dilute Ising models in $d=3$ dimensions for several lattice sizes $L$. For each $L$ singular thermodynamic quantities $X$ were measured at criticality and their distributions $P(X)$ were determined, for ensembles of several thousand random samples. For $L \rightarrow \infty$ the width of $P(X)$ tends to a universal constant, i.e. there is no self averaging. The width of the distribution of the sample dependent pseudocritical temperatures $T_c(i,L)$ scales as $\delta T_c(L) \sim L^{-\frac{1}{\nu}}$ and {\em not} as $\sim L^{-\frac{d}{2}}$. Finite size scaling holds; the sample dependence of $X_i(T_c)$ enters predominantly through $T_c(i,L)$.
Damage spreading for Ising cluster dynamics is investigated numerically by using random numbers in a way that conforms with the notion of submitting the two evolving replicas to the same thermal noise. Two damage spreading transitions are found; damage does not spread either at low or high temperatures. We determine some critical exponents at the high-temperature transition point, which seem consistent with directed percolation.
Background: Two problems are of major importance in protein fold prediction: how to generate plausible conformations, and how to choose an energy function to identify the native state. Contact maps are a simple representation of protein structure, one which offers a promising framework to address these two issues. Results: In this work we develop a Monte Carlo dynamics in contact map space. The procedure is divided into four steps: 1) Non-local dynamics, in which large scale ``cluster'' moves are performed. Clusters are in approximate correspondence with secondary structure elements. 2) Local dynamics, in which secondary structure location is optimized. 3) Reconstruction, in which the physicality of the contact map is restored,and 4) Refinement, which consist of a further Monte Carlo energy minimization in real space. We demonstrate that such a dynamical procedure is effective in producing uncorrelated low energy states. Conclusion: The procedure introduced in this paper generates very effectively a representative ensemble of conformations. We are able to show that existing sets of pairwise contact energy parameters are not suitable to single out the native state within this ensemble. The remaining outstanding issue in protein folding is to find an energy function which can discriminate the native state from decoys.
Changing a few contacts in a contact map corresponds to a large scale move in conformation space; hence one gains a lot by using the contact map representation for protein folding. We developed an efficient search procedure in the space of physical contact maps, which could identify the native fold as of the lowest free energy, provided one had a free energy function whose ground state is the native map. We prove rigorously that the widely used pairwise contact approximation to the free energy cannot stabilize even a single protein's native map. Testing the native map against a set of decoys obtained by gapless threading, one may be misled to the opposite conclusion.
A contact map is a simple representation of the structure of proteins and other chain-like macromolecules. This representation is quite amenable to numerical studies of folding. We show that the number of contact maps corresponding to the possible configurations of a polypeptide chain of $N$ amino acids, represented by $(N-1)$-step self avoiding walks on a lattice, grows exponentially with $N$ for all dimensions $D > 1$. We carry out exact enumerations in $D=2$ on the square and triangular lattices for walks of up to 20 steps and investigate various statistical properties of contact maps corresponding to such walks.. We also study the exact statistics of contact maps generated by walks on a ladder.
We present a method to derive contact energy parameters from large sets of proteins. The basic requirement on which our method is based is that for each protein in the database the native contact map has lower energy than all its decoy conformations that are obtained by threading. Only when this condition is satisfied can one use the proposed energy function for fold identification. Such a set of parameters can be found (by perceptron learning) provided $M_p$, the number of proteins in the database, is not too large. Other aspects that influence the existence of such a solution are the exact definition of contact and the value of the critical distance $R_c$, below which two residues are considered to be in contact. An additional, very important novel feature of our approach is its ability to determine whether an energy function of some suitable proposed form can or cannot be parametrized in a way that satisfies our basic requirement. As a demonstartion of this, we determine the region in the ($R_c,M_p$) plane in which the problem is solvable, i.e. we can find a set of contact parameters that stabilize simultaneously all the native conformations. We show that for large enough databases the contact approximation to the energy cannot stabilize all the native folds even against the decoys obtained by gapless threading.
Computational approaches to protein folding are divided into two main categories. In {\it energy minimization} methods the native state is identified with the the ground state of a suitable energy function (Brooks, 1998). In {\it fold recognition} methods, the native state is selected as the most compatible structure among those present in a library (Bowie {\em et al.}, 1992; Jones {\em et al.}, 1992; Fisher {\em et al.}, 1996). Both approaches depend on three important choices: 1) The representation of protein structure; 2) The set of alternative structures among which the native fold is sought for; 3) A bias towards the ``best'' conformation. In energy minimization approaches such a bias is an energy function, and the best conformation is the one of lowest energy. On the other hand, in fold recognition methods, a compatibility function for a sequence on a structure is used. The compatibility is often expressed in terms of database-derived properties and restraints. In this review we analyze the attempt to use contact maps to efficiently perform protein fold prediction. Contact maps are a particularly manageable representation of protein structure which has been already applied in the past to the study of protein conformation (Chan and Dill, 1990), structure comparison (Holm and Sander, 1993), interaction patterns in proteins (Lifson and Sander, 1979; Godzik {\em et al.}, 1993) and correlated mutations (Olmea and Valencia, 1997; Ortiz {\em et al.}, 1998). The possibility of performing energy minimization in the space of contact maps has been proposed by Mirny and Domany (1996). We present here the consistent development of their idea, discussing successes, failures and perspectives.
We studied the possibility to approximate a Lennard Jones interaction by a pairwise contact potential. First we used a Lennard-Jones potential to design off-lattice, protein-like heteropolymer sequences, whose lowest energy (native) conformations were then identified by Molecular Dynamics. Then we turned to investigate whether one can find a pairwise {\it contact} potential, whose ground states are the contact maps associated with these native conformations. We show that such a requirement cannot be satisfied exactly - i.e. no such contact parameters exist. Nevertheless, we found that one can find contact energy parameters for which an energy minimization procedure, acting in the space of contact maps, yields maps whose corresponding structures are close to the native ones. Finally we show that when these structures are used as the initial point of a Molecular Dynamics energy minimization process, the correct native folds are recovered with high probability.
This work addresses the data--fusion paradigm of multiple targets detected by multiple sensors in the presence of uncertainty. The integrating--information method is developed in a Bayesian framework which allows to embed the data fusion requirements naturally. The merging task is formulated as the minimization of the energy function of a Potts--spin glass where the interactions are given by the degree of consistency of two measurements performed by different sensors. The estimates of the probabilities on which the interactions rely are provided by of a feed--forward neural network. The information is then combined by a self--organizing system which produces a world image by resolving inconsistencies and integrates prior knowledge, such as geometrical constraints and world assumptions. The method is developed for a generic application, but a potential use, based on a specific problem posed by the aircraft industry is presented. This example allows to identify the main features of the method and present a detailed analysis of its implementation as well as compare its performance with other possible approaches.
Currently, each week about 50 new protein structures are made available in public databases. The attention is focused on developing automatic methods of classification. The work of organization is being done by several groups, to a large extent independently. To our knowledge, the consistency of different classifications has never been examined on a protein by protein basis. Moreover, the potential advantages of a simultaneous consideration of different classification schemes has not been much scrutinized. \\ \\ {\bf Results:} In this work we compare the classifications given by two widely used databases, FSSP and CATH. The pairwise structural similarities provided by the FSSP are found to be largely consistent with the CATH classification and we discuss five proteins for which the two databases are not consistent. We present a fully automated scheme to predict the CATH architecture of proteins on the basis of their FSSP scores. Using this scheme, we predict the architecture classifications of 165 single--domain proteins that were not yet processed by CATH when this work was done. For any protein to be classified we can estimate the expected success rate of our architecture predictions, which reaches 94\% for 50\% of the proteins and 73\% for an additional 35\%. \\ \\ {\bf Conclusions:} We found that correlating the information in the different protein structure databases can help automating classifications that are performed manually. We showed that automatic algorithms can be used to enlarge the CATH database using information available in the FSSP database. The identification of inconsistencies between databases can uncover possible errors of classification and may reveal shortcomings of the present methods.