BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


Studying structure-activity and phenotype-genotype relationships in protein families. Methods, algorithms and applications

Vladimir A. Ivanisenko, Irina S. Pika, Sergey I. Pinin, Tatiana I. Fomina, Alexey M. Eroshkin

Research Institute of Molecular Biology, State Research Center of Virology and Biotechnology "Vector", Koltsovo, Novosibirsk region, 633159 Russia

eroshkin@vector.nsk.su


Protein sequence and structure analysis is directed mostly at several traditional tasks: searching for functional sites and predicting 2D and 3D structures, antigenic determinants, transmembrane segments, etc. Some additional tasks are: finding sites that influence protein activity; searching for regions of difference between related proteins grouped by functional, evolutionary or other criteria (studying phenotype-genotype correlations); finding the factors responsible for activity changes in mutant proteins; and predicting the activity of a newly sequenced or mutant protein. Methods, algorithms and computer programs have been developed to solve these tasks. The data for analysis are protein sequences and structures from the Swiss-Prot database and the Protein Data Bank, supplemented with protein activity data. An activity-modulating site (AMS) is defined as a site in the protein sequence or 3D structure at which amino acid changes modulate protein activity. The methods developed to find AMSs are based on analysis of the relationships between the activities of a set of proteins and the physico-chemical characteristics of protein sites (in 1D and 3D structure). The methods include multiple linear regression, discriminant and cluster analysis, analysis of variance (ANOVA), and alphabetical and profile analysis.
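The idea of relating activity to a physico-chemical characteristic of each site can be illustrated with a minimal sketch. It is not the authors' program: the function names, the simplified charge scale (His treated as neutral) and the toy alignment with its activity values are all assumptions made for illustration. Each alignment column is converted to a numeric property and ranked by the absolute Pearson correlation with activity.

```python
from math import sqrt

# Simplified side-chain charge at neutral pH.
# Assumption: His and all other residues are treated as neutral (0).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def scan_positions(aligned_seqs, activities, scale=CHARGE):
    """Rank alignment columns by |correlation| between property and activity."""
    length = len(aligned_seqs[0])
    ranking = []
    for pos in range(length):
        column = [scale.get(seq[pos], 0.0) for seq in aligned_seqs]
        ranking.append((pos, pearson(column, activities)))
    return sorted(ranking, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy alignment and activities (not real data).
toy_seqs = ["AKGA", "ADGA", "AKGS", "AEGS"]
toy_activity = [2.0, 0.0, 2.0, 0.0]
ranking = scan_positions(toy_seqs, toy_activity)
best_position, best_r = ranking[0]
```

In this toy case only the second column (index 1) varies in charge, so it ranks first. A real analysis would use several property scales and test significance, for example with ANOVA or discriminant analysis as listed above.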

The methods have been applied to the analysis of proteins with tested activity (disintegrins, human alpha-interferons, antimicrobial peptides); phage-display data; M2 proteins from drug-sensitive and drug-resistant strains of influenza A virus (genotype-phenotype correlations); etc. It was found that disintegrin activity depends on the amino acid volume at position +2 from the well-known active RGD site. The AMS of human interferons is a spatial site consisting of amino acids from both the N- and C-termini. Drug resistance in influenza A viruses was found to correlate well with the polarity of several amino acid positions in the transmembrane helix. These and other results agree with available data.

The relationships obtained are the basis for predicting protein activity and related phenotypes. As an example, a set of human tumor necrosis factor (TNF) mutants was analyzed by means of linear regression. It was found that the charge of the amino acids at positions 31-32 correlates well (c=0.78) with the activities of the mutants. Based on this correlation, the mutation R32 --> Q was suggested to obtain an analog with reduced cytotoxicity. This prediction was confirmed experimentally. The programs implementing the described algorithms will be presented and discussed.
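The regression-then-prediction step can be sketched as follows, under stated assumptions. This is not the authors' implementation and does not use their TNF data: the two-residue site variants, the activity values and the helper names are hypothetical, and the charge scale is the same simplified one (Lys/Arg +1, Asp/Glu -1, everything else 0). The sketch fits a least-squares line of activity against the net charge of a site and uses it to predict the activity of an untested variant.

```python
# Simplified charge scale (assumption: His and other residues count as 0).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(residues):
    """Sum of simplified side-chain charges over the residues of a site."""
    return sum(CHARGE.get(res, 0) for res in residues)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical site variants and synthetic activities (illustration only).
site_variants = ["RR", "RQ", "QQ", "RE"]
toy_activity = [2.0, 1.0, 0.0, 0.0]

charges = [net_charge(site) for site in site_variants]
slope, intercept = fit_line(charges, toy_activity)


def predict(site):
    """Predicted activity of a site variant from the fitted line."""
    return slope * net_charge(site) + intercept


# Predicted activity of a hypothetical untested variant with lower charge.
predicted_kq = predict("KQ")
predicted_rr = predict("RR")
```

With a positive slope, replacing a charged residue with a neutral one lowers the predicted activity, which is the kind of reasoning behind suggesting a charge-reducing mutation.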

