BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


Protein fold recognition based on statistical hypothesis testing

Shamil Sunyaev, Eugene Kuznetsov and Vladimir Tumanyan

Lab. of Computer and Structural Analysis of Biopolymers, V. A. Engelhardt Institute of Molecular Biology, Moscow, 117984, RUSSIA

Shamil.Sunyaev@EMBL-Heidelberg.de


The problem of knowledge-based protein fold recognition was formalized in the framework of mathematical statistics, namely as a problem of statistical hypothesis testing. Our general formulation leads to various mathematical forms of decision rule functions for evaluating the quality of a sequence-structure fit. The approach does not rely on the assumption of a Boltzmann distribution. Two decision rule functions are based on different log-likelihood ratio estimations, and two others are newly derived nonparametric tests. To test and compare these sequence-structure compatibility criteria, we performed "structure seeks sequence" searches for highly populated protein structural families. Because the criteria reflect different underlying statistical propositions, they sometimes recognize different correct sequence-structure matches.
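As a rough illustration of what a log-likelihood ratio decision rule for a sequence-structure fit can look like, the sketch below scores a sequence threaded onto a structure described by per-position environment classes. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names, the two-letter residue alphabet, the "buried"/"exposed" environment classes, the probability tables, and the zero threshold are hypothetical and are not the authors' decision rule functions.

```python
import math

def log_likelihood_ratio(sequence, environments, p_env, p_bg):
    """Sum over positions of log(P(residue | environment) / P(residue)).

    H1: the sequence adopts the fold (residues follow p_env per position).
    H0: the sequence is unrelated (residues follow the background p_bg).
    """
    if len(sequence) != len(environments):
        raise ValueError("sequence and environment string must have equal length")
    return sum(
        math.log(p_env[env][res] / p_bg[res])
        for res, env in zip(sequence, environments)
    )

def accepts_fold(score, threshold=0.0):
    """Decision rule: accept H1 when the score exceeds the threshold."""
    return score > threshold

# Toy (invented) probability tables: H = hydrophobic, P = polar.
P_ENV = {
    "buried": {"H": 0.8, "P": 0.2},
    "exposed": {"H": 0.2, "P": 0.8},
}
P_BG = {"H": 0.5, "P": 0.5}

# Hypothetical structure: two buried positions followed by two exposed ones.
STRUCTURE = ["buried", "buried", "exposed", "exposed"]

good = log_likelihood_ratio("HHPP", STRUCTURE, P_ENV, P_BG)
bad = log_likelihood_ratio("PPHH", STRUCTURE, P_ENV, P_BG)
print(accepts_fold(good), accepts_fold(bad))
```

In a "structure seeks sequence" search, a score of this kind would be computed for every candidate sequence against one structure, and the candidates ranked or thresholded. Note that the ratio is built directly from observed frequencies and needs no Boltzmann-type energy interpretation.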

Our statistical approach also involves the evaluation of the various environmental variables used for fold recognition. Several quality criteria for environmental variables were formulated and tested. The results of this analysis are important for the theoretical basis of threading methods.

