BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


Comparing sequence comparison

Steven E. Brenner (1), Tim J. P. Hubbard (2), Alexey G. Murzin (2), Cyrus Chothia (1)

(1) MRC Laboratory of Molecular Biology and
(2) Cambridge Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, England

brenner@mole.bio.cam.ac.uk


Since the discovery that the fold of hemoglobin was similar to that of myoglobin, structure comparison has been the most powerful means of detecting distant evolutionary relationships. However, sequence comparison is far easier and faster, and can be applied to a hundred-fold more proteins. Until now, it has been impossible to assess the power of sequence comparison methods, since virtually all homologies have been inferred from sequence comparison. The scop (Structural Classification of Proteins) database contains a unique, complete description of structurally identified evolutionary relationships, and has allowed us to test sequence comparison algorithms rigorously. Sequence comparison proved to be a highly effective and, with BLAST, a very efficient means of detecting relatively close evolutionary relationships. However, of roughly 4000 distant relationships, BLAST found only one-tenth of the structurally identified homologues. Slower methods performed better, but the Smith-Waterman algorithm still identified only 15%, and FASTA with ktup=1 found slightly fewer. Importantly, the statistical scores from William Pearson's Smith-Waterman and FASTA implementations proved highly accurate and superior to raw scores. BLAST P-values are also more reliable than bit scores, but seem to consistently exaggerate the degree of similarity. An understanding of these results allows sequence comparison to be used with new confidence, but also with awareness of its limitations.
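
The kind of evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' actual procedure: it assumes a set of structurally identified homologous pairs (as a database such as scop might supply) and a list of pairwise hits from a sequence comparison program, each carrying a statistical score for which lower means more significant. The function name, the domain identifiers, and the scores are all hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Order-independent key for a pair of sequence identifiers."""
    return frozenset((a, b))


def coverage_and_errors(
    hits: Iterable[Tuple[str, str, float]],
    true_pairs: Set[Pair],
    threshold: float,
) -> Tuple[float, int]:
    """Return (coverage, false_positives) at a given score threshold.

    hits       -- (id_a, id_b, score) tuples; lower score = more significant
                  (assumption: an E-value-like statistic)
    true_pairs -- pairs known to be homologous from structure
    threshold  -- report a hit only if score <= threshold
    """
    # Pairs the sequence method reports as related, ignoring self-matches.
    reported = {pair(a, b) for a, b, score in hits if a != b and score <= threshold}
    found = len(reported & true_pairs)            # true homologues detected
    false_positives = len(reported - true_pairs)  # reported but not homologous
    coverage = found / len(true_pairs) if true_pairs else 0.0
    return coverage, false_positives


# Hypothetical data: four structurally identified homologous pairs.
true_pairs = {pair("d1", "d2"), pair("d1", "d3"), pair("d2", "d3"), pair("d4", "d5")}

# Hypothetical hits from a sequence comparison run.
hits = [
    ("d1", "d2", 1e-10),  # close relationship, easily detected
    ("d2", "d3", 0.5),    # distant relationship, weak score
    ("d1", "d4", 0.001),  # significant score, but not homologous
    ("d4", "d5", 5.0),    # distant relationship, missed at strict thresholds
]

strict = coverage_and_errors(hits, true_pairs, threshold=0.01)
lenient = coverage_and_errors(hits, true_pairs, threshold=1.0)
```

Sweeping the threshold from strict to lenient traces out how many structurally identified homologues a method recovers for a given number of errors, which is one way to compare programs, or different scoring schemes of the same program, on equal terms.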

