BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


The bank of patterns PROF_PAT 1.0: construction procedure and computer programs

Bachinsky A.G., Yarigin A.A. and Kulichkov V.A.

Theoretical Department, Research Institute of Molecular Biology, SRC VB 'Vector', Koltsovo, Novosibirsk region, 633159, Russia

bachin@vector.nsk.su


Until now, the main way of suggesting possible functions for newly determined amino acid sequences (AAS) has been to search for homologues among the sequences stored in protein banks such as PIR, SWISS-PROT and others. As these banks grow, such comparisons become more promising but also more time-consuming. In the last few years a number of works have aimed at selecting sites in groups of related proteins that are representative of the protein family as a whole, and at using these sites both to identify new proteins and to refine the structural and functional properties of known ones. Among the best known of the resulting databases are PROSITE, BLOCKS and PRINTS, distributed by EMBL. A number of other similar databases also exist.

The version of the pattern bank presented here was constructed from release 29 of SWISS-PROT and contains patterns for 2384 groups of related proteins, in a format similar to that of PROSITE. Pattern elements were selected to have the minimum probability of occurring in random sequences.
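To make "minimum probability of occurring in random sequences" concrete, here is a minimal sketch that estimates, for a simplified PROSITE-style element, the probability of a chance match at one fixed position and the expected number of chance matches in a random sequence of a given length. It assumes independent positions and, by default, a uniform background of 1/20 per residue (a realistic estimate would use the residue composition of the bank instead). The function names are hypothetical, and only single residues, [..], {..}, x and fixed (n) repeats are supported.

```python
import re
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background model: uniform residue frequencies (1/20 each).
UNIFORM = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

# One token: residue, x, [allowed], or {forbidden}, optionally followed by (n).
TOKEN = re.compile(r"^(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((?P<n>\d+)\))?$")


def position_sets(pattern):
    """Expand a simplified PROSITE-style pattern into one allowed-residue set per position."""
    sets = []
    for token in pattern.rstrip(".").split("-"):
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unsupported token: {token!r}")
        core, n = m.group("core"), int(m.group("n") or 1)
        if core == "x":
            allowed = set(AMINO_ACIDS)                  # any residue
        elif core.startswith("["):
            allowed = set(core[1:-1])                   # one of the listed residues
        elif core.startswith("{"):
            allowed = set(AMINO_ACIDS) - set(core[1:-1])  # any residue except those listed
        else:
            allowed = {core}                            # a single fixed residue
        sets.extend([allowed] * n)
    return sets


def match_probability(pattern, freqs=UNIFORM):
    """Probability that the pattern matches at one fixed position of a random sequence."""
    return prod(sum(freqs.get(aa, 0.0) for aa in allowed)
                for allowed in position_sets(pattern))


def expected_hits(pattern, seq_len, freqs=UNIFORM):
    """Approximate expected number of chance matches in a random sequence of length seq_len."""
    k = len(position_sets(pattern))
    return max(seq_len - k + 1, 0) * match_probability(pattern, freqs)
```

For `C-x(2)-C` under the uniform model the per-position probability is (1/20)^2 = 1/400, so roughly 0.24 chance matches are expected in a 100-residue sequence; elements with more fixed or narrowly constrained positions score lower and would be preferred under the selection criterion above.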

The researcher can specify the set of protein families within which local similarity to a sequence of interest is sought. A similarity matrix (PAM, BLOSUM or another type) can also be chosen, as can the required level of similarity, from exact matching to fairly distant homology.
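The adjustable stringency can be illustrated with a minimal sketch: each ungapped window of the query is scored against a reference segment (standing in for one pattern element) using a pluggable substitution function, and only windows that reach a threshold are reported. The residue groups and scores in `toy_score` are hypothetical placeholders, not PAM or BLOSUM values; a real matrix would be supplied through the `score` argument. This is an illustration of the idea, not the search procedure used by PROF_PAT.

```python
from typing import Callable, List, Tuple

# Hypothetical residue groups for a toy matrix; these are NOT PAM or BLOSUM values.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "STNQ", "AG", "C", "P"]


def toy_score(a: str, b: str) -> int:
    """Toy substitution score: +2 for identical residues, +1 for the same group, -1 otherwise."""
    if a == b:
        return 2
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def window_hits(query: str, segment: str, threshold: int,
                score: Callable[[str, str], int] = toy_score) -> List[Tuple[int, int]]:
    """Return (start, score) for every ungapped query window scoring >= threshold against segment."""
    k = len(segment)
    hits = []
    for start in range(len(query) - k + 1):
        window = query[start:start + k]
        s = sum(score(q, r) for q, r in zip(window, segment))
        if s >= threshold:
            hits.append((start, s))
    return hits


query = "AACPGCAACPACAA"
print(window_hits(query, "CPGC", 8))  # exact matching only
print(window_hits(query, "CPGC", 7))  # also tolerates one same-group substitution
```

With the segment `CPGC` (self-score 8 under the toy matrix), a threshold of 8 accepts only the exact occurrence at position 2, while a threshold of 7 also accepts `CPAC` at position 8, which differs by one same-group substitution. Lowering the threshold further moves the search toward more distant similarity.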

The patterns identify more than 23,500 AAS as showing "positive" or "conditionally positive" similarity. In the latter case, the sequences identified are usually homologues that were not included in the training samples.

Only 35 unrelated proteins in the prototype bank were recognized by two or more pattern elements. In some cases the detected similarity can be traced to homology between these AAS and proteins of the training sample, which is sometimes noted in the FT and CC fields of the SWISS-PROT entries. In other cases no such information can be found, and we consider that a previously unknown similarity has been discovered.
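The criterion of recognition by two or more pattern elements can be sketched by translating each element into a regular expression and counting how many of them occur in a sequence. This is a minimal sketch under stated assumptions: the translation covers only a simplified subset of PROSITE syntax (residues, [..], {..}, x, (n) and (n,m) repeats, and < and > as sequence anchors outside brackets), and the helper names and example elements are hypothetical, not taken from PROF_PAT.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regular expression."""
    p = pattern.rstrip(".")                          # drop the terminating period
    p = p.replace("{", "[^").replace("}", "]")       # {P} -> [^P]  (excluded residues)
    p = p.replace("(", "{").replace(")", "}")        # x(2,4) -> x{2,4}  (repeats)
    p = p.replace("-", "").replace("x", ".")         # remove separators; x -> any residue
    p = p.replace("<", "^").replace(">", "$")        # N- and C-terminal anchors
    return p


def elements_matched(sequence: str, elements: List[str]) -> List[int]:
    """Indices of the pattern elements that occur anywhere in the sequence."""
    return [i for i, el in enumerate(elements)
            if re.search(prosite_to_regex(el), sequence)]


def recognized(sequence: str, elements: List[str], min_elements: int = 2) -> bool:
    """True if at least min_elements distinct pattern elements match the sequence."""
    return len(elements_matched(sequence, elements)) >= min_elements


# Hypothetical elements of one pattern, for illustration only.
elements = ["C-x(2)-C", "H-x(3)-H", "[ST]-P-x-[KR]"]
print(elements_matched("AACKLCGGHAAAHWW", elements))
print(recognized("AACKLCGGHAAAHWW", elements))
```

Here the example sequence contains the first two elements but not the third, so it would count as recognized under the two-element criterion; an unrelated protein flagged this way is the kind of case the text describes.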

