Supplementary material for
Proteomic Signatures: Amino Acid and Oligopeptide Compositions Differentiate Among Phyla
Itsik Pe'er, Clifford E. Felder, Orna Man, Israel Silman, Joel L. Sussman, and Jacques S. Beckmann
Proteins: Structure, Function and Genetics 54, 20-40 (2004)
All the data text files are best viewed when opened with MS Excel. Page numbers refer to the journal page numbers.

Repeating figures of clustering by other training sets, referred to on page 23 (tif images):
 Separation between eukaryotes & eubacteria (repeating Figure 2a), by training sets 1 2
 Separation between eukaryotes & archaea (repeating Figure 2c), by training sets 1 2
 Separation between eukaryotes, eubacteria, and archaea with training on eukaryotes and archaea only (repeating Figure 2c), by training sets 1 2
 Z-scores of XY vs YX heterodipeptide bias, referred to on page 24. A text file with a symmetric table of 20x20 space-delimited real numbers. The number on row X, column Y denotes the Z-score (number of standard deviations) by which XY occurs more than YX.
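As a rough illustration of what such a bias score measures, the sketch below counts adjacent residue pairs and compares XY with YX. The formula is an assumption made for this example only (a normal approximation to a binomial test in which XY and YX are equally likely); the score actually used is the one defined in the paper and its Appendix. The function names `dipeptide_counts` and `xy_vs_yx_zscore` are hypothetical.

```python
from collections import Counter
from math import sqrt


def dipeptide_counts(sequences):
    """Count every overlapping pair of adjacent residues in the sequences."""
    counts = Counter()
    for seq in sequences:
        for first, second in zip(seq, seq[1:]):
            counts[first + second] += 1
    return counts


def xy_vs_yx_zscore(counts, x, y):
    """Illustrative Z-score for XY occurring more often than YX.

    ASSUMPTION (not from the paper): under the null hypothesis each of the
    n = n_XY + n_YX occurrences is XY with probability 1/2, so n_XY has
    mean n/2 and standard deviation sqrt(n)/2.
    """
    n_xy = counts[x + y]
    n_yx = counts[y + x]
    total = n_xy + n_yx
    if total == 0:
        return 0.0  # neither ordering observed, so no evidence of bias
    return (n_xy - n_yx) / sqrt(total)


# Toy input, purely illustrative: AG occurs 3 times and GA 2 times.
toy_counts = dipeptide_counts(["AGAGAG"])
z_ag = xy_vs_yx_zscore(toy_counts, "A", "G")
```

Under this illustrative definition the value for (Y, X) is the negative of the value for (X, Y).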
 Oligopeptide frequency deviations from expectation, referred to on page 24. 2-column tab-delimited text files, the first column listing the oligopeptide, the second listing its Z-score (number of standard deviations).
Pyrococcus furiosus 

Homo sapiens 

Escherichia coli K12 

Average archaea 

Average eubacteria 

Average eukaryota 
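For readers who prefer a script to a spreadsheet, the 2-column tab-delimited files described above could be read roughly as follows. This is a minimal sketch: the helper names `read_zscores` and `most_extreme` are hypothetical, the sample rows are invented for illustration, and the assumption that any non-numeric row is a header to be skipped has not been checked against the actual files.

```python
import csv
import io


def read_zscores(handle):
    """Read 'oligopeptide<TAB>Z-score' rows into a dict.

    ASSUMPTION: rows whose second field is not a number (e.g. a header
    line) are skipped rather than treated as errors.
    """
    scores = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            scores[row[0].strip()] = float(row[1])
        except ValueError:
            continue
    return scores


def most_extreme(scores, n=10):
    """Return the n oligopeptides with the largest absolute Z-score."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


# Invented sample rows, for illustration only (not real data).
sample = io.StringIO("AAA\t5.2\nACD\t-7.1\nGGG\t0.3\n")
sample_scores = read_zscores(sample)
top_two = most_extreme(sample_scores, n=2)
```

To use it on a real file, pass an open file object, for example `open(path, newline="")`, in place of the in-memory sample.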
 Homotripeptides in eukaryotes, referred to on page 24. Tab-delimited lists of the expected and observed homotripeptide contents, and their normalized difference (Z-score), for each eukaryotic genome examined. The homotripeptides are listed in alphabetical order.
Anopheles gambiae 

Arabidopsis thaliana 

Caenorhabditis elegans 

Ciona intestinalis 

Drosophila melanogaster 

Homo sapiens 

Mus musculus 

Oryza sativa 

Rattus norvegicus 

Saccharomyces cerevisiae 

Schizosaccharomyces pombe 

Takifugu rubripes 
 Rank order of residues and oligopeptides by phyla, referred to on pages 29, 34, and 36. Three 4/5-column, tab-delimited text files, with rows corresponding to either single residues, dipeptides, or tripeptides, respectively, which are listed in the first column. The next column in oligopeptide files indicates whether the current oligopeptide is a homopeptide, heteropeptide or, in tripeptides, a palindrome. The following columns in all files detail the rank order of the current residue or oligopeptide in eukaryota, eubacteria or archaea, respectively. Each of the three files appears in 6 extract versions: top/bottom extracts according to each of the three superkingdoms.
Sorted by	Residues	Dipeptides	Tripeptides
Eukaryota 

Eubacteria 

Archaea 

Residues 
 Standardized Euclidean distances between species' compositions, referred to on page 34. A text file with a symmetric table of 72x72 space-delimited real numbers, with a preceding row and column giving each species' 3-letter nickname (see Table I). The number on row X, column Y denotes the standardized Euclidean distance between normalized composition vectors for species X and Y. Each such vector is computed by dividing the frequency of each of the 20 residues in the current species' proteome by the variance of the frequency of this residue across all the proteomes examined.
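The distance described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the toy vectors have 2 components instead of 20, and the population variance (dividing by the number of proteomes) is assumed because the text does not say which variance estimator was used.

```python
from math import sqrt


def residue_variances(compositions):
    """Variance of each residue's frequency across all proteomes.

    ASSUMPTION: population variance (divide by the number of proteomes).
    """
    vectors = list(compositions.values())
    n = len(vectors)
    variances = []
    for j in range(len(vectors[0])):
        column = [vec[j] for vec in vectors]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return variances


def normalize(vector, variances):
    """Divide each residue frequency by that residue's variance."""
    return [v / var for v, var in zip(vector, variances)]


def standardized_distance(a, b, variances):
    """Euclidean distance between the two normalized composition vectors."""
    na, nb = normalize(a, variances), normalize(b, variances)
    return sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


# Toy compositions with hypothetical nicknames, for illustration only.
toy = {"sp1": [0.1, 0.4], "sp2": [0.3, 0.8]}
toy_var = residue_variances(toy)
toy_dist = standardized_distance(toy["sp1"], toy["sp2"], toy_var)
```

A residue whose frequency is identical in every proteome has zero variance and would cause a division by zero here; real data with 72 proteomes is unlikely to hit that case, but the sketch does not guard against it.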
 Similarity trees for dipeptides and tripeptides, referred to on page 34 (tif images).
 Species coordinates in all figures. 3-column tab-delimited text files, one per principal component analysis plot; the first column names the species and the other two give its first and second component coordinates in that plot. Figures 2a 2b 2c 2d 2d' 2e 3a 3b 4 7a 7c 9a 9c
 Exact oligopeptide score, referred to in the Appendix. A PDF document that describes the exact details of the score used, which properly distinguishes the left and right ends of the sequence and treats them specially.