Supplementary material for

Proteomic Signatures: Amino Acid and Oligopeptide Compositions Differentiate Among Phyla

Itsik Pe'er, Clifford E. Felder, Orna Man, Israel Silman, Joel L. Sussman, and Jacques S. Beckmann

Proteins: Structure, Function and Genetics 54, 20-40 (2004)

All the data text files are best viewed when opened with MS Excel. Page numbers refer to the journal page numbers.

 

  • Repeating figures of clustering by other training sets, referred to on page 23 (tif images):
    • Separation between eukaryotes & eubacteria (repeating Figure 2a), by training sets 1 2
    • Separation between eukaryotes & archaea (repeating Figure 2c), by training sets 1 2
    • Separation between eukaryotes, eubacteria, and archaea with training on eukaryotes and archaea only (repeating Figure 2c), by training sets 1 2

 

  • Z-scores of XY vs YX heterodipeptide bias, referred to on page 24. A text file with a symmetric table of 20x20 space delimited real numbers. The number on row X, column Y denotes the Z-score (number of standard deviations) by which XY occurs more than YX.
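For readers who would rather load the table programmatically than in a spreadsheet, the following is a minimal sketch of a parser. It assumes the file contains only the 20 rows of 20 numbers, with no header row or residue labels; the function name `parse_matrix` is illustrative, and the order of the residues along rows and columns is not stated here, so it is left to the caller.

```python
def parse_matrix(text):
    """Parse whitespace-delimited rows of real numbers into a list of lists.

    Assumption: no header row or label column; blank lines are skipped.
    """
    rows = [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    # The table is described as square (20x20), so reject ragged input.
    if any(len(row) != n for row in rows):
        raise ValueError("expected a square table")
    return rows


# Invented 2x2 toy input, standing in for the real 20x20 file contents.
toy = "0 1.5\n-1.5 0\n"
matrix = parse_matrix(toy)
# matrix[x][y] is the entry on row x, column y.
```

With the real file, `matrix[x][y]` would hold the Z-score for the dipeptide formed by the residue of row x followed by the residue of column y.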

 

  • Oligopeptide frequency deviations from expectation, referred to on page 24. Two-column, tab-delimited text files: the first column lists the oligopeptide and the second its Z-score (number of standard deviations).

Pyrococcus furiosus

Dipeptides

Tripeptides

Homo sapiens

Dipeptides

Tripeptides

Escherichia coli K12

Dipeptides

Tripeptides

Average archaea

Dipeptides

Tripeptides

Average eubacteria

Dipeptides

Tripeptides

Average eukaryota

Dipeptides

Tripeptides
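The two-column files listed above can be read with a few lines of code. The sketch below assumes one oligopeptide and one Z-score per line, separated by a tab, with no header line; the function names and the toy data are invented for illustration only.

```python
def parse_zscores(text):
    """Map each oligopeptide to its Z-score from tab-delimited text.

    Assumption: exactly two tab-separated fields per non-blank line.
    """
    scores = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        peptide, z = line.split("\t")
        scores[peptide.strip()] = float(z)
    return scores


def most_deviant(scores, k=3):
    """Return the k oligopeptides with the largest absolute Z-score."""
    return sorted(scores, key=lambda p: abs(scores[p]), reverse=True)[:k]


# Invented toy values, not taken from the supplementary files.
toy = "AA\t2.5\nAC\t-4.0\nCA\t0.5\n"
top_two = most_deviant(parse_zscores(toy), k=2)
```

Ranking by absolute value surfaces both over-represented and under-represented oligopeptides; sorting by the signed score instead would separate the two cases.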

 

  • Homotripeptides in eukaryotes, referred to on page 24. Tab-delimited lists of the expected values, the observed values, and their normalized difference (Z-score) for the homotripeptide content of each eukaryotic genome examined. The homotripeptides are listed in alphabetical order.

Anopheles gambiae

Expected

Observed

Z-score

Arabidopsis thaliana

Expected

Observed

Z-score

Caenorhabditis elegans

Expected

Observed

Z-score

Ciona intestinalis

Expected

Observed

Z-score

Drosophila melanogaster

Expected

Observed

Z-score

Homo sapiens

Expected

Observed

Z-score

Mus musculus

Expected

Observed

Z-score

Oryza sativa

Expected

Observed

Z-score

Rattus norvegicus

Expected

Observed

Z-score

Saccharomyces cerevisiae

Expected

Observed

Z-score

Schizosaccharomyces pombe

Expected

Observed

Z-score

Takifugu rubripes

Expected

Observed

Z-score

 

  • Rank order of residues and oligopeptides by phyla, referred to on pages 29, 34, and 36. Three 4/5-column, tab delimited text files, with rows corresponding to either single residues, dipeptides, or tripeptides, respectively, which are listed in the first column. The next column in oligopeptide files indicates whether the current oligopeptide is a homopeptide, heteropeptide or, in tripeptides, a palindrome. The following columns in all files detail the rank order of the current residue or oligopeptide in eukaryota, eubacteria or archaea, respectively. Each of the three files appears in 6 extract versions: top/bottom extracts according to each of the three superkingdoms.

Sorted by     Residues      Dipeptides    Tripeptides
Eukaryota     Top/Bottom    Top/Bottom    Top/Bottom
Eubacteria    Top/Bottom    Top/Bottom    Top/Bottom
Archaea       Top/Bottom    Top/Bottom    Top/Bottom
Residues      All           All           All

 

  • Standardized Euclidean distances between species' compositions, referred to on page 34. A text file with a symmetric table of 72x72 space-delimited real numbers, preceded by a header row and a header column giving each species' 3-letter nickname (see Table I). The number on row X, column Y denotes the standardized Euclidean distance between the normalized composition vectors for species X and Y. Each such vector is computed by dividing the frequency of each of the 20 residues in the current species' proteome by the variance of the frequency of this residue across all the proteomes examined.
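The computation described in this item can be sketched as follows. This is an illustration of the description as worded above, not the authors' code: it divides each frequency by the across-species variance of that residue and then takes the ordinary Euclidean distance. The use of the population variance (`statistics.pvariance`), the function names, and the toy numbers are all assumptions.

```python
import math
from statistics import pvariance


def normalized_vectors(freqs):
    """Divide each residue frequency by that residue's variance across species.

    freqs maps a species nickname to a list of residue frequencies, with the
    residues in the same order for every species.
    Assumption: population variance; the text does not say which estimator.
    """
    columns = list(zip(*freqs.values()))
    variances = [pvariance(col) for col in columns]
    if any(v == 0 for v in variances):
        raise ValueError("a residue with zero variance cannot be normalized")
    return {species: [f / v for f, v in zip(vec, variances)]
            for species, vec in freqs.items()}


def euclidean(u, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Invented toy data: two species, two "residues" (not real frequencies).
toy = {"aaa": [1.0, 2.0], "bbb": [3.0, 6.0]}
norm = normalized_vectors(toy)
d = euclidean(norm["aaa"], norm["bbb"])
```

In the toy data the variances are 1 and 4, so the normalized vectors are (1, 0.5) and (3, 1.5) and the distance is the square root of 5. Note that some definitions of standardized Euclidean distance divide by the standard deviation rather than the variance; the sketch follows the wording of the description here.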

 

 

  • Species coordinates in all figures. Three-column, tab-delimited text files, one per principal component analysis plot: the first column names the species and the other two give its first and second component coordinates in that plot. Figures 2a 2b 2c 2d 2d' 2e 3a 3b 4 7a 7c 9a 9c

 

  • Exact oligopeptide score, referred to in the Appendix. A PDF document describing the exact details of the score used, which properly distinguishes the left and right ends of the sequence and treats them specially.