Supplementary material for Proteomic Signatures: Amino Acid and Oligopeptide Compositions Differentiate Among Phyla
Itsik Pe'er, Clifford E. Felder, Orna Man, Israel Silman, Joel L. Sussman, and Jacques S. Beckmann
Proteins: Structure, Function and Genetics 54, 20-40 (2004)
All the data text files are best viewed when opened in MS Excel. Page numbers refer to the pages of the published journal article.
- Figures repeating the clustering with other training sets, referred to on page 23 (TIFF images):
  - Separation between eukaryotes and eubacteria (repeating Figure 2a), by training sets 1 and 2
  - Separation between eukaryotes and archaea (repeating Figure 2c), by training sets 1 and 2
  - Separation among eukaryotes, eubacteria, and archaea with training on eukaryotes and archaea only (repeating Figure 2c), by training sets 1 and 2
- Z-scores of XY vs. YX heterodipeptide bias, referred to on page 24.
A text file with a symmetric 20×20 table of space-delimited real numbers. The number on row X, column Y is the Z-score (number of standard deviations) by which dipeptide XY occurs more often than YX.
- Oligopeptide frequency deviations from expectation, referred to on page 24.
Two-column, tab-delimited text files: the first column lists the oligopeptide and the second its Z-score (number of standard deviations).
- Homotripeptides in eukaryotes, referred to on page 24.
Tab-delimited lists of the expected content, the observed content, and their normalized difference (Z-score) for the homotripeptides of each eukaryotic genome examined. The homotripeptides are listed in alphabetical order.
- Rank order of residues and oligopeptides by phyla, referred to on pages 29, 34, and 36.
Three tab-delimited text files (four columns for single residues, five for dipeptides and tripeptides), whose rows correspond to single residues, dipeptides, or tripeptides, respectively; the residue or oligopeptide is listed in the first column. In the oligopeptide files, the next column indicates whether the oligopeptide is a homopeptide, a heteropeptide, or (for tripeptides) a palindrome. The remaining columns in all files give the rank of the residue or oligopeptide in eukaryota, eubacteria, and archaea, respectively. Each of the three files also appears in six extracted versions: top and bottom extracts according to each of the three superkingdoms.
- Standardized Euclidean distances between species' compositions, referred to on page 34.
A text file with a symmetric 72×72 table of space-delimited real numbers, preceded by a header row and column giving each species' three-letter abbreviation (see Table I). The number on row X, column Y is the standardized Euclidean distance between the normalized composition vectors of species X and Y. Each such vector is computed by dividing the frequency of each of the 20 residues in the species' proteome by the variance of that residue's frequency across all the proteomes examined.
- Species coordinates in all figures.
Three-column, tab-delimited text files, one per principal component analysis plot: the first column names the species, and the other two give that species' first and second principal-component coordinates in the plot. Figures 2a, 2b, 2c, 2d, 2d', 2e, 3a, 3b, 4, 7a, 7c, 9a, 9c.
- Exact oligopeptide score, referred to in the Appendix.
A PDF document describing the exact score used, which distinguishes the left and right ends of each sequence and handles them explicitly.
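For the 20×20 table of XY vs. YX heterodipeptide bias described above, a minimal reading sketch in Python follows. The table carries no residue labels, so the row/column order used here (the 20 one-letter codes in alphabetical order, `AMINO_ACIDS`) is an assumption, and `parse_bias_table` and `bias_z` are hypothetical helper names, not part of the supplementary material.

```python
# Hypothetical row/column order: the 20 one-letter amino-acid codes in
# alphabetical order. The file does not state its order, so this is an
# assumption to check against the paper before use.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_bias_table(text):
    """Parse a 20x20 table of whitespace-delimited real numbers."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != 20 or any(len(row) != 20 for row in rows):
        raise ValueError("expected a 20x20 table")
    return rows


def bias_z(table, x, y, order=AMINO_ACIDS):
    """Z-score by which dipeptide xy occurs more often than yx (row x, column y)."""
    return table[order.index(x)][order.index(y)]
```

Typical use would be `bias_z(parse_bias_table(open(path).read()), "K", "E")`, where `path` points at the downloaded file.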
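The two-column oligopeptide Z-score files described above can be loaded and ranked with a few lines of Python. This is a sketch under the assumption that each line holds exactly "oligopeptide, tab, Z-score"; a header line, if present, is skipped. The helper names `parse_zscores` and `most_deviant` are hypothetical.

```python
def parse_zscores(text):
    """Read 'oligopeptide<TAB>Z-score' lines into a dict, skipping non-numeric lines."""
    scores = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            scores[fields[0].strip()] = float(fields[1])
        except ValueError:
            continue  # e.g. a header line, if the file has one
    return scores


def most_deviant(scores, k=5):
    """Return (k most over-represented, k most under-represented) oligopeptides."""
    ranked = sorted(scores, key=scores.get)  # ascending Z-score
    return ranked[::-1][:k], ranked[:k]
```

Positive Z-scores mark oligopeptides occurring more often than expected, negative ones less often.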
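To illustrate what the expected, observed, and Z-score columns of the homotripeptide lists represent, the sketch below computes them for a set of protein sequences. The expectation model here is an assumption, not the paper's: residues are treated as independent, so the expected count of XXX is the number of length-3 windows times p_X cubed, and the Z-score uses a Poisson approximation. The function name `homotripeptide_table` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def homotripeptide_table(sequences):
    """Rows of (tripeptide, expected, observed, z) in alphabetical order.

    Assumptions (not from the paper): independent residues, so the expected
    count of XXX is (number of length-3 windows) * p_X**3, and the Z-score is
    the Poisson approximation (observed - expected) / sqrt(expected).
    """
    counts = Counter(aa for seq in sequences for aa in seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    windows = sum(max(len(seq) - 2, 0) for seq in sequences)
    rows = []
    for aa in AMINO_ACIDS:
        tri = aa * 3
        # Count overlapping occurrences: "AAAA" contains AAA twice.
        observed = sum(
            seq[i:i + 3] == tri for seq in sequences for i in range(len(seq) - 2)
        )
        p = counts[aa] / total if total else 0.0
        expected = windows * p ** 3
        z = (observed - expected) / math.sqrt(expected) if expected > 0 else float("nan")
        rows.append((tri, expected, observed, z))
    return rows
```

A first-order (residue-independent) expectation is the simplest baseline; the paper's own expectation may differ.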
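The six top/bottom extracts of the rank-order files described above can be reproduced from a full file, as in the sketch below. It assumes the last three columns hold the ranks for eukaryota, eubacteria, and archaea (as stated) and that rank 1 denotes the most frequent item, which is an assumption. `parse_rank_file` and `extracts` are hypothetical names.

```python
SUPERKINGDOMS = ("eukaryota", "eubacteria", "archaea")


def parse_rank_file(text):
    """Parse 4-column (residue) or 5-column (oligopeptide) tab-delimited rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) not in (4, 5):
            continue
        try:
            ranks = [float(f) for f in fields[-3:]]
        except ValueError:
            continue  # skip a header line, if any
        # Oligopeptide files carry a type column: homo-, heteropeptide, or palindrome.
        kind = fields[1] if len(fields) == 5 else None
        rows.append((fields[0], kind, dict(zip(SUPERKINGDOMS, ranks))))
    return rows


def extracts(rows, k):
    """Six lists: top-k and bottom-k items by rank in each superkingdom."""
    result = {}
    for sk in SUPERKINGDOMS:
        # Assumed: rank 1 is the most frequent item, so ascending rank = top first.
        ordered = sorted(rows, key=lambda row: row[2][sk])
        result[(sk, "top")] = [row[0] for row in ordered[:k]]
        result[(sk, "bottom")] = [row[0] for row in ordered[-k:]]
    return result
```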
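The normalization and distance table described for the 72×72 file can be sketched as follows, following the text literally: each residue frequency is divided by that residue's variance across proteomes, then plain Euclidean distances are taken. Using the population variance (`statistics.pvariance`) rather than the sample variance is an assumption, as are the helper names `normalized_vectors` and `distance_matrix`. Note that the conventional standardized Euclidean distance divides each coordinate by the standard deviation instead (equivalently, weights squared differences by 1/variance); swapping `pvariance` for `statistics.pstdev` gives that form.

```python
import math
from statistics import pvariance


def normalized_vectors(compositions):
    """Divide each residue frequency by that residue's variance across proteomes.

    compositions: dict mapping species -> list of residue frequencies, all in
    the same residue order. Assumes every residue's variance is nonzero.
    """
    species = list(compositions)
    n_res = len(compositions[species[0]])
    variances = [pvariance([compositions[s][i] for s in species]) for i in range(n_res)]
    return {s: [compositions[s][i] / variances[i] for i in range(n_res)] for s in species}


def distance_matrix(compositions):
    """Euclidean distances between the normalized vectors, as a symmetric table."""
    vectors = normalized_vectors(compositions)
    names = list(vectors)
    return names, [[math.dist(vectors[a], vectors[b]) for b in names] for a in names]
```

`math.dist` requires Python 3.8 or later.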
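The species-coordinate files hold first and second principal-component coordinates per species. The sketch below shows one way such coordinates can be produced and written in the same three-column, tab-delimited layout. It assumes the input is one numeric vector per species (for example, residue frequencies) and uses covariance-based PCA via power iteration with deflation; the paper's exact inputs and PCA settings (such as the training sets) are not specified here, and `pca_2d` and `coordinate_lines` are hypothetical names. A library such as NumPy would normally be used; this stays dependency-free.

```python
import math


def _orthogonalize(w, basis):
    """Remove from w its components along the unit vectors in basis."""
    for c in basis:
        dot = sum(a * b for a, b in zip(w, c))
        w = [a - dot * b for a, b in zip(w, c)]
    return w


def pca_2d(vectors, iterations=200):
    """Project each species' vector onto the first two principal components.

    vectors: dict mapping species -> list of numbers (at least 2 dimensions).
    Uses the sample covariance matrix and power iteration with deflation.
    """
    names = list(vectors)
    data = [vectors[n] for n in names]
    dim, n = len(data[0]), len(data)
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    components = []
    for _ in range(2):
        # Deterministic, non-uniform start vector, orthogonal to earlier components.
        v = _orthogonalize([1.0 + 0.1 * j for j in range(dim)], components)
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            w = _orthogonalize(w, components)
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in this direction; keep v
                break
            v = [a / norm for a in w]
        components.append(v)
    return {name: tuple(sum(a * b for a, b in zip(row, c)) for c in components)
            for name, row in zip(names, centered)}


def coordinate_lines(coords):
    """Format as 3-column tab-delimited lines: species, PC1, PC2."""
    return [f"{name}\t{x:.6f}\t{y:.6f}" for name, (x, y) in coords.items()]
```

Principal components are defined only up to sign, so a reimplementation may produce plots mirrored relative to the published figures.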