Data from "Cellular crowding imposes global constraints on the chemistry and evolution of proteomes"
Levy et al. PNAS 2012
- matrix containing the data from E. coli.
- matrix containing the data from S. cerevisiae.
- matrix containing the data from H. sapiens.
The three data matrices contain the data used in this work. In these matrices, each row corresponds to one amino acid, and columns contain the following information:
- ensID: Protein identifier (Uniprot for E. coli, SGD for Yeast, and Ensembl for Human).
- pdbID: PDB code of the structure and chain identifier.
- pos.ens: position in the protein sequence (from the organism's proteome).
- pos.pdb: position in the protein sequence (from the PDB SEQRES field).
- aa: amino acid letter code
- rate: rate of evolution calculated with the Rate4Site software.
- ndef: number of species defined at a given position
- aa.prop: stickiness score of the amino acid
- len: length of the protein
- ASA.rel.cplx: relative ASA of the amino acid in the biological unit.
- ASA.rel.alone: relative ASA of the amino acid in the chain (separated from the biological unit).
- ab.all: abundance given in the PaxDb database.
- patch.compo.400abs: stickiness score of the 400 Å^2 patch surrounding the amino acid.
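As an illustration of how the two ASA columns relate, the sketch below lists residues whose relative ASA is lower in the biological unit than in the isolated chain. It is only a sketch under stated assumptions: the tab-delimited layout with a header row, the 0 to 1 scale for relative ASA, the identifiers, and all values in SAMPLE are made up for the example and are not taken from the data files. The function name buried_on_assembly and the min_delta parameter are likewise hypothetical.

```python
import csv
import io

# Hypothetical excerpt. The tab-delimited layout, the 0-1 ASA scale,
# and every value below are assumptions made for illustration only.
SAMPLE = (
    "ensID\tpdbID\tpos.ens\taa\tASA.rel.cplx\tASA.rel.alone\n"
    "P00001\t1abc_A\t10\tL\t0.05\t0.40\n"
    "P00001\t1abc_A\t11\tK\t0.60\t0.60\n"
    "P00001\t1abc_A\t12\tV\t0.02\t0.02\n"
)


def buried_on_assembly(handle, min_delta=0.0):
    """Return residues whose relative ASA drops by more than min_delta
    when going from the isolated chain to the biological unit."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        # Difference between the chain alone and the chain in the complex.
        delta = float(row["ASA.rel.alone"]) - float(row["ASA.rel.cplx"])
        if delta > min_delta:
            hits.append(
                (row["ensID"], int(row["pos.ens"]), row["aa"], round(delta, 3))
            )
    return hits


hits = buried_on_assembly(io.StringIO(SAMPLE))
print(hits)
```

With real data, the io.StringIO wrapper would be replaced by an open file handle, and the delimiter adjusted to whatever format the matrices are actually stored in.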
Data from "A simple definition of structural regions in proteins and its use in analysing interface evolution"
Levy, J. Mol. Biol. 2010
- matrix containing the data from E. coli.
- matrix containing the data from S. cerevisiae.
- matrix containing the data from H. sapiens.
The three data matrices contain the data used in this study. In these matrices, each row corresponds to one amino acid, and columns contain the following information:
- ensID: Protein identifier (Uniprot for E. coli, SGD for Yeast, and Ensembl for Human).
- pdbID: PDB code of the structure and chain identifier.
- pos.ens: position in the protein sequence (from the organism's proteome).
- pos.pdb: position in the protein sequence (from the PDB SEQRES field).
- aa: amino acid letter
- ASA.rel.cplx: relative ASA measured in the complexed form (if the protein is involved in a complex).
- ASA.rel.alone: relative ASA measured in the monomeric form (chains are split)
- nsub: number of subunits in the structure containing the chain.
- sym: symmetry of the complex
- len: length of the protein
- homo: 1 if the protein is a homo-oligomer, 0 otherwise; 2 denotes complexes containing only paralogues.
- cat: category the amino acid is assigned to. 1: interior, 2: surface, 3: interface support, 4: interface core, 5: interface rim.
- patch.alone.size: number of amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the monomer).
- patch.cplx.size: number of amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the complex).
- patch.cplx: interface propensity of the amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the complex).
- patch.alone: interface propensity of the amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the monomer).
Note: corresponding structures are available on the 3DComplex website.
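The numeric codes in the cat column can be decoded with a small lookup table. The sketch below copies the code-to-label mapping from the column description above and tallies residues per structural region. The function name tally_categories and the example codes are hypothetical and made up for illustration; they are not taken from the data matrices.

```python
from collections import Counter

# Code-to-label mapping copied from the description of the `cat` column.
CATEGORY_LABELS = {
    1: "interior",
    2: "surface",
    3: "interface support",
    4: "interface core",
    5: "interface rim",
}


def tally_categories(cat_codes):
    """Count residues per structural region, keeping regions with zero hits."""
    counts = Counter(CATEGORY_LABELS[int(code)] for code in cat_codes)
    return {label: counts.get(label, 0) for label in CATEGORY_LABELS.values()}


# Made-up codes standing in for the `cat` column of one protein.
example_codes = [1, 1, 2, 4, 5, 2, 3, 1]
summary = tally_categories(example_codes)
print(summary)
```

A code outside 1 to 5 raises a KeyError here, which is a deliberate choice in this sketch so that unexpected values are noticed rather than silently dropped.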