Protein Data Bank
The World Wide Protein Data Bank (wwPDB) is the internationally recognized sole repository of all published, empirically determined, atomic-resolution macromolecular three-dimensional (3D) structure data. The Protein Data Bank was founded in 1971 by Drs. Edgar Meyer and Walter Hamilton at Brookhaven National Laboratory. Its management was headed by Tom Koetzle until 1994 and then by Joel L. Sussman until 1999, when it was transferred to members of the Research Collaboratory for Structural Bioinformatics (RCSB). Rutgers University is the lead site, currently under the direction of Helen M. Berman. As of 2008, the wwPDB had three official branches: the Research Collaboratory for Structural Bioinformatics (USA), the European Bioinformatics Institute (UK), and the Protein Data Bank Japan.
New Releases Cycle
The wwPDB releases new entries once per week. These can be seen by clicking on the most recent release date, shown at the upper right of the main page at pdb.org. In 2007, 7,280 new entries were released (an average of 140/week). In 2011, 8,101 new entries were released (an average of about 156/week).
Traditionally, an entry consisted of an atomic coordinate file containing a molecular model. More recently, the experimental data (structure factors in the case of crystallography) have been deposited along with the model. Since February 1, 2008, deposition of experimental data has been required with all new entries.
Many derivative databases copy, derive information from, or add value to the atomic coordinate files available from the wwPDB. Often, these automatically update their databases weekly, shortly after the new releases become available at the PDB. Proteopedia is one example.
At pdb.org, at the upper right corner of the main page, click on PDB Statistics for a wealth of interesting information, including proteins solved by multiple experimental methods, sequence redundancy in the PDB, the distribution of resolutions, the 100 journals that have published the most new macromolecular structures, and graphs of the growth of the database (under Content Growth).
Some interesting statistics (maxima, minima, means) for the contents of the PDB are summarized at Believe It or Not.
Periodically, the PDB remediates its archived data files. Remediation improves consistency and nomenclature, corrects some errors, and involves changes in the PDB data format. Remediations occurred in August, 2007 and March, 2009. Details can be found at the World Wide PDB.
Here are some examples of changes that occurred in remediations affecting the PDB format.
- DNA: Prior to August, 2007, both DNA and RNA nucleotides were named A, C, G, T, and U. After August, 2007, DNA nucleotides were changed to DA, DC, DG, DT and DU, while RNA nucleotides continued to use the older one-letter names. (An example of a model that contains both DNA and RNA is 104d.) This change required changes in software packages such as Jmol, and left unmaintained packages such as Protein Explorer unable to deal properly with the remediated nucleic acids.
- Non-standard residues: Some PDB files represented a non-standard residue as a standard residue (ATOM records) plus an adduct (HETATM records). Some of these were changed to a single, uniform name for the non-standard residue, so that all atoms in the same residue share the same residue name (and all are HETATM records). For example, in 1apm phosphoserine was SER plus PHO and phosphothreonine was THR plus PHO. These were remediated to SEP and TPO, respectively. In another example, methylated ribonucleotides in 310d had been named e.g. +C1 plus CH3. These were remediated to OMC and so forth.
- Order of atoms: In the March, 2009 remediation, the order of chains and atoms changed in some PDB files in a non-systematic manner. This broke some scenes that had been saved in Proteopedia, and required redesign of some portions of Proteopedia (see Proteopedia avoids remediation-related problems).
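The naming changes described in the first two items above can be illustrated with a short Python sketch. It is not part of any PDB or Proteopedia software; it assumes only the fixed-column PDB layout in which the residue name occupies columns 18-20 of ATOM and HETATM records. The function names and the `make_record` helper are hypothetical, and the lookup tables contain only the names mentioned in this article.

```python
# Remediated (post-August 2007) nucleotide residue names, as listed above.
DNA_NAMES = {"DA", "DC", "DG", "DT", "DU"}
RNA_NAMES = {"A", "C", "G", "U"}

# Legacy (standard residue, adduct) pairs and their remediated names.
# Only the two examples from 1apm mentioned in the text are included.
LEGACY_TO_REMEDIATED = {
    ("SER", "PHO"): "SEP",  # phosphoserine
    ("THR", "PHO"): "TPO",  # phosphothreonine
}


def residue_name(line):
    """Return the residue name of an ATOM/HETATM record, else None."""
    if line.startswith(("ATOM  ", "HETATM")):
        return line[17:20].strip()  # columns 18-20, 1-based
    return None


def classify_nucleotide(name):
    """Classify a residue name under the remediated naming scheme."""
    if name in DNA_NAMES:
        return "DNA"
    if name in RNA_NAMES:
        return "RNA"
    return "other"  # amino acids, ligands, or a bare legacy "T"


def count_by_class(lines):
    """Count ATOM/HETATM records per class (one count per atom record)."""
    counts = {"DNA": 0, "RNA": 0, "other": 0}
    for line in lines:
        name = residue_name(line)
        if name is not None:
            counts[classify_nucleotide(name)] += 1
    return counts


def remediated_name(standard, adduct):
    """Look up the remediated name for a legacy residue/adduct pair."""
    return LEGACY_TO_REMEDIATED.get((standard, adduct))


def make_record(record, serial, atom, resname, chain, resseq):
    """Hypothetical helper: build a truncated record for demonstration.

    Real records continue with coordinates, occupancy and so on.
    """
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


if __name__ == "__main__":
    demo = [
        make_record("ATOM", 1, "P", "DA", "A", 1),
        make_record("ATOM", 2, "P", "DT", "A", 2),
        make_record("ATOM", 3, "P", "U", "B", 1),
        make_record("HETATM", 4, "P", "SEP", "C", 10),
    ]
    print(count_by_class(demo))
    print(remediated_name("SER", "PHO"))
```

Because unremediated files used the same one-letter names for DNA and RNA, a check like this can tell the two apart only in remediated files. A bare T in the residue-name field falls into "other" here, which may hint that a file predates the August, 2007 remediation.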
Obsolete (unremediated) versions of the data files were saved by the PDB before each remediation, and may be obtained: see Getting Unremediated PDB Files.
Sequence Numbering Anomalies
Entries in the PDB often contain anomalies in sequence numbering (see Sequence Numbering Anomalies in Homology modeling servers).
Improving Published Models
More About The Protein Data Bank
See Also in Proteopedia
- About Macromolecular Structure, a list of pages in Proteopedia
- Atomic coordinate file
- Biological Unit
- PDB file
- PDB identification code
- Highest impact structures of all time
- Improving published models
- Quality assessment for molecular models
- World Wide Protein Data Bank
- RCSB PDB
- Protein Data Bank in Wikipedia
- Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 35:D301-3 (2007). PMID:17142228.
- Berman HM et al. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 58:899-907 (2002). PMID:12037327.
- H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28 pp. 235-242 (2000). PMID:10592235.
- Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE (1998). "Protein data bank (PDB): a database of 3D structural information of biological macromolecules". Acta Cryst D54:1078-1084. PMID 10089483.
- Earliest Solutions for Macromolecular Crystal Structures
- See also Theoretical Models.
References and Notes
- Berman H, Henrick K, Nakamura H. Announcing the worldwide Protein Data Bank. Nat Struct Biol. 2003 Dec;10(12):980. PMID:14634627 doi:10.1038/nsb1203-980
- Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007 Jan;35(Database issue):D301-3. Epub 2006 Nov 16. PMID:17142228 doi:10.1093/nar/gkl971
- Bernstein FC, Koetzle TF, Williams GJ, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol. 1977 May 25;112(3):535-42. PMID:875032
- In May 2012, the following numbers were reported by advanced search on release dates at RCSB. 2011: 8,101. 2010: 7,907. 2009: 7,388. 2008: 6,964. 2007: 7,199.