NMR Ensembles of Models
Structure Determination by NMR
About 11% of the entries in the Protein Data Bank were determined by nuclear magnetic resonance in solution (NMR) as of mid-2012. 88% were determined by X-ray crystallography, and <1% by other methods. NMR can only be used for relatively small macromolecules (see below).
NMR spectroscopy is based on the ability of a nucleus with a spin of 1/2 (e.g. 1H, 13C, 15N, 31P) to adopt two different orientations in a magnetic field. The distribution of nuclei between the two states can be changed by subjecting them to a short pulse of radiation with a frequency commensurate with the energy difference between them. Monitoring the magnetic signals in the subsequent decay can yield dynamic information about the orientation and spacing of the nuclei, which provide restraints that can be turned into structural information.
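As a rough illustration of the "frequency commensurate with the energy difference" mentioned above, the sketch below computes the resonance frequency and the energy gap between the two spin orientations for the nuclei listed. The gyromagnetic ratios are rounded literature values, and the 14.1 tesla field strength is an arbitrary choice made for this example, not a value taken from this article.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded literature values).
GAMMA_OVER_2PI_MHZ_PER_T = {
    "1H": 42.577,
    "13C": 10.708,
    "15N": -4.316,  # negative sign: opposite sense of precession
    "31P": 17.235,
}

PLANCK_J_S = 6.62607015e-34  # Planck constant in joule-seconds


def larmor_frequency_mhz(nucleus: str, field_tesla: float) -> float:
    """Magnitude of the resonance frequency (MHz) of a spin-1/2 nucleus."""
    return abs(GAMMA_OVER_2PI_MHZ_PER_T[nucleus]) * field_tesla


def energy_gap_joules(nucleus: str, field_tesla: float) -> float:
    """Energy difference between the two orientations: delta E = h * nu."""
    return PLANCK_J_S * larmor_frequency_mhz(nucleus, field_tesla) * 1e6


if __name__ == "__main__":
    field = 14.1  # tesla; assumed magnet strength, for illustration only
    for nucleus in GAMMA_OVER_2PI_MHZ_PER_T:
        freq = larmor_frequency_mhz(nucleus, field)
        print(f"{nucleus}: {freq:.1f} MHz, gap {energy_gap_joules(nucleus, field):.3e} J")
```

The energy gaps are very small, so the two orientations are almost equally populated at room temperature.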
The primary data yielded by NMR analysis are mostly local geometric information about atoms within the structure, supplemented more recently by global information. Typically, these include distances between pairs of atoms, dihedral angles (typically backbone φ angles and some side-chain χ1 angles), and sometimes global information such as the orientation of a given bond with respect to a fixed axis of the molecule. These data are used as "restraints" to reconstruct 3D models that are compatible with the NMR data. All calculations are performed directly in physical space, starting with a random conformation of the macromolecule, which is progressively folded to satisfy the restraints. Typically, several runs are performed, starting from different initial conformations, to check that the calculation converges onto a single solution. The result is thus an ensemble of models, the distribution of which gives a measure of the precision of the NMR structure.
Model building for NMR experiments typically starts with the complete protein or nucleic acid chain, including hydrogen atoms. The distance restraints are then applied. The resulting model usually includes the entire protein and nucleic acid chains, unlike X-ray crystallographic models that often lack the ends, and even loops in the middle of chains, due to disorder in protein crystals.
Macromolecular structure determination by NMR is done at high protein concentrations in aqueous solution, and thus requires that the molecule be highly soluble. For more information, see Nature of 3D Structural Data and NMR in Wikipedia.
Displaying NMR Models
Display of NMR Models by Proteopedia
Figure: 1lcd, 3 NMR models
For ensembles of models from NMR experiments, Proteopedia initially displays just the first model, in the usual cartoon rendering; this is done to speed up page loading. You will see a "Displaying simplified model" message within the JSmol panel. If you then click the orange "load full" button, Proteopedia will show all the models, enabling you to see where the models agree with each other and where they differ. Each model is shown as a thin backbone trace (a line connecting alpha carbon atoms of amino acids, or phosphorus atoms in DNA or RNA chains). The backbone traces are colored by Amino to Carboxy "rainbow", a spectral sequence of colors starting at the amino terminus (or 5' terminus of nucleic acid chains) and ending at the carboxy terminus (or 3' terminus).
Ligands (Hetero atoms) are also shown for all models, except that they are opaque only for model 1, and translucent for all other models. Ligand atoms are colored by element, using the CPK color scheme. Examples with hetero groups covalently linked to chain termini, with extremely variable positions, are 1jsa and 1dqc. 1bah also has hetero groups in variable positions. 1hpn has only hetero atoms.
The example at right, after clicking the "load full" button, shows the 3 models for 1lcd, a lac repressor domain bound to DNA, with one sodium ion. Water is present in this model, but for clarity, Proteopedia does not show water in its initial scene. (To hide water, click the initial scene green link just below the molecule.)
Disulfide bonds are shown as yellow rods connecting backbones, with the first model opaque, and all other models translucent. An example is 1iw4.
Proteopedia shows only the first model by default, along with the message "Displaying simplified model". After you click the orange "load full" button, all models will be displayed.
In order to view individual models, click on JSmol or Jmol_S (lower right corner below the molecule) to open Jmol's menu. There, use the All N models item (where N is the total number of models in the ensemble). For example, clicking on 1.1: 1 will display only model 1, and the menu will now say model 1/N. You can also use Jmol's menu to change the rendering and coloring.
FirstGlance in Jmol also shows model 1 by default, but you can click on View All Models.
Animating NMR Ensembles
When the models in an NMR ensemble are played like a movie, the resulting animation simulates thermal motion (although not all the motions are necessarily real -- see below). In order to animate the models, click on JSmol or Jmol_S (lower right corner below the molecule) to open Jmol's menu. Choose Animation, then Animation mode, and click on Loop. Then choose Animation again, and click Play. You can change the speed of the animation with FPS (frames per second) on the Animation menu. By default, there is a delay at the first and last models.
Multiple Model Ensembles from NMR
NMR Experiments Yield Multiple Models
When a macromolecular structure is determined by nuclear magnetic resonance (NMR) in solution, the result is an ensemble of multiple molecular models, each of which is consistent with the experimental data. The results of an NMR experiment are a large number of inter-atomic distance restraints, which are consistent with multiple models. This is in contrast to the result of an X-ray crystallographic experiment, which is a single model that best fits the empirical electron density. (In some cases where the resolution is very high, the model may include alternative positions for some atoms.)
The number of NMR models published is up to the authors and depends upon the experiment; it varies between 2 (e.g. 1cvo) and over 100. The median number of models is 20. (You can search for entries with a specified number or range of models using OCA.) The first model in the ensemble has no special significance (see the most representative model).
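To check how many models a downloaded entry contains, one option is to count the MODEL records in the PDB-format file. The following is a minimal sketch, assuming a classic PDB-format text file in which each model begins with a MODEL record; the function name `count_models` and the two-model sample text are made up for this illustration.

```python
def count_models(pdb_text: str) -> int:
    """Count the models in PDB-format text by counting MODEL records.

    Text that has coordinates but no MODEL records is treated as one model.
    """
    n_model_records = 0
    has_atoms = False
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # the record name occupies the first 6 columns
        if record == "MODEL":
            n_model_records += 1
        elif record in ("ATOM", "HETATM"):
            has_atoms = True
    if n_model_records:
        return n_model_records
    return 1 if has_atoms else 0


if __name__ == "__main__":
    # Made-up two-model fragment, only to exercise the function.
    sample = "\n".join([
        "MODEL        1",
        "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00           C",
        "ENDMDL",
        "MODEL        2",
        "ATOM      1  CA  ALA A   1       0.500   0.100   0.000  1.00  0.00           C",
        "ENDMDL",
    ])
    print(count_models(sample))
```

For a real entry, read the file contents into a string first and pass that string to `count_models`.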
Meaning of the Variation Between Models
The variation between models in the ensemble can mean either of two things. The variation can represent actual flexibility and thermal motion that occurred during the NMR measurements in solution, typically at room temperature. Alternatively, the variation can simply mean uncertainty in the atomic positions, namely, that an inadequate number of restraints were available to determine the positions of some atoms. Unfortunately, there is nothing comparable to the B value or Temperature value that quantitates the uncertainty of the position of each atom in crystallographic results. Specific NMR relaxation experiments can, however, be used to measure the dynamics of individual atoms, mainly backbone amide groups, because the relaxation of the NMR signal depends on the internal motions of the molecule. When these NMR relaxation data are available, they can be used to determine order parameters, which are strongly correlated with the B values of the crystallographic structures. These can be used to distinguish between intrinsic flexibility and uncertainty due to lack of restraints. When relaxation data are not available, the only way to find out what the variation between models means is to contact the experimenters who authored the published ensemble of models.
Protein chains commonly have more variation between models at the ends than in the middle. An example is 2yru.
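One simple way to see where an ensemble varies most is to compute, for each atom, the root-mean-square deviation of its position from its mean position across the models. The sketch below assumes the models are already superposed and list the same atoms in the same order; the function name and the toy coordinates are invented for illustration. As noted above, such a number does not by itself say whether the spread reflects real motion or a shortage of restraints.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_atom_spread(models: Sequence[Sequence[Coord]]) -> List[float]:
    """RMS deviation of each atom from its mean position over all models.

    Assumes pre-aligned models with identical atom ordering.
    """
    n_models = len(models)
    n_atoms = len(models[0])
    spread = []
    for i in range(n_atoms):
        # Mean position of atom i across the ensemble.
        mean = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        # Mean squared distance of atom i from that mean position.
        msd = sum(
            sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        spread.append(math.sqrt(msd))
    return spread


if __name__ == "__main__":
    # Toy 4-atom "chain" in 3 models: the two ends move, the middle does not.
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
        [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, -1.0, 0.0)],
        [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 1.0, 0.0)],
    ]
    for index, value in enumerate(per_atom_spread(toy)):
        print(index, round(value, 3))
```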
Using appropriate methodologies, it is possible to determine both the average structure and its dynamic movements.
The Most-Representative Model
The most representative model is the model closest to the average model. A server called Olderado reports the most representative model, and enables you to download it separately.
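The definition above ("closest to the average model") can be sketched directly: average the coordinates, then pick the model with the smallest RMSD to that average. This is only an illustration of the definition, assuming pre-aligned models with identical atom ordering; it is not a description of how Olderado actually selects its representative, and the function names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def mean_structure(models: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Coordinate-wise average of all models (same atoms, same order)."""
    n = len(models)
    return [
        tuple(sum(m[i][k] for m in models) / n for k in range(3))
        for i in range(len(models[0]))
    ]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two coordinate sets, no refitting."""
    total = sum(
        (p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3)
    )
    return math.sqrt(total / len(a))


def most_representative(models: Sequence[Sequence[Coord]]) -> int:
    """Zero-based index of the model with the lowest RMSD to the average."""
    average = mean_structure(models)
    deviations = [rmsd(model, average) for model in models]
    return deviations.index(min(deviations))


if __name__ == "__main__":
    # Three toy two-atom models, shifted along x by 0, +1 and -0.2.
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(-0.2, 0.0, 0.0), (0.8, 0.0, 0.0)],
    ]
    print("most representative model index:", most_representative(toy))
```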
The Minimized Average Model
It is common to average the models from an NMR experiment, but in order for the result to be realistic, it must undergo some energy minimization in order to adjust covalent bond lengths and angles. The result is called a minimized average model. Sometimes, authors publish both the ensemble and the minimized average. For example, 2bbm appears to be the minimized average for the ensemble of 21 models in 2bbn, but without reading the original publication or contacting the authors, it is difficult to be sure (since the header of the PDB file does not say).
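A toy calculation shows why a plain coordinate average is not realistic without minimization. In the sketch below, a single bond of length 1.5 (arbitrary units, chosen for the example) points in two different directions in two models; averaging the coordinates yields a bond that is shorter than it is in either model. The function names and numbers are illustrative assumptions only.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def average_models(models: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Coordinate-wise average of the models (same atoms, same order)."""
    n = len(models)
    return [
        tuple(sum(m[i][k] for m in models) / n for k in range(3))
        for i in range(len(models[0]))
    ]


def bond_length(a: Coord, b: Coord) -> float:
    """Euclidean distance between two atoms."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))


if __name__ == "__main__":
    # Atom 0 is fixed; atom 1 is bonded to it but points along x or along y.
    model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    model_2 = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
    averaged = average_models([model_1, model_2])
    print("bond in model 1:", bond_length(*model_1))
    print("bond in model 2:", bond_length(*model_2))
    print("bond in average:", round(bond_length(*averaged), 3))
```

Energy minimization of the averaged coordinates is what restores such distorted bond lengths and angles to chemically sensible values.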
Reliability of NMR Models
NMR models are more likely to contain major errors than are crystallographic models that have good Resolution and Free R values. See also Quality assessment for molecular models. In 2012, an X-ray crystallographic structure of integral membrane diacylglycerol kinase, 3ze4, revealed functionally important domain swapping that was not present in an earlier NMR structure, 2kdc.
Median Size of Published NMR Structures
Solution NMR is unable to determine atomic resolution protein structures for molecules in excess of about 30,000 Daltons. In fact, the median mass of NMR structures published in the Protein Data Bank is about 9 kD, with 90% less than 19 kD. In contrast, the median mass of crystallographically determined structures is 45 kD, with 90% less than 145 kD.
Alignment of Models
NMR models are typically structurally aligned by the authors before publication. However, there are some exceptions, such as 1qp6, 1dl0, and 1i25, in which the individual models are not aligned. In such cases, one needs to look at individual models in order to understand the molecular structure.
The alignment can affect your perception of the variation between models. For example, calmodulin contains two EF-hands connected by a flexible linker. When calmodulin is not bound to a cognate peptide, the two EF-hands can move relative to each other, flexing the linker. In 1cfc, the N-terminal EF-hands are aligned, but the C-terminal EF-hands are in different orientations. Alternatively, had the C-terminal EF-hands been aligned, then the N-terminal EF-hands would be in variable orientations. And less plausibly, had a short center segment of the flexible linker been aligned, both ends would be in variable orientations.
Another example of two folded domains (zinc fingers) connected by a flexible linker is 1zu1. Again, only one domain can be aligned, and which one is arbitrary.
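The effect of the alignment choice can be reproduced with a deliberately simplified two-dimensional toy: two rigid three-atom "domains" joined at a hinge, with the second domain rotated by 60 degrees in one model. Fitting on domain 1 puts all the apparent variation in domain 2, and fitting on domain 2 does the reverse. The sketch uses the closed-form least-squares rotation that exists in 2D; real structures need a 3D superposition method, and every name and coordinate here is made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def superpose(mobile: Sequence[Point], target: Sequence[Point],
              fit_indices: Sequence[int]) -> List[Point]:
    """Rigidly move all of `mobile` so the atoms in fit_indices best match `target`."""
    cm = centroid([mobile[i] for i in fit_indices])
    ct = centroid([target[i] for i in fit_indices])
    s_cos = s_sin = 0.0
    for i in fit_indices:
        ax, ay = mobile[i][0] - cm[0], mobile[i][1] - cm[1]
        bx, by = target[i][0] - ct[0], target[i][1] - ct[1]
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)  # least-squares rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    return [
        (c * (x - cm[0]) - s * (y - cm[1]) + ct[0],
         s * (x - cm[0]) + c * (y - cm[1]) + ct[1])
        for x, y in mobile
    ]


def rmsd(a: Sequence[Point], b: Sequence[Point], indices: Sequence[int]) -> float:
    total = sum((a[i][0] - b[i][0]) ** 2 + (a[i][1] - b[i][1]) ** 2 for i in indices)
    return math.sqrt(total / len(indices))


def make_two_domain_models():
    """Toy models: model_b equals model_a with domain 2 rotated 60 degrees about the hinge."""
    model_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (3.0, 0.0), (4.0, 0.0), (4.0, 1.0)]
    domain_1, domain_2 = [0, 1, 2], [3, 4, 5]
    hinge, angle = (2.0, 0.0), math.radians(60.0)
    c, s = math.cos(angle), math.sin(angle)
    model_b = list(model_a)
    for i in domain_2:
        dx, dy = model_a[i][0] - hinge[0], model_a[i][1] - hinge[1]
        model_b[i] = (c * dx - s * dy + hinge[0], s * dx + c * dy + hinge[1])
    return model_a, model_b, domain_1, domain_2


if __name__ == "__main__":
    a, b, d1, d2 = make_two_domain_models()
    for label, fit in (("domain 1", d1), ("domain 2", d2)):
        moved = superpose(b, a, fit)
        print(f"fit on {label}: RMSD domain 1 = {rmsd(moved, a, d1):.3f}, "
              f"RMSD domain 2 = {rmsd(moved, a, d2):.3f}")
```

Neither result is more correct than the other: the two models differ only in the relative orientation of the domains, and the superposition determines which domain appears to move.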
References and Websites
- Quoted from page 22 of the book Molecular Biology of Assemblies and Machines by Steven, Baumeister, Johnson and Perham, Garland/CRC Press, 2016.
- Nature of 3D Structural Data
- NMR in Wikipedia
- Lindorff-Larsen K, Best RB, DePristo MA, Dobson CM, Vendruscolo M. Simultaneous determination of protein structure and dynamics. Nature. 2005;433:128. PMID:15650731
- Nabuurs SB, Spronk CAEM, Vuister GW, Vriend G. Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. PLoS Computational Biology. 2006;2. doi:10.1371/journal.pcbi.0020009
- Zheng J, Jia Z. Structural biology: tiny enzyme uses context to succeed. Nature. 2013 May 23;497(7450):445-6. Epub 2013 May 15. PMID:23676672 doi:10.1038/nature12245
- Li D, Lyons JA, Pye VE, Vogeley L, Aragao D, Kenyon CP, Shah ST, Doherty C, Aherne M, Caffrey M. Crystal structure of the integral membrane diacylglycerol kinase. Nature. 2013 May 23;497(7450):521-4. Epub 2013 May 15. PMID:23676677 doi:10.1038/nature12179
- Van Horn WD, Kim HJ, Ellis CD, Hadziselimovic A, Sulistijo ES, Karra MD, Tian C, Sonnichsen FD, Sanders CR. Solution nuclear magnetic resonance structure of membrane-integral diacylglycerol kinase. Science. 2009 Jun 26;324(5935):1726-9. PMID:19556511
- At Protein Explorer: Size and Redundancy of 43,000 Entries in the Protein Data Bank (as of April 2007).