In protein structure prediction, a statistical potential (also called a knowledge-based potential, empirical potential, or residue contact potential) is an energy function derived from an analysis of known structures in the Protein Data Bank. Commonly analyzed features include phi/psi backbone torsion angles binned by residue pairs or triplets, solvent accessibility, hydrogen bond characteristics, and empirical observations about the likelihood of contacts between any two amino acid residues in the native state tertiary structure of a protein. Taking the last case as an example, in its simplest form a statistical potential is formulated as an interaction matrix that assigns a weight or energy value to each possible contact pair of standard amino acids. The energy of a particular structural model is then the sum of the energies of all the residue-residue contacts (often defined as residues within 4 Å of each other) identified in the structure. The probabilities or weights are determined by statistical examination of the native contacts present in a database of structures taken from the Protein Data Bank. According to the energy landscape view of protein folding, structures that closely resemble the native state will be distinguishably lower in free energy than those that differ from it.
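The contact-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a published potential: the two-letter alphabet (H for hydrophobic, P for polar), the values in `TOY_MATRIX`, the one-point-per-residue coordinates, and the `min_separation` filter that skips sequence neighbours are all assumptions made for the example.

```python
import math
from itertools import combinations

# Hypothetical toy interaction matrix in arbitrary units.
# The values are made up for illustration and are not fitted to real data.
TOY_MATRIX = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.2,
    ("P", "P"): 0.1,
}


def pair_energy(a, b, matrix=TOY_MATRIX):
    """Look up the symmetric pair weight regardless of residue order."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def contact_energy(residues, coords, cutoff=4.0, min_separation=2,
                   matrix=TOY_MATRIX):
    """Sum pair weights over all residue pairs within `cutoff` angstroms.

    Pairs fewer than `min_separation` positions apart in the sequence are
    skipped (an assumption of this sketch), since chain neighbours are
    always close in space.
    """
    total = 0.0
    for i, j in combinations(range(len(residues)), 2):
        if j - i < min_separation:
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            total += pair_energy(residues[i], residues[j], matrix)
    return total


# Four residues placed on the corners of a 3.8 angstrom square.
residues = ["H", "P", "H", "P"]
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
energy_default = contact_energy(residues, coords)            # 4 angstrom cutoff
energy_wide = contact_energy(residues, coords, cutoff=6.0)   # looser cutoff
```

With the 4 Å cutoff only the first and last residues count as a contact; widening the cutoff also brings in the two diagonal pairs, which shows how strongly the score depends on the contact definition.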
The equation for converting an observed probability into an energy is

$$\Delta w_i = -\log\left(\frac{p_i}{p_{\mathrm{reference}}}\right)$$

where $p_i$ is the probability of observing a protein in bin $i$ and $p_{\mathrm{reference}}$ is the probability of observing the protein in the reference state.
This relation derives from the Boltzmann distribution: lower-energy configurations are more probable, but at any temperature above absolute zero molecules still spend some of their time in less energetically favourable configurations. The reference state is required because an absolute free energy cannot be calculated; only differences relative to a reference are accessible.
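As a rough sketch of how the log-ratio of observed to reference probabilities might be computed from raw counts, the function below normalizes two sets of bin counts and takes the negative logarithm of their ratio. The function name, the dictionary layout, and the `pseudocount` smoothing (added so that empty bins do not produce a logarithm of zero) are assumptions of this example and are not part of the formula itself. Energies come out in dimensionless units.

```python
import math


def inverse_boltzmann(observed_counts, reference_counts, pseudocount=1.0):
    """Return {bin: delta_w} with delta_w = -log(p_observed / p_reference).

    `pseudocount` is added to every bin in both distributions; this is a
    smoothing choice made for the sketch, not part of the formula.
    """
    bins = set(observed_counts) | set(reference_counts)
    n_obs = sum(observed_counts.get(b, 0) + pseudocount for b in bins)
    n_ref = sum(reference_counts.get(b, 0) + pseudocount for b in bins)
    energies = {}
    for b in bins:
        p_obs = (observed_counts.get(b, 0) + pseudocount) / n_obs
        p_ref = (reference_counts.get(b, 0) + pseudocount) / n_ref
        energies[b] = -math.log(p_obs / p_ref)
    return energies


# Made-up counts: bin "a" is seen more often than the reference predicts,
# bin "b" less often.
observed = {"a": 30, "b": 10}
reference = {"a": 20, "b": 20}
energies = inverse_boltzmann(observed, reference, pseudocount=0.0)
```

A bin that is over-represented relative to the reference receives a negative (favourable) energy, and an under-represented bin receives a positive one.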
Statistical potentials are used as energy functions to assess ensembles of structural models produced by homology modeling or protein threading, that is, predictions of the tertiary structure adopted by a particular amino acid sequence, made by comparison with one or more homologous proteins whose structures have been determined experimentally. Many differently parameterized statistical potentials have been shown to identify the native state structure successfully from an ensemble of "decoy" or non-native structures. In response to criticism that statistical potentials capture only the tendency of hydrophobic amino acids to pack closely in the hydrophobic core of a globular protein, refinements have included the creation of two interaction matrices parameterized separately for residues in the core and for those on the solvent-accessible surface of the protein. The primary alternative method for assessing ensembles of models and identifying the lowest-energy structure relies on direct energy calculations, which are more computationally expensive than statistical potentials because long-range electrostatic interactions must be calculated.
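The decoy-discrimination idea, combined with separate core and surface matrices, might look roughly like the following. Everything here is hypothetical: the H/P alphabet, the weights in `CORE` and `SURFACE`, the representation of a model as a list of `(residue_a, residue_b, is_buried)` contacts, and the model names are invented for illustration only.

```python
# Hypothetical toy weights in arbitrary units, one matrix per environment.
# frozenset keys make the lookup independent of residue order.
CORE = {
    frozenset({"H"}): -1.0,        # H-H contact
    frozenset({"H", "P"}): 0.5,    # H-P contact
    frozenset({"P"}): 0.8,         # P-P contact
}
SURFACE = {
    frozenset({"H"}): -0.2,
    frozenset({"H", "P"}): 0.0,
    frozenset({"P"}): -0.3,
}


def score_model(contacts):
    """Sum contact weights, choosing the matrix by burial of each contact.

    `contacts` is an iterable of (residue_a, residue_b, is_buried) tuples.
    """
    total = 0.0
    for a, b, is_buried in contacts:
        matrix = CORE if is_buried else SURFACE
        total += matrix[frozenset((a, b))]
    return total


def rank_models(models):
    """Return (name, energy) pairs sorted from lowest to highest energy."""
    scored = ((name, score_model(contacts)) for name, contacts in models.items())
    return sorted(scored, key=lambda item: item[1])


# Made-up ensemble: one native-like model and two decoys.
models = {
    "native_like": [("H", "H", True), ("H", "H", True), ("P", "P", False)],
    "decoy_1": [("H", "P", True), ("P", "P", True), ("H", "H", False)],
    "decoy_2": [("H", "H", True), ("H", "P", False)],
}
ranking = rank_models(models)
best_name, best_energy = ranking[0]
```

In this toy ensemble the model with buried hydrophobic contacts and exposed polar contacts scores lowest, which is the behaviour a useful potential is expected to show on real decoy sets.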
Statistical potentials are not only useful in protein structure prediction but can also reproduce protein folding pathways. This observation strongly suggests that the interactions in the denatured state are very similar to those in native structures. Consequently, knowledge-based potentials derived from native structures are a good approximation of the interactions in the denatured state.
- Narang P, Bhushan K, Bose S, Jayaram B. (2006). Protein structure evaluation using an all-atom energy based empirical scoring function. J Biomol Struct Dyn 23(4):385-406.
- Sippl MJ. (1993). Recognition of Errors in Three-Dimensional Structures of Proteins. Proteins 17:355-62.
- Bryant SH, Lawrence CE. (1993). An empirical energy function for threading protein sequence through the folding motif. Proteins 16(1):92-112.
- Park K, Vendruscolo M, Domany E. (2000). Toward an energy function for the contact map representation of proteins. Proteins 40(2):237-48.
- Kmiecik S, Kolinski A. (2007). Characterization of protein-folding pathways by reduced-space modeling. Proc Natl Acad Sci USA 104(30):12330-12335. PMID 17636132.