Protein design

Protein design is the design of new protein molecules from scratch, or the deliberate design of a new molecule by making calculated variations on a known structure. The number of possible amino acid sequences is astronomically large (20^N for a chain of N residues built from the 20 standard amino acids), but only a subset of these sequences will fold reliably and quickly to a single native state. Protein design involves identifying such sequences, in particular those with a physiologically active native state. Protein design is a rational design technique used in protein engineering.
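To give a sense of scale for the sequence space described above, the short Python sketch below counts the sequences available at a fixed chain length. It assumes the 20 standard amino acids and ignores everything else (modified residues, length variation); the function name is illustrative only.

```python
def sequence_space_size(n_residues: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of a given length over a fixed alphabet."""
    return alphabet_size ** n_residues


if __name__ == "__main__":
    for n in (10, 50, 100):
        # Python integers are exact; the format spec only shortens the display.
        print(f"{n:>3} residues: {sequence_space_size(n):.3e} sequences")
```

Even for a small protein of 100 residues the count exceeds 10^130, which is why design methods cannot enumerate sequences and must rely on guided search.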

Protein design requires an understanding of the process by which proteins fold. In a sense it is the reverse of structure prediction: a tertiary structure is specified, and a primary sequence is identified which will fold to it.

Protein design is also referred to as inverse folding. From a physical point of view, the native state conformation of a protein is the free energy minimum for the protein chain, at least at biological temperatures (i.e. at temperatures between zero and a hundred degrees Celsius). Hence, designing a new protein involves identifying the sequences that have the chosen structure as their free energy minimum. This can be done with computer models which, while simplifying the problem, are able to generate sequences that fold to the desired structure.
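The inverse folding idea can be illustrated with a deliberately minimal lattice model. The sketch below assumes the two-letter HP model on a 2D square lattice (H for hydrophobic, P for polar, energy -1 for every pair of H residues that are lattice neighbours but not chain neighbours), with energy standing in for free energy. The model choice, the function names and the six-residue target are assumptions made for illustration, not a method taken from the literature cited here.

```python
from itertools import product


def conformations(n):
    """Enumerate n-site self-avoiding walks on the 2D square lattice.

    Rotations and mirror images are removed by fixing the first step along +x
    and requiring the first step off the x-axis to point upwards.
    """
    found = []

    def extend(path):
        if len(path) == n:
            first_off_axis = next((y for _, y in path if y != 0), 1)
            if first_off_axis > 0:
                found.append(tuple(path))
            return
        x, y = path[-1]
        for site in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if site not in path:
                extend(path + [site])

    extend([(0, 0), (1, 0)])
    return found


def energy(seq, conf):
    """HP energy: -1 per H-H pair in lattice contact but not bonded."""
    total = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                dist = abs(conf[i][0] - conf[j][0]) + abs(conf[i][1] - conf[j][1])
                if dist == 1:
                    total -= 1
    return total


def design(target):
    """Return all HP sequences whose unique lowest-energy fold is the target."""
    confs = conformations(len(target))
    hits = []
    for letters in product("HP", repeat=len(target)):
        seq = "".join(letters)
        e_target = energy(seq, target)
        # Count conformations at least as good as the target (the target's
        # own symmetry class is always counted once).
        rivals = sum(1 for c in confs if energy(seq, c) <= e_target)
        if rivals == 1:
            hits.append(seq)
    return hits


# Hypothetical six-residue U-shaped target structure.
U_TARGET = ((0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1))

if __name__ == "__main__":
    print(design(U_TARGET))
```

Note that simply minimising the energy on the target is not enough: the all-H sequence has the lowest possible energy on this target but is equally happy in other compact shapes. A designed sequence must make the target better than every competing conformation, which is why the check runs over all folds.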

The design of minimalist computer models of proteins (lattice proteins), and the secondary structural modification of real proteins, began in the mid-1990s. The de novo design of real proteins became possible shortly afterwards, and the 21st century has seen the creation of small proteins with real biological function including catalysis and antiviral behaviour. There is great hope that the design of these and larger proteins will have application in medicine and bioengineering.

Computational protein design algorithms seek to identify amino acid sequences that have low energies for target structures. While the sequence-conformation space that needs to be searched is large, the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones. Using computational methods, a protein with a novel fold has been designed[1], as have sensors for unnatural molecules[2].
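The search for low-energy sequences on a fixed target can be sketched as simulated annealing over sequences. Everything specific in the example below is an assumption made for illustration: the toy energy function (a burial term plus a charge term over contacts), the reduced 15-letter alphabet, the residue groupings and the eight-residue target description are not taken from any published design program, and real energy functions are far more detailed.

```python
import math
import random

HYDROPHOBIC = set("AVILMFW")
POSITIVE, NEGATIVE = set("KR"), set("DE")
ALPHABET = "AVILMFWKRDESTNQ"  # reduced alphabet, chosen for the example


def energy(seq, buried, contacts):
    """Toy energy: +1 per burial mismatch, -0.5 / +0.5 per charged contact."""
    e = 0.0
    for aa, is_buried in zip(seq, buried):
        if (aa in HYDROPHOBIC) != is_buried:
            e += 1.0  # hydrophobic residue exposed, or polar residue buried
    for i, j in contacts:
        a, b = seq[i], seq[j]
        if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
            e -= 0.5  # salt bridge
        elif (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
            e += 0.5  # like charges in contact
    return e


def anneal(buried, contacts, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis search over sequences with a geometric cooling schedule."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in buried]
    e = energy(seq, buried, contacts)
    best, best_e = seq[:], e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)  # propose a point mutation
        new_e = energy(seq, buried, contacts)
        if new_e <= e or rng.random() < math.exp((e - new_e) / t):
            e = new_e
            if e < best_e:
                best, best_e = seq[:], e
        else:
            seq[pos] = old  # reject the move
    return "".join(best), best_e


# Hypothetical target: burial pattern and contacts between exposed positions.
BURIED = [True, False, True, True, False, False, True, False]
CONTACTS = [(1, 4), (5, 7)]

if __name__ == "__main__":
    print(anneal(BURIED, CONTACTS))
```

The cost of the search is dominated by calls to the energy function, which is one reason a fast energy function matters as much as an accurate one.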

On the other hand, it is widely believed that not all possible protein structures are designable, meaning that there are compact configurations of the chain to which no sequence can fold. In particular, conformations which are poor in secondary structure are unlikely to be designable. The designability of a given structure is still poorly understood.
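One way to make the notion of designability concrete is to count, for each structure, how many sequences have it as their unique lowest-energy conformation. The sketch below does this by brute force under the same kind of simplifying assumption used for lattice proteins: a two-letter HP alphabet on a 2D square lattice, with energy -1 for each non-bonded H-H contact. The model and all names are illustrative assumptions, and the chains are far too short to say anything about real proteins.

```python
from collections import Counter
from itertools import product


def conformations(n):
    """All n-site self-avoiding walks on the square lattice, one per symmetry class."""
    found = []

    def extend(path):
        if len(path) == n:
            # Keep one of each mirror pair: first step off the x-axis goes up.
            first_off_axis = next((y for _, y in path if y != 0), 1)
            if first_off_axis > 0:
                found.append(tuple(path))
            return
        x, y = path[-1]
        for site in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if site not in path:
                extend(path + [site])

    extend([(0, 0), (1, 0)])  # first step fixed along +x removes rotations
    return found


def contact_set(conf):
    """Pairs of residues that touch on the lattice without being bonded."""
    return frozenset(
        (i, j)
        for i in range(len(conf))
        for j in range(i + 3, len(conf))
        if abs(conf[i][0] - conf[j][0]) + abs(conf[i][1] - conf[j][1]) == 1
    )


def designability(n):
    """Map structure index -> number of HP sequences folding uniquely to it."""
    contacts = [contact_set(c) for c in conformations(n)]
    counts = Counter()
    for seq in product("HP", repeat=n):
        energies = [
            -sum(seq[i] == "H" and seq[j] == "H" for i, j in cs) for cs in contacts
        ]
        lowest = min(energies)
        if energies.count(lowest) == 1:  # non-degenerate ground state
            counts[energies.index(lowest)] += 1
    return counts, len(contacts)


if __name__ == "__main__":
    counts, n_structures = designability(6)
    print(f"{len(counts)} of {n_structures} structures are designable")
    print(sorted(counts.values(), reverse=True))
```

Even in this tiny model most conformations end up with a designability of zero, and the designable ones are compact, which mirrors the qualitative claim in the text.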


Software

EGAD: A Genetic Algorithm for protein Design[3]. A free, open-source software package for protein design and for predicting the effects of mutations on protein folding stability and binding affinity. EGAD can also consider multiple structures simultaneously when designing specific binding proteins or locking proteins into specific conformational states. In addition to natural protein residues, EGAD can consider free-moving ligands with or without rotatable bonds. EGAD can be used with single or multiple processors.

WHAT IF: software for protein modelling, design, validation, and visualisation.
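For readers unfamiliar with the genetic algorithm approach named in the EGAD entry, the following is a generic, minimal sketch of a genetic algorithm applied to sequence selection. It is not EGAD's implementation and does not use its energy function or interface; the toy energy (number of burial mismatches), the reduced alphabet, the operators and all parameter values are assumptions chosen for illustration.

```python
import random

HYDROPHOBIC = set("AVILMFW")
ALPHABET = "AVILMFWKRDESTNQ"  # reduced alphabet, chosen for the example


def toy_energy(seq, buried):
    """Number of positions whose residue type mismatches the burial pattern."""
    return sum((aa in HYDROPHOBIC) != b for aa, b in zip(seq, buried))


def evolve(buried, pop_size=40, generations=60, mutation_rate=0.05, seed=0):
    """Generic GA: tournament selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    n = len(buried)

    def score(seq):
        return toy_energy(seq, buried)

    pop = ["".join(rng.choice(ALPHABET) for _ in range(n)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        next_pop = [pop[0]]  # elitism: the current best survives unchanged
        while len(next_pop) < pop_size:
            parent_a = min(rng.sample(pop, 3), key=score)  # tournament of three
            parent_b = min(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            child = "".join(
                rng.choice(ALPHABET) if rng.random() < mutation_rate else aa
                for aa in child
            )
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=score)
    history.append(score(best))
    return best, history


# Hypothetical burial pattern for a 12-residue target (True = buried).
PATTERN = [True, False, False, True, True, False, True, False, False, True, True, False]

if __name__ == "__main__":
    best, history = evolve(PATTERN)
    print(best, history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best energy in the population can never get worse from one generation to the next.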


  • Sander, Vriend, et al., Protein design on computers. Five new proteins: Shpilka, Grendel, Fingerclasp, Leather and Aida. Proteins, 12, 105-110 (1992).
  • Jin et al., Structure, 11, 581 (2003).
  • Nagai et al., Proc. Natl. Acad. Sci. USA, 98, 3197 (2001).
  • Saghatelian et al., Nature, 409, 797 (2001).
  • Kuhlman et al., Science, 302, 1364 (2003).[4]
  • Looger et al., Nature, 423, 185 (2003).[5]
  • Pokala and Handel, J. Mol. Biol., 347, 203 (2005).[6]

See also