International Chemical Identifier
The IUPAC International Chemical Identifier (InChI), developed by IUPAC and NIST, is a digital equivalent of the IUPAC name for any particular covalent compound. Chemical structures are expressed in terms of five layers of information: connectivity, tautomeric, isotopic, stereochemical, and electronic. The stated aim of the InChI is to provide a standard way to structure and encode molecular information.[1]
The InChI algorithm converts input structural information into the InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique set of atom labels), and serialization (to give a string of characters).
The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (25-character) condensed digital representation of the InChI. It was released in September 2007 to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI.[2]
Examples
Ethanol (CH3CH2OH):
InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
L-ascorbic acid:
InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
Layer types
There are six InChI layer types:
- Main layer
- Charge layer
- Stereochemical layer
- Isotopic layer
- Fixed-H layer
- Reconnected layer
Sub-layers
Each layer can be split into sub-layers. For example, the main layer can be split up into three sub-layers:
- Chemical formula (no prefix)
- Atom connections (prefix: "c")
- Hydrogen atoms (prefix: "h")
Notation
Layers and sub-layers are both separated by the "/" delimiter. All layers and sub-layers (except for the chemical formula sub-layer of the main layer) start with a lower-case letter indicating the type of information held in that layer.
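The notation rules above can be illustrated with a minimal Python sketch that splits an InChI string on the "/" delimiter and reads each segment's one-letter prefix. The function name `split_inchi` and the shape of the returned dictionary are illustrative assumptions, not part of any official InChI software, and the sketch does not validate the contents of each segment.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed segments.

    Hypothetical helper for illustration only: it applies the "/" delimiter
    and lower-case prefix rules and performs no chemical validation.
    """
    header = "InChI="
    if not inchi.startswith(header):
        raise ValueError("string does not start with 'InChI='")

    # The first "/"-separated field is the version; the rest are segments.
    version, *segments = inchi[len(header):].split("/")

    result = {"version": version, "formula": None, "sublayers": []}
    for segment in segments:
        if not segment:
            continue  # skip empty fields such as a trailing "/"
        if segment[0].islower():
            # Prefixed segment: the first letter names the kind of information.
            # A list of pairs is used because a prefix may occur more than once.
            result["sublayers"].append((segment[0], segment[1:]))
        elif result["formula"] is None:
            # The chemical formula is the only segment without a prefix.
            result["formula"] = segment
    return result


ethanol = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"])    # C2H6O
print(ethanol["sublayers"])  # [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
```

Applied to the L-ascorbic acid string from the Examples section, the same function also returns the "t", "m", and "s" segments that follow the main layer.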
InChIKey
The condensed, 25-character InChIKey is a hashed version of the full InChI, designed to allow for easy web searches of chemical compounds.[2] Up to 2007, most chemical structures on the Web were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey; the probability of duplication of just the first 14 characters has been estimated as one duplication in 75 databases each containing one billion unique structures. Since all current databases hold fewer than 50 million structures, such duplication appears unlikely at present.
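The order of magnitude of this estimate can be checked with a birthday-problem approximation. The sketch below assumes that the first 14 characters of the key carry a 65-bit hash; that figure is not stated in this article and is labeled as an assumption in the code. The function name `expected_collisions` is hypothetical.

```python
# Assumption (not from this article): the first 14 letters encode 65 hash bits.
HASH_BITS = 65


def expected_collisions(n_structures: int, bits: int = HASH_BITS) -> float:
    """Birthday-problem estimate of colliding pairs among n random hashes."""
    pairs = n_structures * (n_structures - 1) / 2  # number of distinct pairs
    return pairs / 2 ** bits                       # each pair collides with p = 2^-bits


# One database holding one billion unique structures.
per_database = expected_collisions(10 ** 9)
print(f"expected collisions per database: {per_database:.4f}")
print(f"databases per expected collision: {1 / per_database:.0f}")

# A database of 50 million structures, the size mentioned in the text.
print(f"expected collisions at 50 million: {expected_collisions(50_000_000):.2e}")
```

Under that assumption the estimate comes out at roughly one collision per 74 billion-structure databases, which is consistent with the quoted figure of one in 75. At 50 million structures the expected number of collisions is several hundred times smaller.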
Example
Morphine has the structure shown on the right.
The InChI for morphine is InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11-,13-,16-,17-/m0/s1, whereas the InChIKey for morphine is simply BQJCRHHNABKAKU-XKUOQXLYBY.[3]
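The fixed shape of the key makes it easy to recognize in running text, which is part of what makes it convenient for web searches. The following minimal sketch checks only the outward form seen in the morphine example: 14 upper-case letters, a hyphen, then 10 upper-case letters, for 25 characters in total. The pattern is inferred from that single example and the stated length, so it is an assumption rather than a specification, and the names `KEY_PATTERN` and `looks_like_inchikey` are hypothetical.

```python
import re

# Pattern inferred from the morphine example above (assumption, not a spec):
# 14 upper-case letters, a hyphen, then 10 upper-case letters.
KEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}")


def looks_like_inchikey(text: str) -> bool:
    """Return True if text has the 25-character shape of the example key."""
    return KEY_PATTERN.fullmatch(text) is not None


morphine_key = "BQJCRHHNABKAKU-XKUOQXLYBY"
print(len(morphine_key))                  # 25
print(looks_like_inchikey(morphine_key))  # True
print(looks_like_inchikey("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))  # False
```

A check like this only confirms the format. It cannot tell whether a key corresponds to a real structure, because the hash cannot be reversed into an InChI.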
See also
- Molecular Query Language
- SMILES
- Molecule editor
References
- [1] McNaught, Alan (2006). "The IUPAC International Chemical Identifier: InChI". Chemistry International (6). IUPAC. Retrieved 18 September 2007.
- [2] "The IUPAC International Chemical Identifier (InChI)". IUPAC. 5 September 2007. Retrieved 18 September 2007.
- [3] "InChI=1/C17H19NO3/c1-18..." ChemSpider. Retrieved 18 September 2007.
External links
- IUPAC InChI site
- InChI.info - an unofficial InChI website featuring on-line converter from InChI to molecular drawings
- Unofficial InChI FAQ
- Generate InChI (service at the University of Cambridge, available interactively or via WSDL)
- Search Google for molecules (generates an InChI from an interactively drawn structure and searches Google for any pages with embedded InChIs). Requires JavaScript to be enabled in the browser
- Free ChemSketch Drawing Package Chemical Structure drawing package including output to InChI file format and conversion of InChI to structure
- PubChem online molecule editor that supports SMILES/SMARTS and InChI
- ChemSpider Services that allows generation of InChI and conversion of InChI to structure (also SMILES and generation of other properties)
- MarvinSketch implementation to draw structures (or open other file formats) and output to InChI file format
- Googling for InChIs - a presentation to the W3C
- Presentation on InChIs from the Googleplex
- InChIMatic Draw your molecule and Google will search for it
- BKchem implements its own InChI parser and uses the IUPAC implementation to generate InChI strings