SMILES
| SMILES | |
|---|---|
| File extension: | .smi |
| Type of format: | chemical file format |
The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).
In August of 2006, the IUPAC introduced the InChI as a standard for formula representation. SMILES is generally considered to have the advantage of being slightly more human-readable than InChI; it also has a wide base of software support with extensive theoretical (e.g., graph theory) backing.
Canonical SMILES and Isomeric SMILES
The terms Canonical and Isomeric can lead to some confusion when applied to SMILES. The terms describe different attributes of the SMILES and are not mutually exclusive.
Typically, a number of equally valid SMILES can be written for a molecule. For example, CCO, OCC and C(O)C all specify the structure of ethanol. Algorithms have been developed to ensure the same SMILES is generated for a molecule regardless of the order of atoms in the structure. This SMILES is unique for each structure, although dependent on the canonicalisation algorithm used to generate it, and is termed the Canonical SMILES. Algorithms for generating Canonical SMILES have been developed at both Daylight Chemical Information Systems and OpenEye Scientific Software. A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
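The sketch below makes the idea of a canonical SMILES concrete. It is a brute-force illustration, not the Daylight or OpenEye algorithm: it enumerates every SMILES obtainable from every starting atom and every branch order, then keeps the shortest string, breaking ties alphabetically. Because that choice depends only on the structure, renumbering the atoms cannot change the result. The graph format (a dict from atom id to symbol and neighbour list) and the function names are hypothetical, and only acyclic, single-bonded, hydrogen-suppressed molecules are handled. The enumeration grows exponentially, so this is suitable only for illustration.

```python
from itertools import permutations, product


def all_smiles(graph, atom, parent=None):
    """Every SMILES string for the tree rooted at `atom`, over all branch orders.

    Hypothetical format: graph maps an atom id to (symbol, list of neighbour ids).
    Only tree-shaped (acyclic) molecules with single bonds are handled.
    """
    symbol, neighbours = graph[atom]
    children = [n for n in neighbours if n != parent]
    if not children:
        return {symbol}
    strings = set()
    for order in permutations(children):
        child_sets = [all_smiles(graph, c, atom) for c in order]
        for combo in product(*child_sets):
            # All children except the last are written as parenthesised branches.
            branches = "".join(f"({s})" for s in combo[:-1])
            strings.add(symbol + branches + combo[-1])
    return strings


def canonical_smiles(graph):
    """Pick one string per structure: shortest first, then alphabetical."""
    candidates = set()
    for root in graph:
        candidates |= all_smiles(graph, root)
    return min(candidates, key=lambda s: (len(s), s))


# Ethanol with two different atom numberings.
ethanol_a = {0: ("C", [1]), 1: ("C", [0, 2]), 2: ("O", [1])}
ethanol_b = {7: ("O", [3]), 3: ("C", [7, 5]), 5: ("C", [3])}
print(canonical_smiles(ethanol_a))  # CCO
print(canonical_smiles(ethanol_b))  # CCO
```

Starting from the middle carbon yields C(C)O and C(O)C, and starting from oxygen yields OCC. All of these are valid, but only CCO is selected.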
SMILES notation allows the configuration at tetrahedral centers and the geometry of double bonds to be specified. These are structural features that cannot be specified by connectivity alone, and SMILES that encode this information are termed Isomeric SMILES. A notable feature of these rules is that they allow rigorous partial specification of chirality. SMILES in which isotopes are specified are also described as Isomeric SMILES.
Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
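The procedure just described can be sketched in two passes. The first pass is a depth-first search that splits the bonds into spanning-tree edges and broken ring bonds. The second pass prints the atoms in the same order, with parentheses for side branches and matching digits at both ends of each broken ring bond. This is a minimal illustration under stated assumptions: the input format (a list of symbols plus index pairs), the function name `to_smiles`, and the restriction to connected molecules with single bonds, organic-subset atoms and at most nine simultaneously open rings are all choices made for this sketch.

```python
def to_smiles(symbols, edges, start=0):
    """Write SMILES for a connected, hydrogen-suppressed graph of single bonds.

    symbols: atom symbols such as ["C", "C", "O"]; edges: (i, j) index pairs.
    """
    nbrs = {i: [] for i in range(len(symbols))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # Pass 1: depth-first traversal. Edges to unvisited atoms form the spanning
    # tree; edges back to already visited atoms are the broken ring bonds.
    parent = {start: None}
    children = {i: [] for i in nbrs}
    ring_edges = set()

    def dfs(a):
        for b in nbrs[a]:
            if b not in parent:
                parent[b] = a
                children[a].append(b)
                dfs(b)
            elif b != parent[a]:
                ring_edges.add(frozenset((a, b)))

    dfs(start)

    # Pass 2: print atoms in the same depth-first order. Each broken ring bond
    # gets a digit at the first atom written and the same digit at the second.
    open_digits = {}
    free = list(range(1, 10))

    def write(a):
        out = symbols[a]
        mine = [e for e in ring_edges if a in e]
        opening = sorted((e for e in mine if e not in open_digits), key=sorted)
        closing = sorted((e for e in mine if e in open_digits), key=sorted)
        for e in opening:
            open_digits[e] = free.pop(0)
            out += str(open_digits[e])
        for e in closing:
            d = open_digits.pop(e)
            out += str(d)
            free.append(d)  # a closed digit may be reused by a later ring
        free.sort()
        kids = children[a]
        for k in kids[:-1]:
            out += "(" + write(k) + ")"  # side branches go in parentheses
        if kids:
            out += write(kids[-1])  # the last child continues the main chain
        return out

    return write(start)


print(to_smiles(["C"] * 6, [(i, (i + 1) % 6) for i in range(6)]))  # C1CCCCC1
print(to_smiles(["C", "F", "F", "F"], [(0, 1), (0, 2), (0, 3)]))  # C(F)(F)F
```

Starting the traversal at a different atom gives a different but equally valid string, for example C(C)O instead of CCO for ethanol. This is why canonical SMILES requires an additional ordering rule.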
Examples
Atoms
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O.
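The implicit-hydrogen rule for unbracketed atoms can be sketched as follows: the atom is assumed to take the lowest of its normal valences that can accommodate its explicit bonds, and hydrogens fill the gap. The valence table below follows the commonly cited Daylight/OpenSMILES values for the organic subset. Aromatic (lower-case) atoms are deliberately left out, and the function name is hypothetical.

```python
# Normal valences of the unbracketed organic subset (commonly cited
# Daylight/OpenSMILES values; aromatic atoms are not covered here).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens implied on an unbracketed atom with the given explicit bonds."""
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # explicit bonds already exceed every normal valence


print(implicit_hydrogens("O", 0))  # 2: the SMILES "O" is water
print(implicit_hydrogens("C", 1))  # 3: a terminal carbon in "CCO" is a methyl group
```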
Bonds
Bonds between aliphatic atoms are assumed to be single unless specified otherwise and are implied by adjacency in the SMILES string. For example, the SMILES for ethanol can be written as CCO. Ring-closure labels indicate connectivity between atoms that are not adjacent in the string; cyclohexane, for example, can be written as C1CCCCC1. Double and triple bonds are represented by the symbols '=' and '#', respectively, as illustrated by the SMILES O=C=O (carbon dioxide) and C#N (hydrogen cyanide).
Branching
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Dioxane can be written in SMILES notation as O(CCO1)CC1, O(C1)CCOC1, C(OCC1)CO1 or O1CCOCC1.
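The bond, ring-closure and branching rules above are enough to read simple SMILES back into a graph. The sketch below is a minimal reader restricted to unbracketed, non-aromatic organic-subset atoms, explicit '-', '=' and '#' bonds, parentheses and single-digit ring labels. To check the result, it computes a molecular formula using normal valences for the implicit hydrogens. The tokenizer, the function names and the Hill-order output are assumptions made for illustration. Running it on the four dioxane strings gives the same formula and the same six ring bonds.

```python
import re
from collections import Counter

# Organic-subset atoms (two-letter halogens first), bond symbols, branch
# parentheses and single-digit ring-closure labels.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
NORMAL_VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
                   "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}


def parse(smiles):
    """Return (symbols, bonds) with bonds mapping (i, j) to a bond order."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("brackets, aromatic atoms and stereo marks are not handled")
    symbols, bonds = [], {}
    branch_stack, rings = [], {}  # rings: label -> (atom index, bond order)
    prev, order = None, 1
    for tok in tokens:
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            branch_stack.append(prev)  # remember the branch point
        elif tok == ")":
            prev = branch_stack.pop()  # return to the branch point
        elif tok.isdigit():
            if tok in rings:  # second use of a label closes the ring
                other, first_order = rings.pop(tok)
                bonds[(other, prev)] = max(order, first_order)
            else:  # first use of a label opens it
                rings[tok] = (prev, order)
            order = 1
        else:  # an atom, bonded to the previous atom on the chain
            symbols.append(tok)
            atom = len(symbols) - 1
            if prev is not None:
                bonds[(prev, atom)] = order
            prev, order = atom, 1
    if rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return symbols, bonds


def molecular_formula(smiles):
    """Hill-order formula, adding implicit hydrogens from normal valences."""
    symbols, bonds = parse(smiles)
    valence = Counter()
    for (i, j), order in bonds.items():
        valence[i] += order
        valence[j] += order
    counts = Counter(symbols)
    for i, sym in enumerate(symbols):
        # lowest normal valence that accommodates the explicit bonds
        target = next((v for v in NORMAL_VALENCES[sym] if v >= valence[i]), valence[i])
        counts["H"] += target - valence[i]
    if "C" in counts:
        elements = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        elements = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in elements if counts[e] > 0)


for s in ["O(CCO1)CC1", "O(C1)CCOC1", "C(OCC1)CO1", "O1CCOCC1"]:
    print(s, molecular_formula(s))  # C4H8O2 for every dioxane string
```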
Aromaticity
Aromatic C, O, S and N atoms are written in lower case as 'c', 'o', 's' and 'n', respectively. Benzene, pyridine and furan can be represented by the SMILES c1ccccc1, n1ccccc1 and o1cccc1, respectively. Bonds between aromatic atoms are aromatic by default, although they can be specified explicitly using the ':' symbol. Aromatic atoms can also be singly bonded to each other; biphenyl, for example, can be represented as c1ccccc1-c2ccccc2. Aromatic nitrogen bonded to hydrogen, as found in pyrrole, must be represented as [nH], and imidazole is written in SMILES notation as n1c[nH]cc1.
The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.
Stereochemistry
Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F (see depiction) is one representation of trans-difluoroethene, in which the fluorine atoms are on opposite sides of the double bond, whereas F/C=C\F (see depiction) is one possible representation of cis-difluoroethene, in which the fluorine atoms are on the same side of the double bond, as shown in the figure.
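For the simplest layout, with one substituent written before the double bond and one after it, the rule reduces to a comparison: the same direction character on both sides means trans, and different characters mean cis. The sketch below handles only that X/C=C/Y pattern. Substituents written inside branches, for example C(/F)=C/F, follow a reversed reading and are deliberately rejected. The function name is hypothetical.

```python
import re

# Only the simplest form: one substituent before and one after a C=C bond.
SIMPLE_ALKENE = re.compile(r"([A-Z][a-z]?)([/\\])C=C([/\\])([A-Z][a-z]?)")


def double_bond_geometry(smiles):
    """Return "cis" or "trans" for strings of the form X/C=C/Y or X/C=C\\Y."""
    m = SIMPLE_ALKENE.fullmatch(smiles)
    if m is None:
        raise ValueError("only the form X/C=C/Y is handled in this sketch")
    # The same direction mark on both sides puts X and Y on opposite sides.
    return "trans" if m.group(2) == m.group(3) else "cis"


print(double_bond_geometry("F/C=C/F"))   # trans
print(double_bond_geometry("F/C=C\\F"))  # cis (the Python literal escapes the backslash)
```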
Configuration at tetrahedral carbon is specified by @ or @@. L-Alanine, the more common enantiomer of the amino acid alanine, can be written as N[C@@H](C)C(=O)O (see depiction). The @@ specifier indicates that, when viewed from the nitrogen along the bond to the chiral center, the sequence of substituents hydrogen (H), methyl (C) and carboxyl (C(=O)O) appears clockwise. D-Alanine can be written as N[C@H](C)C(=O)O (see depiction). The order of the substituents in the SMILES string matters: D-alanine can also be encoded as N[C@@H](C(=O)O)C (see depiction).
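The reason N[C@@H](C(=O)O)C is D-alanine rather than L-alanine can be checked mechanically. A tetrahedral tag describes the neighbours in the order they are written, so any odd permutation of that order (a single swap, for instance) turns @ into @@ and vice versa, while an even permutation leaves it unchanged. In the sketch below, the neighbour labels ("N", "H", "CH3", "COOH") and the function names are hypothetical. The hydrogen inside the bracket counts as the neighbour immediately after the preceding atom.

```python
def permutation_parity(perm):
    """0 if `perm` (a list of distinct indices) is even, 1 if it is odd."""
    inversions = sum(
        1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j]
    )
    return inversions % 2


def retag(tag, written_order, new_order):
    """Tag that keeps the same tetrahedral configuration when the centre's
    neighbours are listed in `new_order` instead of `written_order`."""
    perm = [written_order.index(n) for n in new_order]
    if permutation_parity(perm) == 0:
        return tag
    return "@" if tag == "@@" else "@@"


# L-alanine as N[C@@H](C)C(=O)O: neighbours in written order.
written = ["N", "H", "CH3", "COOH"]
# Writing the carboxyl before the methyl swaps two neighbours, so L-alanine
# becomes N[C@H](C(=O)O)C; keeping @@ there describes D-alanine instead.
print(retag("@@", written, ["N", "H", "COOH", "CH3"]))  # @
```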
Isotopes
Isotopes are specified by a number equal to the integer isotopic mass, placed before the atomic symbol. Benzene in which one atom is carbon-14 is [14c]1ccccc1, and deuterochloroform is [2H]C(Cl)(Cl)Cl.
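The bracket-atom examples in this article ([Au], [OH-], [nH], [C@@H], [14c], [2H]) all fit one layout: an optional isotope, the element symbol, an optional chirality mark, an optional hydrogen count and an optional charge. The regular expression below is a simplified grammar assumed for this sketch, loosely following OpenSMILES. It does not validate element names, and it omits atom classes and repeated charge signs such as "++". The function name and the keys of the returned dict are hypothetical.

```python
import re

# Simplified bracket-atom layout: [isotope? symbol chirality? hydrogens? charge?]
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>se|as|[bcnops]|[A-Z][a-z]?)"
    r"(?P<chirality>@@?)?"
    r"(?P<hydrogens>H\d*)?"
    r"(?P<charge>[+-]\d*)?\]"
)


def parse_bracket_atom(text):
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    hydrogens, charge = m["hydrogens"], m["charge"]
    return {
        "isotope": int(m["isotope"]) if m["isotope"] else None,
        "symbol": m["symbol"],
        "aromatic": m["symbol"].islower(),
        "chirality": m["chirality"],
        # inside brackets, only the hydrogens written explicitly are present
        "hydrogens": int(hydrogens[1:] or 1) if hydrogens else 0,
        "charge": int(charge[0] + (charge[1:] or "1")) if charge else 0,
    }


print(parse_bracket_atom("[14c]"))  # isotope 14 on an aromatic carbon
print(parse_bracket_atom("[OH-]"))  # oxygen, one hydrogen, charge -1
```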
Other Examples of SMILES
The SMILES notation is described extensively in the SMILES theory manual provided by Daylight Chemical Information Systems and a number of illustrative examples are presented. Daylight's depict utility provides users with the means to check their own examples of SMILES and is a valuable educational tool.
Extensions
SMARTS is a line notation for specifying substructural patterns in molecules. It is related to SMILES and, while it uses many of the same symbols, it also allows the specification of wildcard atoms and bonds. It is used to specify search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when, in fact, it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
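The distinction between string matching and graph matching can be illustrated with a small backtracking search for a subgraph isomorphism. The search maps each pattern atom onto a distinct target atom with the same symbol, so that every pattern bond is also a target bond. This is a minimal sketch, not how any particular database engine works. The (symbols, edges) input format is an assumption, bond orders are ignored, and "*" stands in for a SMARTS-style wildcard atom as a simplification. Production matchers use much better pruning, such as the VF2 algorithm.

```python
def find_substructure(pattern, target):
    """Map pattern atoms onto target atoms, or return None if no match exists.

    Both graphs are (symbols, edges) with edges as (i, j) index pairs; bond
    orders are ignored and "*" in the pattern matches any atom.
    """
    def adjacency(symbols, edges):
        adj = {i: set() for i in range(len(symbols))}
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    p_syms, t_syms = pattern[0], target[0]
    p_adj, t_adj = adjacency(*pattern), adjacency(*target)

    def extend(mapping):
        if len(mapping) == len(p_syms):
            return dict(mapping)
        p = len(mapping)  # map pattern atoms in index order
        for t in range(len(t_syms)):
            if t in mapping.values():
                continue  # each target atom is used at most once
            if p_syms[p] != "*" and p_syms[p] != t_syms[t]:
                continue
            # bonds to already-mapped pattern atoms must also exist in the target
            if all(mapping[q] in t_adj[t] for q in p_adj[p] if q in mapping):
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack and try the next target atom
        return None

    return extend({})


ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
print(find_substructure((["C", "O"], [(0, 1)]), ethanol))  # {0: 1, 1: 2}
print(find_substructure((["C", "O", "C"], [(0, 1), (1, 2)]), ethanol))  # None
```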
Conversion
SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to 3-dimensional representation is achieved by energy minimization approaches. There are many downloadable and web-based conversion utilities.
See also
- SMILES arbitrary target specification (SMARTS), a language for specifying substructural queries
- SYBYL Line Notation (another line notation)
- Molecular Query Language - query language allowing also numerical properties, e.g. physicochemical values or distances
- Chemistry Development Kit (2D layout and conversion)
- International Chemical Identifier (InChI), the free and open alternative to SMILES by the IUPAC.
- OpenBabel, JOELib, OELib (conversion)
References
- Anderson, E.; Veith, G.D.; Weininger, D. (1987) SMILES: A line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. EPA, Environmental Research Laboratory-Duluth, Duluth, MN 55804.
- Weininger, D. (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31-36.
- Weininger, D.; Weininger, A.; Weininger, J.L. (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 29, 97-101.
- Helson, H.E. (1999) Structure Diagram Generation. In Rev. Comput. Chem., edited by Lipkowitz, K.B. and Boyd, D.B. Wiley-VCH, New York, pages 313-398.
External links
Specifications
- "SMILES - A Simplified Chemical Language"
- The OpenSMILES home page
- "SMARTS - SMILES Extension"
- Daylight SMILES tutorial
- Parsing SMILES
SMILES related software utilities
- smi23d - 3D Coordinate Generation
- Daylight Depict
- CACTVS at NCI
- PubChem online molecule editor
- JME molecule editor
- ACD/ChemSketch
- ChemAxon/Marvin - online chemical editor/viewer and SMILES generator/converter
- ChemAxon/Instant JChem - Desktop application for storing/generating/converting/visualizing/searching SMILES structures, particularly batch processing. Personal edition free
- Smormo-Ed: a molecule editor for Linux which can read and write SMILES
- InChI.info: an unofficial InChI website featuring an online converter from InChI and SMILES to molecular drawings
Acknowledgement and Attribution Regarding Sources of Content
Some of the initial content on this page may be incorporated in part from copyleft sources in the public domain including wikis such as Wikipedia and AskDrWiki. Drug information for patients came from The National Library of Medicine. Infectious disease information may have come from the Centers for Disease Control (CDC). Differential Diagnoses are drawn from clinicians as well as an amalgamation of 3 sources: 1. The Disease Database; 2. Kahan, Scott, Smith, Ellen G. In A Page: Signs and Symptoms. Malden, Massachusetts: Blackwell Publishing, 2004:3; 3. Sailer, Christian, Wasner, Susanne. Differential Diagnosis Pocket. Hermosa Beach, CA: Borm Bruckmeir Publishing LLC, 2002:7.

