SMARTS Tutorial
Table of Contents
1. Introduction
2. Properties of Atoms
3. Bonds
4. Logical Operators
5. Recursive SMARTS
6. Component-Level Grouping
7. Reaction SMARTS
1. Introduction
SMARTS...
...means SMiles ARbitrary Target Specification
...is a language used for describing molecular patterns and properties
...rules are straightforward extensions of SMILES
- All SMILES symbols and properties are legal in SMARTS.
- SMARTS includes logical operators and additional molecular descriptors
...can describe structural patterns with varying degrees of specificity
and generality:
- SMILES for methane: C or [CH4]
- High specificity SMARTS describing a pattern consistent with methane:
[CH4]
Matches only aliphatic carbon atoms that have exactly 4 hydrogens.
Won't match ethane, ethene, or cyclopentane.
- Low specificity SMARTS describing a pattern consistent with methane:
C
Matches aliphatic carbon atoms that have any number of hydrogens.
Will match ethane, ethene, and cyclopentane.
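The contrast between the two patterns can be sketched in Python. This is a toy model, not a SMARTS engine. Each molecule is assumed to be a list of heavy atoms that records only element, aromaticity, and total hydrogen count. The two patterns are hand-written as predicates, and the names `Atom`, `MOLECULES`, `match_C`, `match_CH4`, and `hits` are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple

class Atom(NamedTuple):
    element: str   # element symbol, e.g. "C"
    aromatic: bool
    total_h: int   # implicit + explicit hydrogens

# Hypothetical hand-built data: heavy atoms only, hydrogens folded into total_h.
MOLECULES: Dict[str, List[Atom]] = {
    "methane": [Atom("C", False, 4)],
    "ethane": [Atom("C", False, 3)] * 2,
    "ethene": [Atom("C", False, 2)] * 2,
    "cyclopentane": [Atom("C", False, 2)] * 5,
    "benzene": [Atom("C", True, 1)] * 6,
}

def match_C(atom: Atom) -> bool:
    """SMARTS 'C': an aliphatic carbon with any number of hydrogens."""
    return atom.element == "C" and not atom.aromatic

def match_CH4(atom: Atom) -> bool:
    """SMARTS '[CH4]': an aliphatic carbon with exactly four hydrogens."""
    return match_C(atom) and atom.total_h == 4

def hits(pattern: Callable[[Atom], bool], atoms: List[Atom]) -> bool:
    """A molecule is hit if at least one of its atoms matches the pattern."""
    return any(pattern(a) for a in atoms)

for name, atoms in MOLECULES.items():
    print(f"{name:13} C: {hits(match_C, atoms)!s:5}  [CH4]: {hits(match_CH4, atoms)}")
```

Benzene is included to show that the low-specificity pattern C is still restricted to aliphatic carbons.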
2. Properties of Atoms
SMARTS
Hits SMILES:
Note
[+1]
Atoms that have a +1 charge
All SMILES atomic properties are valid in SMARTS;
this includes charge, hydrogen count, isotopic specifications,
bond symbols, and chirality specification. + is +1, ++ is +2,
etc.
[a]
Atoms that are aromatic
"a" is any aromatic atom.
[A]
Atoms that are aliphatic
"A" is any aliphatic atom.
[#6]
Atoms that have an atomic number of 6 (c or C)
"#<number>" defines an atom that has an atomic
number of <number>. Hits both aliphatic and aromatic atoms.
[R2]
Atoms that are in 2 rings
"R<number>" defines an atom that is in
<number> SSSR rings (smallest set of smallest rings). Default (R) is any ring atom.
[r5]
Atoms that are in a ring that has 5 members
"r<number>" defines an atom whose smallest SSSR
ring has <number> members. Default (r) is any ring atom.
[v4]
Atoms that are four-valent
"v<number>" defines an atom whose total bond order is
<number>, counting implicit hydrogens (= counts as 2 bonds, # as 3).
[X2]
Atoms that are connected to two other atoms
"X<number>" defines an atom that is connected to
<number> other atoms (including all hydrogens)
[H]
Hydrogen Atoms
Inside brackets, a lone "H" is a hydrogen atom (often called an
"explicit hydrogen"), not a hydrogen count. [H+] (a proton), [2H]
(deuterium), and [H][H] (molecular hydrogen) are likewise read as hydrogen atoms.
[H1]
Atoms that have one attached hydrogen.
"H<number>" defines an atom that has <number>
attached hydrogens, whether "implicit" or "explicit" (i.e. an H property or an
H atom). The default, as in [*H], is 1 for a non-hydrogen atom.
*
Any Atom
In SMARTS, the wildcard atom, "*", matches any atom.
It won't hit hydrogens that exist only as properties (hydrogen counts) of heavy atoms.
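The counting rules behind #, X, v, and H can be made concrete with a small Python sketch. It assumes a hand-built graph for acetaldehyde (CC=O) in which each heavy atom carries an implicit hydrogen count. The names `ATOMS`, `BONDS`, `primitives`, and `query` are hypothetical. Aromatic bonds and the ring primitives (R, r) are not modeled.

```python
from typing import Dict, List

# Hypothetical graph for acetaldehyde, CC=O.
ATOMS = [("C", 3), ("C", 1), ("O", 0)]   # (element, implicit hydrogen count)
BONDS = [(0, 1, 1), (1, 2, 2)]           # (atom i, atom j, bond order)
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8}

def primitives(idx: int) -> Dict[str, int]:
    """Values that the #n, Xn, vn and Hn primitives compare against."""
    element, h = ATOMS[idx]
    heavy_neighbors = 0
    bond_order_sum = 0
    for i, j, order in BONDS:
        if idx in (i, j):
            heavy_neighbors += 1
            bond_order_sum += order
    return {
        "#": ATOMIC_NUMBER[element],
        "X": heavy_neighbors + h,   # connections, hydrogens included
        "v": bond_order_sum + h,    # total bond order, each H counts 1
        "H": h,                     # total attached hydrogens
    }

def query(prop: str, value: int) -> List[int]:
    """Atom indices whose primitive equals value, e.g. [v4] -> query("v", 4)."""
    return [i for i in range(len(ATOMS)) if primitives(i)[prop] == value]

print(query("#", 6))  # [0, 1]  ([#6])
print(query("v", 4))  # [0, 1]  ([v4]: CH3 and the C=O carbon)
print(query("X", 3))  # [1]     ([X3]: C=O carbon bonded to C, O and H)
print(query("H", 1))  # [1]     ([H1])
```

Note that the carbonyl carbon is X3 but v4, because its double bond adds 2 to the bond order while counting as only one connection.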
3. Bonds
SMARTS
Hits SMILES:
Note
CC
Molecules where an aliphatic carbon is SINGLE BONDED to another
aliphatic carbon
All SMILES bond properties are valid in SMARTS; this
includes unspecified bonds, explicit single bonds (-), double
bonds (=), triple bonds (#), and aromatic bonds (:). In SMARTS an unspecified
bond means "single or aromatic", so CC WON'T match carbons joined only by a
double or triple bond (e.g. C=C and C#C).
[#6]~[#6]
Molecules where two carbons are connected by any bond (includes single
bonds, double bonds, triple bonds, and aromatic bonds)
"~" means any bond (wildcard bond).
[#6]@[#6]
Molecules where two carbons are connected by a ring bond
"@" matches any ring bond, i.e. a bond that is itself
part of a ring.
F/?[#6]=C/Cl
Molecules where a carbon (bonded to a fluorine by a directional "up"
or unspecified bond) is double-bonded to another carbon (bonded to a
chlorine by an "up" bond), e.g. F/C=C/Cl and FC=C/Cl. This excludes
molecules where the fluorine is attached by a "down" bond instead,
e.g. F\C=C/Cl.
"?" means "OR unspecified". "?" may also be used with
chirality specifications (@? and @@?).
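The bond primitives above can be modeled as predicates over a bond record. This Python sketch assumes that each bond is described only by its order and by whether it lies in a ring. The `Bond` type and the `BOND_PRIMITIVES` table are hypothetical, and directional (/ and \) bonds are not modeled. The empty-string entry encodes the SMARTS rule that an unwritten bond means single or aromatic.

```python
from typing import Callable, Dict, NamedTuple

class Bond(NamedTuple):
    order: str     # "single", "double", "triple" or "aromatic"
    in_ring: bool  # True if the bond itself is part of a ring

BOND_PRIMITIVES: Dict[str, Callable[[Bond], bool]] = {
    "-": lambda b: b.order == "single",
    "=": lambda b: b.order == "double",
    "#": lambda b: b.order == "triple",
    ":": lambda b: b.order == "aromatic",
    "~": lambda b: True,          # wildcard bond
    "@": lambda b: b.in_ring,     # any ring bond
    "": lambda b: b.order in ("single", "aromatic"),  # unwritten bond
}

def bond_matches(symbol: str, bond: Bond) -> bool:
    return BOND_PRIMITIVES[symbol](bond)

ethene_cc = Bond("double", in_ring=False)
cyclohexane_cc = Bond("single", in_ring=True)
print(bond_matches("", ethene_cc))        # False: CC does not hit C=C
print(bond_matches("~", ethene_cc))       # True
print(bond_matches("@", cyclohexane_cc))  # True
```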
4. Logical Operators
SMARTS
Hits SMILES:
Note
[!c]
Atoms that are NOT aromatic carbons
"!" means "not".
[N,#8]
Atoms that are an aliphatic nitrogen OR an oxygen (aromatic
or aliphatic)
"," means OR. OR has higher precedence than low-precedence
"and" (;), but lower precedence than high-precedence "and" (&).
[#7,C&+0,+1]
or
[#7,C+0,+1]
Atoms that (are nitrogens) or (are neutral aliphatic carbons)
or (have a +1 charge)
"&" is "and" (high precedence). High
precedence "and" is the default logical operator and may be omitted.
[#7,C;+0,+1]
Atoms that (are nitrogens or aliphatic carbons) and (are
neutral or have a +1 charge)
";" is "and" (low precedence).
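The precedence rules can be checked with a toy evaluator for bracket-atom expressions. This Python sketch covers only a subset of SMARTS and is not a full parser. It understands C, N, O and their aromatic forms, #<number>, charges (+, -, +<number>, -<number>), *, and the four operators. Atoms are hypothetical `Atom` records that hold atomic number, aromaticity, and charge. The expression is split loosest-first: on ";", then ",", then "&". Primitives written side by side within each piece are ANDed, and "!" applies to a single primitive.

```python
import re
from typing import List, NamedTuple

class Atom(NamedTuple):
    atomic_number: int
    aromatic: bool
    charge: int

ELEMENTS = {"C": 6, "N": 7, "O": 8}  # tiny subset; lower case means aromatic
# One primitive, optionally preceded by one or more '!'.
TOKEN = re.compile(r"(!*)(#\d+|[+-]\d*|[CNOcno]|\*)")

def primitive(tok: str, atom: Atom) -> bool:
    if tok == "*":
        return True
    if tok.startswith("#"):
        return atom.atomic_number == int(tok[1:])
    if tok[0] in "+-":
        size = int(tok[1:]) if len(tok) > 1 else 1  # bare "+" means +1
        return atom.charge == (size if tok[0] == "+" else -size)
    return atom.atomic_number == ELEMENTS[tok.upper()] and atom.aromatic == tok.islower()

def juxtaposed(run: str, atom: Atom) -> bool:
    """Primitives written side by side (e.g. 'C+0') are joined by implicit high-precedence AND."""
    results: List[bool] = []
    pos = 0
    while pos < len(run):
        m = TOKEN.match(run, pos)
        if m is None:
            raise ValueError(f"unsupported primitive: {run[pos:]!r}")
        bangs, tok = m.groups()
        value = primitive(tok, atom)
        results.append(not value if len(bangs) % 2 else value)  # '!' binds to one primitive
        pos = m.end()
    return all(results)

def matches(expr: str, atom: Atom) -> bool:
    """Evaluate a bracket-atom expression given without its brackets.

    Loosest to tightest: ';' (AND) < ',' (OR) < '&' (AND) < '!' (NOT).
    """
    return all(                                                      # ';'
        any(                                                         # ','
            all(juxtaposed(part, atom) for part in alt.split("&"))   # '&'
            for alt in clause.split(",")
        )
        for clause in expr.split(";")
    )

N_minus = Atom(7, False, -1)
Na_plus = Atom(11, False, 1)
print(matches("#7,C&+0,+1", N_minus), matches("#7,C;+0,+1", N_minus))  # True False
print(matches("#7,C&+0,+1", Na_plus), matches("#7,C;+0,+1", Na_plus))  # True False
print(matches("#7,C+0,+1", Atom(6, False, 0)))                         # True
print(matches("!c", Atom(6, True, 0)))                                 # False
```

The N- and Na+ cases separate the two expressions. The first accepts them through the "#7" or "+1" branch alone. The second also requires the atom to be N or C and the charge to be 0 or +1.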
5. Recursive SMARTS
SMARTS
Hits SMILES:
Note
[$(*O);$(*CC)]
Atoms that are in an environment where (the atom is connected to an
aliphatic oxygen) and where (the atom is connected to two sequential
aliphatic carbons)
Any SMARTS expression may be used to define an atomic
environment by writing a SMARTS starting with the atom of interest
in this form: $(<SMARTS>)
[$([CX3]=[OX1]),$([CX3+]-[OX1-])]
Carbonyl carbon atoms (either resonance structure: C=O or [C+]-[O-])
[$([#6]aaO);$([#6]aaaN)]
Carbon atoms (aliphatic or aromatic, since #6 matches both) that are ortho to an O and meta to an N
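For simple linear patterns such as *O and *CC, a recursive SMARTS can be approximated by testing the atom of interest and then searching outward from it for a path of distinct atoms that matches the rest of the pattern. This Python sketch is hypothetical and deliberately limited. Patterns are strings of one-character atom queries (*, a, or an element symbol), and bond orders are ignored. It evaluates [$(*O);$(*CC)] as the AND of two such searches.

```python
from typing import Dict, List, Set, Tuple

class Mol:
    """Hypothetical heavy-atom graph: symbols (lower case = aromatic) plus bonds."""
    def __init__(self, symbols: List[str], bonds: List[Tuple[int, int]]):
        self.symbols = symbols
        self.nbrs: Dict[int, List[int]] = {i: [] for i in range(len(symbols))}
        for i, j in bonds:
            self.nbrs[i].append(j)
            self.nbrs[j].append(i)

def atom_ok(symbol: str, q: str) -> bool:
    """'*' = any atom, 'a' = any aromatic atom, otherwise an exact symbol like 'C'."""
    if q == "*":
        return True
    if q == "a":
        return symbol.islower()
    return symbol == q

def chain_from(mol: Mol, start: int, rest: str, used: Set[int]) -> bool:
    """True if a path of distinct, unused atoms leading away from start matches rest."""
    if not rest:
        return True
    for n in mol.nbrs[start]:
        if n not in used and atom_ok(mol.symbols[n], rest[0]):
            if chain_from(mol, n, rest[1:], used | {n}):
                return True
    return False

def recursive(mol: Mol, idx: int, pattern: str) -> bool:
    """$(pattern) on atom idx: the first pattern atom is the atom being tested."""
    return atom_ok(mol.symbols[idx], pattern[0]) and chain_from(mol, idx, pattern[1:], {idx})

def example_hits(mol: Mol) -> List[int]:
    """Atoms matching [$(*O);$(*CC)]; ';' is AND, so both environments must hold."""
    return [i for i in range(len(mol.symbols))
            if recursive(mol, i, "*O") and recursive(mol, i, "*CC")]

propanol = Mol(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])                  # CCCO
propan_2_ol = Mol(["C", "C", "O", "C"], [(0, 1), (1, 2), (1, 3)])              # CC(O)C
butan_2_ol = Mol(["C", "C", "O", "C", "C"], [(0, 1), (1, 2), (1, 3), (3, 4)])  # CC(O)CC
print(example_hits(propanol))     # [2]
print(example_hits(propan_2_ol))  # []
print(example_hits(butan_2_ol))   # [1]
```

In propan-2-ol the carbinol carbon has an oxygen neighbor, but neither of its carbon neighbors leads on to a further carbon, so *CC fails.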
6. Component-Level Grouping
SMARTS
Hits SMILES:
Note
[#8].[#8]
Molecules that contain at least two oxygens (e.g. O=O, OCCO, and O.CCO)
"." (dot) in SMARTS means "not necessarily connected".
([#8].[#8])
Molecules that contain two oxygens that are within the same
component (e.g. O=O and OCCO but NOT O.CCO)
A single set of parentheses may surround any legal SMARTS
expression. Here the parentheses indicate that the contents are within
the same component of the target SMILES.
([#8]).([#8])
Molecules or mixtures that contain two oxygens that are within
different components (e.g. O.CCO but NOT O=O or OCCO)
Separate component-level groupings may be specified. Here the
parentheses indicate that their respective contents are within different
components of the target SMILES.
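Component-level grouping depends on knowing which atoms are connected. This Python sketch assumes that a molecule is given as a list of atom symbols plus a bond list (a hypothetical representation). It labels components with union-find and then answers the three oxygen queries from the table by counting oxygens per component.

```python
from collections import Counter
from typing import Dict, List, Tuple

def components(n_atoms: int, bonds: List[Tuple[int, int]]) -> List[int]:
    """Return a component label for every atom (union-find with path halving)."""
    parent = list(range(n_atoms))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n_atoms)]

def oxygen_queries(symbols: List[str], bonds: List[Tuple[int, int]]) -> Dict[str, bool]:
    """Evaluate the three component-level oxygen queries on one target."""
    label = components(len(symbols), bonds)
    # Number of oxygens (#8: aliphatic "O" or aromatic "o") in each component.
    per_component = Counter(label[i] for i, s in enumerate(symbols) if s in ("O", "o"))
    return {
        "[#8].[#8]": sum(per_component.values()) >= 2,                 # anywhere
        "([#8].[#8])": any(n >= 2 for n in per_component.values()),   # same component
        "([#8]).([#8])": len(per_component) >= 2,                     # different components
    }

print(oxygen_queries(["O", "O"], [(0, 1)]))                            # O=O
print(oxygen_queries(["O", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]))  # OCCO
print(oxygen_queries(["O", "C", "C", "O"], [(1, 2), (2, 3)]))          # O.CCO
```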
7. Reaction SMARTS
SMARTS
Hits SMILES:
Note
[#6]=,:[#6]
Carbons connected by a (double or aromatic) bond.
Molecule SMARTS (SMARTS without ">" characters) can match anywhere in a Reaction SMILES target (reactant, agent, or product).
>>[#6]=,:[#6]
Product Carbons connected by a (double or aromatic) bond.
Reaction SMARTS (SMARTS with ">" characters) never match molecule targets.
[C:1]>>[C:1]
Mapped reacting carbons.
Mapped SMARTS atomic queries never match unmapped
target atoms. Mapped SMARTS reaction queries never
hit unmapped reaction targets.
[C:1]>>C
Reacting carbons.
Unpaired maps in the query are ignored.
[C:1][C:1]>>[C:1]
Multiple mapped reacting carbons.
SMARTS map classes relate reactant atoms to product atoms, but they don't relate atoms within the reactants
(or within the products) to each other. (Although both query reactant atoms have the same class, they
can match target reactant atoms of different classes.)
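Two of the rules above, role selection and map pairing, can be sketched with plain string handling. This Python sketch assumes the reactants>agents>products layout and bracketed map classes such as [C:1]. The function names are hypothetical, and the sketch does no substructure matching.

```python
import re
from typing import Optional, Set, Tuple

MAP_CLASS = re.compile(r":(\d+)\]")  # e.g. the ":1]" in "[C:1]"

def split_roles(smarts: str) -> Optional[Tuple[str, str, str]]:
    """Split 'reactants>agents>products'; return None for a molecule SMARTS (no '>')."""
    if ">" not in smarts:
        return None
    reactants, agents, products = smarts.split(">")  # assumes exactly two '>'
    return reactants, agents, products

def may_match(query: str, target: str) -> bool:
    """Coarse pre-filter: a reaction SMARTS never matches a molecule (non-reaction) target."""
    return not (">" in query and ">" not in target)

def paired_maps(query: str) -> Set[int]:
    """Map classes present on both sides; classes on one side only are ignored."""
    roles = split_roles(query)
    if roles is None:
        return set()
    reactants, _, products = roles
    left = {int(m) for m in MAP_CLASS.findall(reactants)}
    right = {int(m) for m in MAP_CLASS.findall(products)}
    return left & right

print(split_roles(">>[#6]=,:[#6]"))      # ('', '', '[#6]=,:[#6]')
print(paired_maps("[C:1][C:1]>>[C:1]"))  # {1}
print(paired_maps("[C:1]>>C"))           # set()
print(may_match(">>[#6]=,:[#6]", "C=C")) # False
```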