Thursday, September 16, 2010

The SMILES stereochemistry enigma

Almost every cheminformatics system that implements SMILES limits its stereochemistry support to the tetrahedral and double-bond cases. Some (e.g. OpenBabel, Jmol, ...) also implement the square-planar SP tags, which have only 3 possible values. The Daylight SMILES page, however, has very limited documentation on trigonal-bipyramidal and octahedral stereochemistry. All the strictly needed information is there, though, and this blog post gives an overview of how to work out the meaning of the TB3-TB20 and OH3-OH30 tags.

Since the trigonal-bipyramidal case is slightly easier, it is discussed first. As can be seen in the image below, not all ligands are equivalent: there are two axial ligands (S and F) and three equatorial ligands (Cl, Br, I).



The TB1 tag is documented on the Daylight website: viewing from the first ligand atom to the last, the three remaining ligands are ordered counterclockwise (i.e. @). A valid SMILES for the depicted molecule would be F[X@TB1](I)(Br)(Cl)S. TB2 means that, viewing along the same axis, the three remaining ligands are ordered clockwise (i.e. @@). With this information it is possible to write any molecule by hand, since the atoms can be reordered as needed. When writing a canonical SMILES, however, the order of the atoms is determined by the canonicalization algorithm, and there are 5! = 120 possible ways to order 5 ligands. Since the ligands are not equivalent, there must be a way to encode the positions of the axis ligands in the tags. This hypothesis is supported by the numbers: there are 10 ways to position the axis atoms (a-e, a-d, a-c, a-b, b-e, b-d, b-c, c-e, c-d, d-e), 10 * 2 (@|@@) = 20 tags, and each tag has 3! = 6 permutations associated with it, giving 120 permutations in total.
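
The counting argument is easy to check mechanically. The snippet below is only a sanity check of the arithmetic; the variable names are made up for illustration, and the ligand positions a-e are assumed to be numbered 0-4.

```python
from itertools import combinations
from math import factorial

positions = range(5)                      # ligand positions a-e, numbered 0-4
axes = list(combinations(positions, 2))   # unordered pairs of axis positions
windings = ("@", "@@")

n_tags = len(axes) * len(windings)        # 10 axes * 2 windings
perms_per_tag = factorial(3)              # 3! = 6 permutations per tag, as counted in the text

print(len(axes), n_tags, n_tags * perms_per_tag)   # prints: 10 20 120
```

The product equals 5!, so the 20 tags are exactly enough to cover every ordering of the five ligands.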

To find the order of these axes, the more subtle information in the Daylight SMILES specification can be used. The first clue is that TH1 and TH2 are two cases of a generic chiral specification method. The other clues are that the tags are actually "chiral permutation designators" and that "it's table driven". All that is left to do is enumerate the permutations lexicographically and assign a designator whenever a new combination of axis and winding is found. In the table below, the first column is the permutation number, the second is the permutation itself, the third gives the axis positions, the fourth is the winding and the last is the designator. Each digit of a permutation is the position that ligand has in the TB1 ordering, so the axis is given by the positions of the digits 0 and 4. (Note: TB1 and TB2 are designated first, so permutation 2 is skipped at first and receives TB3. In permutations 34-36 the axis ends appear in reverse order, 4 before 0, which inverts the winding.)


1: 01234 0-4 @ TB1
2: 01243 0-3 @ TB3
3: 01324 0-4 @@ TB2
4: 01342 0-3 @@ TB4
5: 01423 0-2 @ TB5
6: 01432 0-2 @@ TB6
7: 02134 0-4 @@ TB2
8: 02143 0-3 @@ TB4
9: 02314 0-4 @ TB1
10: 02341 0-3 @ TB3
11: 02413 0-2 @@ TB6
12: 02431 0-2 @ TB5
13: 03124 0-4 @ TB1
14: 03142 0-3 @ TB3
15: 03214 0-4 @@ TB2
16: 03241 0-3 @@ TB4
17: 03412 0-2 @ TB5
18: 03421 0-2 @@ TB6
19: 04123 0-1 @ TB7
20: 04132 0-1 @@ TB8
21: 04213 0-1 @@ TB8
22: 04231 0-1 @ TB7
23: 04312 0-1 @ TB7
24: 04321 0-1 @@ TB8
25: 10234 1-4 @ TB9
26: 10243 1-3 @ TB10
27: 10324 1-4 @@ TB11
28: 10342 1-3 @@ TB12
29: 10423 1-2 @ TB13
30: 10432 1-2 @@ TB14
31: 12034 2-4 @ TB15
32: 12043 2-3 @ TB16
33: 12304 3-4 @ TB17
34: 12340 3-4 @@ TB18 (40 @ -> @@)
35: 12403 2-3 @@ TB19 (40 @ -> @@)
36: 12430 2-4 @@ TB20 (40 @ -> @@)
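
The assignment rule can be sketched in a few lines of Python. This is a minimal sketch, not the Daylight implementation: the function names `classify` and `tb_table` are invented for illustration, and it assumes that each digit of a permutation is the TB1 slot held by the ligand listed at that position, with slots 0 and 4 forming the axis.

```python
from itertools import permutations

def classify(perm):
    """Return ((low, high), winding) for one permutation of the TB1 slots 0-4.

    perm[i] is the TB1 slot held by the i-th listed ligand (an assumption
    of this sketch); slots 0 and 4 are the two ends of the axis.
    """
    start, end = perm.index(0), perm.index(4)
    equatorial = [v for v in perm if v not in (0, 4)]
    # Cyclic rotations of 1,2,3 keep the counterclockwise (@) sense.
    counterclockwise = equatorial in ([1, 2, 3], [2, 3, 1], [3, 1, 2])
    # If slot 4 is listed before slot 0, the viewing direction is reversed.
    if start > end:
        counterclockwise = not counterclockwise
    return (min(start, end), max(start, end)), "@" if counterclockwise else "@@"

def tb_table():
    """Number each (axis, winding) on first appearance, with TB1/TB2 fixed up front."""
    table = {((0, 4), "@"): 1, ((0, 4), "@@"): 2}
    for perm in permutations(range(5)):   # lexicographic order for sorted input
        key = classify(perm)
        if key not in table:
            table[key] = len(table) + 1
    return table

for (axis, winding), number in sorted(tb_table().items(), key=lambda kv: kv[1]):
    print(f"TB{number}: axis {axis[0]}-{axis[1]}, {winding}")
```

Under these assumptions the script assigns TB3 to permutation 2 and reaches TB20 at permutation 36, as in the table above.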


This enumeration can be done by hand or using a simple program. However, the information is easier to read when rearranged:

TB1/TB2: axis 0-4, @/@@
TB3/TB4: axis 0-3, @/@@
TB5/TB6: axis 0-2, @/@@
TB7/TB8: axis 0-1, @/@@
TB9/TB11: axis 1-4, @/@@
TB10/TB12: axis 1-3, @/@@
TB13/TB14: axis 1-2, @/@@
TB15/TB20: axis 2-4, @/@@
TB16/TB19: axis 2-3, @/@@
TB17/TB18: axis 3-4, @/@@
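
One way to use this list is to rewrite any TB designator as TB1 by reordering the ligands. The sketch below is illustrative only: the `TB` dictionary is copied from the list above, the helper name `to_tb1` is hypothetical, and the ligands are assumed to be given in the order they appear in the SMILES string.

```python
# (axis start, axis end, winding) per designator, copied from the list above.
TB = {
    1: (0, 4, "@"),  2: (0, 4, "@@"),
    3: (0, 3, "@"),  4: (0, 3, "@@"),
    5: (0, 2, "@"),  6: (0, 2, "@@"),
    7: (0, 1, "@"),  8: (0, 1, "@@"),
    9: (1, 4, "@"),  11: (1, 4, "@@"),
    10: (1, 3, "@"), 12: (1, 3, "@@"),
    13: (1, 2, "@"), 14: (1, 2, "@@"),
    15: (2, 4, "@"), 20: (2, 4, "@@"),
    16: (2, 3, "@"), 19: (2, 3, "@@"),
    17: (3, 4, "@"), 18: (3, 4, "@@"),
}

def to_tb1(designator, ligands):
    """Reorder five ligands so that the same arrangement reads as TB1."""
    start, end, winding = TB[designator]
    equatorial = [x for i, x in enumerate(ligands) if i not in (start, end)]
    if winding == "@@":
        # Swapping two equatorial ligands turns clockwise into counterclockwise.
        equatorial[1], equatorial[2] = equatorial[2], equatorial[1]
    return [ligands[start], *equatorial, ligands[end]]

# Permutation 2 of the table: the depicted molecule listed as F, I, Br, S, Cl.
print(to_tb1(3, ["F", "I", "Br", "S", "Cl"]))   # ['F', 'I', 'Br', 'Cl', 'S']
```

The result matches the ligand order of F[X@TB1](I)(Br)(Cl)S. Two TB1 orderings describe the same arrangement if they differ only by a cyclic rotation of the three middle ligands, which this sketch does not normalize.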

The octahedral designators can be obtained in the same way. Although there seem to be three axes to choose from, the requirement to list the remaining 4 ligands clockwise or counterclockwise restricts the number of axes to 1. The additional ligand results in 15 possible axis combinations, giving rise to 30 designators. A C++ example program generating these designators can be found here. For OH1-OH30 the following designators are obtained:

OH1/OH2: axis 0-5, @/@@
OH3/OH4: axis 0-4, @/@@
OH5/OH6: axis 0-3, @/@@
OH7/OH8: axis 0-2, @/@@
OH9/OH10: axis 0-1, @/@@
OH11/OH13: axis 1-5, @/@@
OH12/OH14: axis 1-4, @/@@
OH15/OH16: axis 1-3, @/@@
OH17/OH18: axis 1-2, @/@@
OH19/OH21: axis 2-5, @/@@
OH20/OH22: axis 2-4, @/@@
OH23/OH24: axis 2-3, @/@@
OH25/OH30: axis 3-5, @/@@
OH26/OH29: axis 3-4, @/@@
OH27/OH28: axis 4-5, @/@@

2 comments:

Egon Willighagen said...

Nice post!

Tony Cook said...

Nice detective work!

I was lazy in this respect and asked Daylight for the table source code, which they sent me when we did the Accord Smiles reader/writer (many years ago).