Since the trigonal-bipyramidal case is slightly easier, it will be discussed first. As can be seen in the image below, not all ligands are equivalent: there are two axial ligands (S and F) and three equatorial ligands (Cl, Br, I).

The Daylight website documents the TB1 tag as follows: viewing from the first ligand atom toward the last, the three remaining ligands are ordered counterclockwise (i.e. @). A valid SMILES for the depicted molecule would be F[X@TB1](I)(Br)(Cl)S. TB2 means that, viewing along the same axis, the three remaining ligands are ordered clockwise (i.e. @@). With this information it is possible to write any molecule by hand, since the atoms can be reordered as needed. However, when writing a canonical SMILES, the order of the atoms is determined by the canonicalization algorithm, and there are 5! = 120 possible ways to order 5 ligands. Since the ligands are not equivalent, there must be a way to encode the positions of the axial ligands using the tags. This hypothesis is supported by counting: there are 10 ways to choose the two axial positions (a-e, a-d, a-c, a-b, b-e, b-d, b-c, c-e, c-d, d-e), giving 10 * 2 (@ or @@) = 20 tags, and each tag covers 6 orderings of a given molecule (2 ways to order the axial pair times 3 cyclic rotations of the equatorial ligands), for 120 permutations in total.
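The counting argument can be checked by brute force. The sketch below is a minimal illustration, assuming one fixed trigonal-bipyramidal molecule with hypothetical ligand labels 0 to 4 (0 and 4 axial; 1, 2, 3 counterclockwise when viewed from 0 toward 4). It classifies all 120 neighbour orderings by axial positions and winding. The names `tag_of` and `is_rotation` are illustrative only.

```python
from collections import Counter
from itertools import permutations

# Hypothetical labelling of ONE fixed trigonal-bipyramidal geometry:
# ligands 0 and 4 are axial; 1, 2, 3 are equatorial and run
# counterclockwise (1 -> 2 -> 3) when viewed from 0 toward 4.
AXIAL = {0, 4}
CCW_FROM_0 = (1, 2, 3)


def is_rotation(seq, cycle):
    """True if seq is a cyclic rotation of cycle (i.e. same winding)."""
    return any(tuple(seq) == cycle[i:] + cycle[:i] for i in range(len(cycle)))


def tag_of(order):
    """Return (axial positions, winding) for one neighbour ordering."""
    axis = tuple(pos for pos, lig in enumerate(order) if lig in AXIAL)
    equatorial = [lig for lig in order if lig not in AXIAL]
    # Look from the axial ligand that is listed first; viewed from the
    # other end, the counterclockwise cycle is reversed.
    ccw = CCW_FROM_0 if order[axis[0]] == 0 else CCW_FROM_0[::-1]
    return axis, "@" if is_rotation(equatorial, ccw) else "@@"


classes = Counter(tag_of(order) for order in permutations(range(5)))
print(len(classes), sorted(set(classes.values())))  # 20 [6]
```

Every one of the 20 (axis, winding) classes contains exactly 6 orderings, which matches 10 * 2 * 6 = 120.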
To find the order of these axes, the more subtle information in the Daylight SMILES specification can be used. The first clue is that TH1 and TH2 are two cases of a generic chiral specification method. The other clues are that the tags are actually "chiral permutation designators" and that "it's table driven". All that is left to do is enumerate all permutations lexicographically and assign the next designator whenever a new combination of axis and winding is found. In the table below, the first column is the permutation number, the second is the permutation itself, the third gives the axis positions, the fourth the winding, and the last the designator. When ligand 4 comes before ligand 0, the viewing direction is reversed and @ becomes @@; this is marked with (40 @ -> @@). Only the first 36 of the 120 permutations are listed, since all 20 designators have appeared by then. (Note: TB1/TB2 are assigned first, so permutation 3 receives TB2 and permutation 2 receives TB3.)
1: 01234 0-4 @ TB1
2: 01243 0-3 @ TB3
3: 01324 0-4 @@ TB2
4: 01342 0-3 @@ TB4
5: 01423 0-2 @ TB5
6: 01432 0-2 @@ TB6
7: 02134 0-4 @@ TB2
8: 02143 0-3 @@ TB4
9: 02314 0-4 @ TB1
10: 02341 0-3 @ TB3
11: 02413 0-2 @@ TB6
12: 02431 0-2 @ TB5
13: 03124 0-4 @ TB1
14: 03142 0-3 @ TB3
15: 03214 0-4 @@ TB2
16: 03241 0-3 @@ TB4
17: 03412 0-2 @ TB5
18: 03421 0-2 @@ TB6
19: 04123 0-1 @ TB7
20: 04132 0-1 @@ TB8
21: 04213 0-1 @@ TB8
22: 04231 0-1 @ TB7
23: 04312 0-1 @ TB7
24: 04321 0-1 @@ TB8
25: 10234 1-4 @ TB9
26: 10243 1-3 @ TB10
27: 10324 1-4 @@ TB11
28: 10342 1-3 @@ TB12
29: 10423 1-2 @ TB13
30: 10432 1-2 @@ TB14
31: 12034 2-4 @ TB15
32: 12043 2-3 @ TB16
33: 12304 3-4 @ TB17
34: 12340 3-4 @@ TB18 (40 @ -> @@)
35: 12403 2-3 @@ TB19 (40 @ -> @@)
36: 12430 2-4 @@ TB20 (40 @ -> @@)
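The enumeration above can be reproduced with a short program. The following is a sketch in Python rather than the author's C++ program, and it rests on my reading of the table: the axis is the pair of positions holding ligands 0 and 4, the winding is the parity of the remaining three ligands (flipped when 4 precedes 0), and designators are numbered in order of first appearance after TB1/TB2 are pre-assigned. The function names are hypothetical.

```python
from itertools import permutations


def axis_and_winding(perm):
    """Axis positions and winding of one ordering, as in the table above."""
    i, j = perm.index(0), perm.index(4)
    rest = [lig for lig in perm if lig not in (0, 4)]
    # An even permutation of 1, 2, 3 keeps the TB1 winding '@',
    # an odd one flips it to '@@'.
    inversions = sum(a > b for k, a in enumerate(rest) for b in rest[k + 1:])
    winding = inversions % 2
    # If ligand 4 comes before ligand 0 the viewing direction is reversed.
    if j < i:
        i, j = j, i
        winding ^= 1
    return (i, j), "@@" if winding else "@"


def assign_designators():
    """Number each new (axis, winding) pair in lexicographic order,
    with TB1/TB2 (axis 0-4) assigned first."""
    designators = {((0, 4), "@"): 1, ((0, 4), "@@"): 2}
    rows = []
    # permutations() of a sorted input yields lexicographic order.
    for n, perm in enumerate(permutations(range(5)), start=1):
        key = axis_and_winding(perm)
        if key not in designators:
            designators[key] = len(designators) + 1
        rows.append((n, "".join(map(str, perm)), key, designators[key]))
    return designators, rows


designators, rows = assign_designators()
for (axis, winding), tb in sorted(designators.items(), key=lambda kv: kv[1]):
    print(f"TB{tb}: axis {axis[0]}-{axis[1]} {winding}")
```

The printed list agrees with the rearranged TB1-TB20 list; `rows` holds all 120 table lines in the same format as the table above.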
This enumeration can be done by hand or using a simple program. However, the information is easier to read when rearranged:
TB1/TB2: axis 0-4, @/@@
TB3/TB4: axis 0-3, @/@@
TB5/TB6: axis 0-2, @/@@
TB7/TB8: axis 0-1, @/@@
TB9/TB11: axis 1-4, @/@@
TB10/TB12: axis 1-3, @/@@
TB13/TB14: axis 1-2, @/@@
TB15/TB20: axis 2-4, @/@@
TB16/TB19: axis 2-3, @/@@
TB17/TB18: axis 3-4, @/@@
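Once the table is known, reading a TB designator amounts to a lookup. The sketch below transcribes the list above into a hypothetical `TB_TABLE` and splits a SMILES neighbour list into axial and equatorial ligands. It assumes 0-based positions as in the table and ignores complications such as ring-closure digits and implicit hydrogens.

```python
# Axis positions and winding per designator, transcribed from the list above.
TB_TABLE = {
    1: ((0, 4), "@"), 2: ((0, 4), "@@"),
    3: ((0, 3), "@"), 4: ((0, 3), "@@"),
    5: ((0, 2), "@"), 6: ((0, 2), "@@"),
    7: ((0, 1), "@"), 8: ((0, 1), "@@"),
    9: ((1, 4), "@"), 11: ((1, 4), "@@"),
    10: ((1, 3), "@"), 12: ((1, 3), "@@"),
    13: ((1, 2), "@"), 14: ((1, 2), "@@"),
    15: ((2, 4), "@"), 20: ((2, 4), "@@"),
    16: ((2, 3), "@"), 19: ((2, 3), "@@"),
    17: ((3, 4), "@"), 18: ((3, 4), "@@"),
}


def split_ligands(neighbours, tb):
    """Return (axial pair, equatorial ligands, winding) for @TB<tb>.

    neighbours: the five ligands in SMILES order. The winding refers to the
    equatorial ligands viewed from the first axial ligand toward the second.
    """
    (i, j), winding = TB_TABLE[tb]
    axial = (neighbours[i], neighbours[j])
    equatorial = [n for k, n in enumerate(neighbours) if k not in (i, j)]
    return axial, equatorial, winding


# F[X@TB1](I)(Br)(Cl)S from the example at the top
print(split_ligands(["F", "I", "Br", "Cl", "S"], 1))
# The same ligands in another order, read with TB3 (axis 0-3)
print(split_ligands(["F", "I", "Br", "S", "Cl"], 3))
```

Both calls print `(('F', 'S'), ['I', 'Br', 'Cl'], '@')`: in each case F and S are axial and I, Br, Cl are counterclockwise.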
The octahedral designators can be obtained in the same way. Although there seem to be three axes to choose from, the requirement to list the remaining four ligands clockwise or counterclockwise restricts the number of axes to one. The additional ligand results in 15 possible axis combinations, giving rise to 30 designators. A C++ example program generating these designators can be found here. For OH1-OH30 the following designators are obtained:
OH1/OH2: axis 0-5, @/@@
OH3/OH4: axis 0-4, @/@@
OH5/OH6: axis 0-3, @/@@
OH7/OH8: axis 0-2, @/@@
OH9/OH10: axis 0-1, @/@@
OH11/OH13: axis 1-5, @/@@
OH12/OH14: axis 1-4, @/@@
OH15/OH16: axis 1-3, @/@@
OH17/OH18: axis 1-2, @/@@
OH19/OH21: axis 2-5, @/@@
OH20/OH22: axis 2-4, @/@@
OH23/OH24: axis 2-3, @/@@
OH25/OH30: axis 3-5, @/@@
OH26/OH29: axis 3-4, @/@@
OH27/OH28: axis 4-5, @/@@
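The number of axes follows from choosing 2 of the n ligand positions: C(5, 2) = 10 for TB and C(6, 2) = 15 for OH. The sketch below lists the position pairs in the same order as the TB and OH lists (first position ascending, second descending). It only reproduces the axes and their count, not the numbering of the OH designators; `axis_pairs` is a hypothetical helper.

```python
from math import comb


def axis_pairs(n):
    """Position pairs (i, j) with i < j, ordered as in the TB/OH lists:
    first position ascending, second position descending."""
    return [(i, j) for i in range(n) for j in range(n - 1, i, -1)]


for n, name in ((5, "TB"), (6, "OH")):
    pairs = axis_pairs(n)
    assert len(pairs) == comb(n, 2)
    print(f"{name}: {len(pairs)} axes x 2 windings = {2 * len(pairs)} designators")
```

This prints 10 axes and 20 designators for TB, and 15 axes and 30 designators for OH.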