Tuesday, March 9, 2010

It has been 6 months since my last post on stereochemistry. At the end of that post, the automorphisms for inositol were listed. These will now be used to enumerate all possible stereoisomers of a structure. Most of this post is only a summary; for details, the Razinger paper can be consulted.

The first use of the automorphisms is to construct stereoindex vectors. These vectors contain 1 for all true stereocenters. Para-stereocenters are assigned -1 if the permutation interchanges equivalent atoms and 1 otherwise (see this post for examples of para-stereocenters). The automorphisms, or permutations in general, can be represented as matrices, which can be multiplied by the stereoindex vectors. The resulting matrices are called signed permutation matrices, and they are the key to revealing identical stereoisomers. For performance reasons, only the stereogenic atoms are considered when constructing the signed permutation matrices. For example, inositol, which contains 6 tetrahedral stereocenters, will have 6x6 signed permutation matrices.
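To make the construction concrete, here is a minimal Python sketch of a signed permutation matrix and how a row of parities is multiplied by it. The function names, the list-based representation, and the convention that `perm[i]` is the position stereocenter `i` is mapped to are my own assumptions for illustration; the Razinger paper and the actual code may use a different convention.

```python
def signed_permutation_matrix(perm, stereoindex):
    """Build an n x n matrix with stereoindex[i] at (i, perm[i]), 0 elsewhere.

    perm        -- assumed convention: stereocenter i is mapped to perm[i]
    stereoindex -- 1 or -1 per stereocenter for this permutation
    """
    n = len(perm)
    matrix = [[0] * n for _ in range(n)]
    for i, j in enumerate(perm):
        matrix[i][j] = stereoindex[i]
    return matrix


def apply_to_row(row, matrix):
    """Multiply a row vector of parities by a signed permutation matrix."""
    n = len(row)
    return tuple(sum(row[i] * matrix[i][j] for i in range(n)) for j in range(n))


# Hypothetical 3-center example: a cyclic permutation with one sign flip.
m = signed_permutation_matrix([1, 2, 0], [1, -1, 1])
print(m)                            # [[0, 1, 0], [0, 0, -1], [1, 0, 0]]
print(apply_to_row((1, 1, -1), m))  # (-1, 1, -1)
```

Each row and column of such a matrix holds exactly one non-zero entry, so multiplying by it only reorders the parities and flips some signs.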

Before the enumeration can begin, another matrix called the stereoparity matrix is needed. This matrix has n columns where n is equal to the number of stereocenters. There are 2^n rows in the matrix, one for each possible combination of the values 1 and -1. This is best illustrated with this example for 3 stereocenters.
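A stereoparity matrix is easy to generate. The sketch below uses `itertools.product`; the function name and the row order (all 1s first) are assumptions of mine, any order that lists every combination once would do.

```python
from itertools import product


def stereoparity_matrix(n):
    """Return all 2^n rows of length n over the values 1 and -1."""
    return list(product([1, -1], repeat=n))


for row in stereoparity_matrix(3):
    print(row)
# (1, 1, 1)
# (1, 1, -1)
# (1, -1, 1)
# (1, -1, -1)
# (-1, 1, 1)
# (-1, 1, -1)
# (-1, -1, 1)
# (-1, -1, -1)
```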



The enumeration now begins by selecting row 1 from the stereoparity matrix. This row is multiplied by each of the signed permutation matrices. The resulting row vectors are compared with the rows of the stereoparity matrix. Matching rows are redundant representations of the same stereoisomer and are not considered in further steps. The identified stereoisomer can still turn out to be one half of an enantiomer pair or a diastereomer. This is determined by inverting row 1 and multiplying it by the signed permutation matrices. If any of the resulting vectors matches a row not yet marked as redundant, an enantiomer pair is found. If all resulting vectors match rows already marked as redundant, a diastereomer is found. The same steps are repeated, starting from the next unassigned row, until all rows are assigned to a stereoisomer.
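The steps above can be sketched in Python roughly as follows. This is not the actual implementation: the names are hypothetical, each signed permutation is stored as a `(perm, signs)` pair instead of a full matrix, and the list of permutations is assumed to be a complete automorphism group (including the identity). The label "diastereomer" follows the wording used in this post for a stereoisomer whose inverted row falls in its own redundant set.

```python
from itertools import product


def apply_signed_permutation(row, perm, signs):
    """Entry i of the row moves to position perm[i], scaled by signs[i]."""
    out = [0] * len(row)
    for i, j in enumerate(perm):
        out[j] = row[i] * signs[i]
    return tuple(out)


def images(row, signed_perms):
    """All rows produced from `row` by the signed permutations, plus `row`."""
    result = {apply_signed_permutation(row, p, s) for p, s in signed_perms}
    result.add(row)
    return result


def enumerate_stereoisomers(n, signed_perms):
    """Return a list of (kind, rows, mirror_rows) tuples."""
    assigned = set()
    found = []
    for row in product([1, -1], repeat=n):
        if row in assigned:
            continue  # redundant row of a stereoisomer found earlier
        same = images(row, signed_perms)
        assigned |= same
        inverted = tuple(-x for x in row)
        mirror = images(inverted, signed_perms)
        if mirror & same:
            # The inverted row is one of the redundant rows.
            found.append(("diastereomer", sorted(same), []))
        else:
            # The inverted row belongs to a distinct stereoisomer.
            assigned |= mirror
            found.append(("enantiomer pair", sorted(same), sorted(mirror)))
    return found


# Toy input: two equivalent true stereocenters, identity plus a swap.
toy_perms = [([0, 1], [1, 1]), ([1, 0], [1, 1])]
for kind, rows, mirror_rows in enumerate_stereoisomers(2, toy_perms):
    print(kind, rows, mirror_rows)
```

For this toy input the sketch reports one enantiomer pair and one diastereomer, three stereoisomers in total.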

Determining the number and kind of stereoisomers alone isn't of much use, though. A much more useful feature is generating each stereoisomer. For each stereoisomer, any of the rows assigned to it can be used. The values in such a row can be treated as parities for the stereogenic units in the molecule, and the input molecule's data structures can be modified to match them. Using inositol as an example once more, the 9 possible stereoisomer SMILES are:



The SMILES are in fact canonical SMILES. The information obtained by enumerating stereoisomers can also be used for this purpose. However, this time we are only interested in the stereoisomer specified by the input molecule. The redundant set of parities associated with it can be considered to be canonical candidates. Since the stereoparity matrix contains 2^n rows, every possible candidate for a given stereoisomer is guaranteed to be found. Canonical SMILES for molecules like the ones below now become possible.
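As a rough illustration of the candidate idea, the sketch below collects the redundant rows for the input parities and picks one of them. The function name is hypothetical, the `(perm, signs)` representation is an assumption, and choosing the largest tuple is an arbitrary rule of mine; how the real code selects among the candidates is not described here.

```python
def canonical_parities(input_row, signed_perms):
    """Pick one representative from the redundant rows of the input parities.

    signed_perms -- list of (perm, signs) pairs; entry i of the row moves to
                    position perm[i] and is scaled by signs[i] (assumed form)
    """
    candidates = {tuple(input_row)}
    for perm, signs in signed_perms:
        image = [0] * len(input_row)
        for i, j in enumerate(perm):
            image[j] = input_row[i] * signs[i]
        candidates.add(tuple(image))
    # Arbitrary but deterministic choice among the candidates.
    return max(candidates)


# Two equivalent stereocenters: both inputs describe the same stereoisomer.
swap_perms = [([0, 1], [1, 1]), ([1, 0], [1, 1])]
print(canonical_parities((1, -1), swap_perms))
print(canonical_parities((-1, 1), swap_perms))
```

Both calls return the same row, which is what makes the result usable as a canonical form for the stereo part.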



The current code can be found here.

PS: Sorry for the delay...
