Thursday, 19 February 2015

There's more to molecular design than making predictions

So it arrived as a call to beware of von Neumann’s elephants and the session has already been most ably covered by The Curious Wavefunction (although I can’t resist suggesting that Ash's elephant may look more like a cyclic voltammogram than the pachyderm of the title).  The session to which I refer was Derek Lowe’s much anticipated presentation to a ravening horde of computational chemists that was organized by the Boston Area Group for Informatics and Modeling (BAGIM).  Derek Pipelined beforehand and I provided some helpful tips (comment 1) on how to keep the pack at bay.  I have to admit that I meant to say that many continuum solvent models are charge-type symmetric (as opposed to asymmetric) and should point out that it is actually reality (e.g. DMSO) that is asymmetric with respect to charge type.  At least no damage appears to have been done.  As an aside, I was gratified by how enthusiastically the bait was taken (see comments 8 and 9) but that’s getting away from the aim of this post, which is to explore the relationship between medicinal chemist and computational chemist.

Many in the drug discovery business seem to be of the opinion that the primary function of the pharmaceutical computational chemist is to predict activity (e.g. enzyme inhibition; hERG blockade) and properties (e.g. solubility; plasma protein binding) of compounds before they are synthesized.  I agree that the goal of both computational chemists and medicinal chemists is identification of efficacious and safe compounds and that accurate prediction would be of great value in enabling that objective to be achieved efficiently.  However, useful prediction of the (in vivo) effects of an arbitrary compound directly from molecular structure is not currently feasible, nor does it look like it will become feasible for a long time.  Tension is likely to develop between the computational chemist and the medicinal chemist when either or both believe that the primary function of the former is simply to build or use predictive (e.g. QSAR) models for activity and ADME behavior.

One way of looking at drug discovery is as a process of accumulating knowledge and perhaps the success of the drug discovery project team should be measured by how efficiently they acquire that knowledge.  A project team that quickly and unequivocally invalidates a target has made a valuable contribution because the project can be put out of its misery and the team members can move on to projects with greater potential.  Getting smarter (and more honest) about shutting down projects is one area in which the pharma/biotech industry can improve.  Although there are a number of ways (e.g. virtual/directed screening; analysis of screening output) that the computational chemist can contribute during hit identification, I’d like to focus on how computational and medicinal chemists can work together once starting points for synthesis have been identified (i.e. during the hit-to-lead and lead optimization phases of a project).

In drug discovery projects we typically accumulate knowledge by posing hypotheses (3-chloro will increase potency or a 2-methoxy will lock the molecule into its bound conformation) and that’s why I use the term hypothesis-driven molecular design (HDMD) that I wrote about in an earlier post.  When protein structural information is available, the hypothesis often takes the form that making a new interaction will lead to an increase in affinity and the key is finding ways of forming interactions with optimal binding geometry without incurring conformational/steric energy penalties.  The computational chemist is better placed than the medicinal chemist to assess interactions by these criteria while the medicinal chemist is better placed to assess the synthetic implications of forming the new interactions.  However, either or both may have generated the ideas that the computational chemist assessed, and many medicinal chemists have a strong grasp of molecular interactions and conformational preferences even if they are unfamiliar with the molecular modelling software used to assess these.  I always encourage medicinal chemists to learn to use the Cambridge Structural Database (CSD) because it provides an easy way to become familiar with 3D structures and conformations of molecules as well as providing insight into the interactions that molecules make with one another.  It also uses experimental data so you don’t need to worry about force fields and electrostatic models.  Here’s a post in which I used the CSD to gain some chemical insight that will give you a better idea of what I’m getting at.

One question posed in the Curious Wavefunction post was whether medicinal chemists should make dead compounds to test a model.  My answer to the question is that compounds should be prioritized for synthesis on the basis of how informative they are likely to be or how active they are likely to be and, as pointed out by Curious Wavefunction, synthetic accessibility always needs to be accounted for.  In hit-to-lead or early lead optimization, I’d certainly consider synthetic targets that were likely to be less active than the compounds that had already been synthesized but these would need to have the potential to provide information.  You might ask how we should assess the potential of a compound to provide information and my response would be that it is not, in general, easy but this is what hypothesis-driven molecular design is all about.  The further you get into lead optimization, the less likely it becomes that an inactive compound will be informative.

I realize that talking about hypothesis-driven molecular design and potential of synthetic targets to provide information may seem a bit abstract so I’ll finish this post with something a bit more practical.  This goes back over a decade to a glycogen phosphorylase inhibitor project for diabetes (see articles 1 | 2 | 3 | 4 | 5 | 6 ).  While lead discovery tends to be a relatively generic activity, the challenges in lead optimization tend to be more project-specific.  We were optimizing some carboxylic acids (exemplified by the molecular structure in the figure below) and were interested in reducing the fraction bound to plasma protein which is often an issue for compounds that are predominantly anionic under physiological conditions. 

I should point out that it wasn’t clear that reducing the bound fraction would have increased the free concentration (this point is discussed in more depth here) but hypothesis-driven design is more about asking good questions than making predictions.  We wanted to increase the unbound fraction (Fu) but we also wanted to keep the compounds anionic, which suggested replacing the carboxylic acid with a bioisostere (see 1 | 2 | 3 | 4 | 5 ) such as tetrazole.  If you happen to be especially interested in the basis for the bioisosteric relationship between these molecular recognition elements, have a look at Fig 7 in this article but I’ll be focusing on finding out what effect the bioisosteric replacement will have on Fu.  Tetrazoles are less lipophilic (logP values are 0.3 to 0.4 units lower) than the corresponding carboxylic acids so we might expect that replacing a carboxylic acid with tetrazole will result in an increase in Fu (this thinking is overly simplistic, although saying why would take the blog post off on a tangent, but I'm happy to pick this up in comments).  We did what has become known as matched molecular pair analysis (MMPA; see  1 | 2 | 3 | 4 | 5 | 6 ) and searched the in-house plasma protein binding (PPB) database for pairs of compounds in which carboxylic acid was replaced by tetrazole while keeping the remainder of the molecular structure constant.  We do MMPA because we believe that it is easier to predict differences in the values of properties or activity than it is to predict the values themselves directly from molecular structure.  Computational chemists may recognize MMPA as a data-analytic equivalent of free energy perturbation (FEP; see 1 | 2 ).
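For readers who prefer code to prose, the pairing step of an MMPA can be sketched in a few lines of Python.  This is only a toy illustration and not the procedure that was actually run on the in-house database: the scaffold identifiers, the group labels and the Fu values are all invented, a real analysis would derive the pairs from the molecular structures themselves, and the use of log10(Fu) as the compared property is my assumption.

```python
import math

# Hypothetical records: (scaffold_id, acidic_group, fraction_unbound).
# All identifiers and values are invented for illustration only.
records = [
    ("scaffold_A", "carboxylic_acid", 0.020),
    ("scaffold_A", "tetrazole", 0.010),
    ("scaffold_B", "carboxylic_acid", 0.050),
    ("scaffold_B", "tetrazole", 0.020),
    ("scaffold_C", "carboxylic_acid", 0.008),  # no tetrazole partner, so no pair
]


def matched_pair_deltas(records, group_from, group_to):
    """Return {scaffold_id: log10(Fu_to) - log10(Fu_from)} for complete pairs.

    Two compounds count as a matched pair when they share a scaffold_id and
    differ only in the acidic group (group_from versus group_to).
    """
    by_scaffold = {}
    for scaffold, group, fu in records:
        by_scaffold.setdefault(scaffold, {})[group] = fu

    deltas = {}
    for scaffold, groups in by_scaffold.items():
        if group_from in groups and group_to in groups:
            deltas[scaffold] = (
                math.log10(groups[group_to]) - math.log10(groups[group_from])
            )
    return deltas


deltas = matched_pair_deltas(records, "carboxylic_acid", "tetrazole")
for scaffold, delta in sorted(deltas.items()):
    print(f"{scaffold}: delta log10(Fu) = {delta:+.2f}")
```

A negative difference means that the tetrazole is more highly bound than its matched carboxylic acid.  Working with differences within pairs, rather than with the raw values, is the whole point: whatever the rest of the molecule contributes to binding largely cancels out.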

The original MMPA performed in 2003 suggested that tetrazoles would be more highly protein bound (i.e. lower Fu) than the corresponding carboxylic acids and on that basis we decided not to synthesize tetrazoles.  Subsequent analysis of data for a larger number of matched molecular pairs that were available in 2008 arrived at the same conclusion with a higher degree of confidence (look at SE values) but it was the 2003 analysis that was used to make the project decisions.  The standard deviation (SD) values of 0.23 and 0.20 are also informative because these suggest that the PPB assay is very reproducible even though it was not run in replicate (have a look at this article to see this point discussed in more depth).
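The summary statistics mentioned above are easy to compute once you have the per-pair differences.  The sketch below is a minimal illustration using made-up differences, not the project data.  The last quantity rests on an assumption of mine: if the two measurements in a pair carry independent errors of equal size, then the SD of a single measurement cannot exceed the SD of the differences divided by the square root of 2, because the spread of the differences also includes genuine variation in the effect of the substitution.

```python
import math
import statistics


def summarize_deltas(deltas):
    """Summarize matched-pair differences.

    Returns the mean difference, its sample standard deviation (SD), the
    standard error of the mean (SE = SD / sqrt(n)) and an upper bound on the
    single-measurement SD (SD / sqrt(2)), which assumes independent errors
    of equal size for the two compounds in each pair.
    """
    n = len(deltas)
    sd = statistics.stdev(deltas)  # sample SD (n - 1 in the denominator)
    return {
        "n": n,
        "mean": statistics.mean(deltas),
        "sd": sd,
        "se": sd / math.sqrt(n),
        "assay_sd_upper_bound": sd / math.sqrt(2),
    }


# Invented per-pair differences in log10(Fu), for illustration only.
example_deltas = [-0.3, -0.5, -0.4, -0.2, -0.6]
summary = summarize_deltas(example_deltas)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Note that SE shrinks as more pairs become available while SD does not, which is why a larger set of pairs can support the same conclusion with more confidence.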

You might ask what we would have done if we hadn’t been able to find any matched molecular pairs with which to test our hypothesis.  Had we decided to synthesize a tetrazole then it would have been sensible to select a reference carboxylic acid for which Fu was close to the center of the quantifiable range so as to maximize the probability of the matching tetrazole being ‘in-range’.

This is probably a good place to leave things.  Even if you don't agree with what I've said here, I hope this blog post has at least got you thinking that there may be more to pharmaceutical computational chemistry than just making predictions.