Molecular Design: MMPA

Saturday, 7 April 2018

Hammett

I first became aware of Louis Hammett during the third term of my first year as an undergraduate at the University of Reading. Hammett was a pioneer in physical-organic chemistry and is widely regarded as one of the founders of that field. He would have been 124 today and was less than a year younger than Christopher Ingold, another pioneer in the field. Hammett passed away in 1987 at the age of 92 (here is an excellent obituary).

Today Hammett is remembered primarily for the parameters that describe electronic interactions between aromatic rings and their substituents. He also introduced linear free energy relationships which form the basis of classical QSAR. These days, QSAR has evolved away from its origins in physical-organic chemistry into what many call machine learning and parameters have become less physical (and considerably more numerous). Hammett's work provided an early lesson to wannabe molecular designers in how to think about molecules.

Jens Sadowski and I introduced matched molecular pair analysis (MMPA) in a chapter of a cheminformatics book that was conceived and edited by my dear friend (and favorite Transylvanian) Tudor Oprea. Here's a photo of Tudor and me at an OpenEye meeting (I think CUP II in 2001) during which our props (Tudor is wearing a PoD cape) were provided by the session chair (the formidable Janet Newman who intimidates proteins to the extent that they 'voluntarily' crystallize).

Now you might be wondering what MMPA has to do with Hammett. The short answer is that our book chapter included a table of what are effectively substituent constants for aqueous solubility and these have Hammett's fingerprints all over them. The longer answer is that Hammett introduced the idea of associating parameters with structural relationships (e.g. X is the chloro analog of Y) between compounds. This is an important idea because much pharmaceutical design is focused on understanding and predicting the effects of structural modifications on the activity and properties of compounds. One rationale for this focus is the belief that it is easier to predict differences (e.g. relative affinity) in chemical behavior between structurally related compounds than it is to predict chemical behavior directly from molecular structure.
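For readers who have not met it, the Hammett equation is written out below, followed by one way of expressing a solubility 'substituent constant' in MMPA terms. In the first expression K_X and K_H are equilibrium constants for the substituted and unsubstituted compounds, σ_X is the substituent constant and ρ is the reaction constant. The second expression is only an illustrative shorthand (the symbols are assumptions and not notation taken from the book chapter): the average change in log solubility over matched pairs in which H is replaced by X.

```latex
\log_{10}\!\left(\frac{K_X}{K_H}\right) = \rho\,\sigma_X
\qquad\qquad
\Delta\log S_{\mathrm{H}\rightarrow\mathrm{X}}
  = \frac{1}{N}\sum_{i=1}^{N}\left(\log_{10} S_{i,\mathrm{X}} - \log_{10} S_{i,\mathrm{H}}\right)
```

In both cases a parameter is attached to a structural relationship between two compounds rather than to either compound on its own.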

At first, I didn't see the deeper connection between Hammett's work and pharmaceutical design. The main focus of our book chapter was preparing chemical structures in databases for virtual screening so the full extent of Hammett's influence on MMPA was not immediately recognized. As is often the case, we think we've discovered something really new only to find out later that somebody had been thinking along similar lines many years before.

Happy 124th birthday, Louis Hammett.

Friday, 6 March 2015

Free energy perturbation in lead optimization

Free energy simulation methods such as free energy perturbation (FEP) have been around for a while and, back in the late eighties when my Pharma career started, they were being touted for affinity prediction in drug discovery. The methods never really caught on in the pharma/biotech industry and there are a number of reasons why this may have been the case including the compute-intensive nature of the calculations and the level of expertise required to run them. This is not to say that nobody in pharma/biotech was using the methods. It’s just that the capability was not widely perceived to give those who had it a clear advantage over their competitors. Also there are other ways to use protein structural information in lead optimization and I’ve already written about the importance of forming molecular interactions with optimal binding geometry but without incurring conformational/steric energy penalties. Nevertheless, being able to predict affinity accurately would be high on every drug discovery scientist’s wish list.

A recently published study appears to represent a significant step forward and I decided to take a closer look after seeing it Pipelined and reviewed. The focus of the study is FEP and a number of innovations are described including an improved force field, enhanced sampling and an automated workflow. The quantity calculated in FEP is ΔΔG° which is a measure of relative binding affinity and this is typically what you want to predict in lead optimization. We say ΔΔG° because it’s the difference between two ΔG° values which might, for example, be those for a compound with an unsubstituted phenyl ring and the corresponding compound with a chloro substituent at C3 of that aromatic ring. When we focus on ΔΔG° we are effectively assuming that it is easier to predict differences in affinity than it is to predict affinity itself from molecular structure and this is a theme that I've touched on in a previous post. Readers familiar with matched molecular pair analysis (MMPA 1 | 2 | 3 | 4 | 5 ) will see a parallel with FEP which I failed to draw when first writing about MMPA although the point has been articulated in subsequent publications (1 | 2). Of course FEP has been around a lot longer than MMPA so it’s actually much more appropriate to describe the latter as the data-analytic analog of the former.
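To get a feel for what a given ΔΔG° means in affinity terms, here is a minimal Python sketch that converts between ΔΔG° and the ratio of dissociation constants using ΔΔG° = RT ln(Kd(B)/Kd(A)). The function names, the default temperature of 298.15 K and the example value are assumptions chosen for illustration and are not taken from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_to_kd_ratio(ddg_kcal_per_mol, temperature_k=298.15):
    """Return Kd(B)/Kd(A) given ddG = dG(B) - dG(A) for binding, in kcal/mol."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


def kd_ratio_to_ddg(kd_ratio, temperature_k=298.15):
    """Inverse of ddg_to_kd_ratio: return ddG in kcal/mol for a given Kd(B)/Kd(A)."""
    return R_KCAL * temperature_k * math.log(kd_ratio)


# Hypothetical example: the chloro analog (B) binds 1.36 kcal/mol more
# favorably than the unsubstituted parent (A).
ratio = ddg_to_kd_ratio(-1.36)
print(f"Kd(B)/Kd(A) = {ratio:.3f}")
```

A ΔΔG° of about -1.36 kcal/mol at room temperature corresponds to roughly a tenfold gain in affinity, which is a useful yardstick when judging prediction errors.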

As with MMPA, the rationale is that it is easier to predict differences in the values of a quantity than it is to predict values of the quantity directly from molecular structure. The authors state:

“In drug discovery lead optimization applications, the calculation of relative binding affinities (i.e., the relative difference in binding energy between two compounds) is generally the quantity of interest and is thought to afford significant reduction in computational effort as compared to absolute binding free energy calculations”

This study does appear to represent the state of the art although I would like to have seen the equivalent of Figure 3 (plot of FEP-predicted ΔG° versus experimental ΔG°) for the free energy differences which are the quantities that are actually calculated. I would argue that Figure 3 is somewhat misleading because some of the variation in FEP-predicted ΔG° is explained by variation in the reference ΔG° values. That said, the relevant information is summarized in Table S2 of the supporting information and the error distribution for the relative binding free energies (ΔΔG°) is shown in Figure S1.
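The point about reference values can be illustrated with a toy simulation. This is only a sketch with invented numbers: it assumes that predicted ΔG° is obtained by adding a calculated ΔΔG° to an experimentally anchored reference ΔG° for each series, that reference values differ between series, and that prediction error is Gaussian. None of these settings come from the study itself.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_series=8, n_per_series=20, seed=0):
    """Return (r for dG, r for ddG) for simulated predicted-vs-experimental data."""
    rng = random.Random(seed)
    exp_dg, pred_dg, exp_ddg, pred_ddg = [], [], [], []
    for i in range(n_series):
        # Assumed reference dG values (kcal/mol), spread evenly from -12 to -7
        ref_dg = -12.0 + 5.0 * i / (n_series - 1)
        for _ in range(n_per_series):
            true_ddg = rng.gauss(0.0, 1.0)             # experimental ddG
            calc_ddg = true_ddg + rng.gauss(0.0, 1.0)  # calculated ddG with error
            exp_ddg.append(true_ddg)
            pred_ddg.append(calc_ddg)
            exp_dg.append(ref_dg + true_ddg)
            pred_dg.append(ref_dg + calc_ddg)
    return pearson_r(exp_dg, pred_dg), pearson_r(exp_ddg, pred_ddg)


r_dg, r_ddg = simulate()
print(f"r for dG: {r_dg:.2f}; r for ddG: {r_ddg:.2f}")
```

Under these assumptions the ΔG° plot looks better than the ΔΔG° plot even though exactly the same calculated quantities sit behind both, because the spread in reference values contributes variance that the calculation did not have to explain.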

One perception of FEP is that it becomes more difficult to get good results if the perturbation is large and the authors note:

“We find that our methodology is robust up to perturbations of approximately 10 heavy atoms”

Counting atoms is not the only way to gauge the magnitude of a perturbation. I’d also be interested to see how robustly the methodology handles perturbations that involve changes in ionization state and whether ΔΔG° values of greater magnitude are more difficult to predict than those of smaller magnitude. Prediction of affinity for compounds that bind covalently, but reversibly, to targets like cysteine proteases would probably also be feasible using these methods. Something I've wondered about for a few years is what would happen if the aromatic nitrogen that frequently accepts a hydrogen bond from the tyrosine kinase hinge was mutated into an aromatic carbon. If the resulting loss of affinity for this structural transformation was as small as some seem to believe it ought to be then it would certainly open up some 'patent space' in what is currently a bit of a log jam. You can also see how FEP might be integrated with MMPA in a lead optimization setting by using the former to predict the effects of structural modifications on affinity and the latter to assess the likely impact of these modifications on ADME characteristics like solubility, permeability and metabolic stability.

So lots of possibilities and this is probably a good place to leave it for now.

Thursday, 19 February 2015

There's more to molecular design than making predictions

So it arrived as a call to beware of von Neumann’s elephants and the session has already been most ably covered by The Curious Wavefunction (although I can’t resist suggesting that Ash's elephant may look more like a cyclic voltammogram than the pachyderm of the title). The session to which I refer was Derek Lowe’s much anticipated presentation to a ravening horde of computational chemists that was organized by Boston Area Group for Informatics and Modeling (BAGIM). Derek Pipelined beforehand and I provided some helpful tips (comment 1) on how to keep the pack at bay. I have to admit that I meant to say that many continuum solvent models are charge-type symmetric (as opposed to asymmetric) and should point out that it is actually reality (e.g. DMSO) that is asymmetric with respect to charge type. At least no damage appears to have been done. As an aside, I was gratified by how enthusiastically the bait was taken (see comments 8 and 9) but that’s getting away from the aim of this post which is to explore the relationship between medicinal chemist and computational chemist.

Many in the drug discovery business seem to be of the opinion that the primary function of the pharmaceutical computational chemist is to predict activity (e.g. enzyme inhibition; hERG blockade) and properties (e.g. solubility; plasma protein binding) of compounds before they are synthesized. I agree that the goal of both computational chemists and medicinal chemists is identification of efficacious and safe compounds and accurate prediction would be of great value in enabling the objective to be achieved efficiently. However, useful prediction of the (in vivo) effects of an arbitrary compound directly from molecular structure is not something that is currently feasible nor does it look like it will become feasible for a long time. Tension is likely to develop between the computational chemist and the medicinal chemist when either or both believe that the primary function of the former is simply to build or use predictive (e.g. QSAR) models for activity and ADME behavior.

One way of looking at drug discovery is as a process of accumulating knowledge and perhaps the success of the drug discovery project team should be measured by how efficiently they acquire that knowledge. A project team that quickly and unequivocally invalidates a target has made a valuable contribution because the project can be put out of its misery and the team members can move onto projects with greater potential. Getting smarter (and more honest) about shutting down projects is one area in which the pharma/biotech industry can improve. Although there are a number of ways (e.g. virtual/directed screening; analysis of screening output) that the computational chemist can contribute during hit identification, I’d like to focus on how computational and medicinal chemists can work together once starting points for synthesis have been identified (i.e. during hit-to-lead and lead optimization phases of a project).

In drug discovery projects we typically accumulate knowledge by posing hypotheses (e.g. a 3-chloro will increase potency or a 2-methoxy will lock the molecule into its bound conformation) and that’s why I use the term hypothesis-driven molecular design (HDMD) that I wrote about in an earlier post. When protein structural information is available, the hypothesis often takes the form that making a new interaction will lead to an increase in affinity and the key is finding ways of forming interactions with optimal binding geometry without incurring conformational/steric energy penalties. The computational chemist is better placed than the medicinal chemist to assess interactions by these criteria while the medicinal chemist is better placed to assess the synthetic implications of forming the new interactions. However, either or both may have generated the ideas that the computational chemist assessed and many medicinal chemists have a strong grasp of molecular interactions and conformational preferences even if they are unfamiliar with the molecular modelling software used to assess these. I always encourage medicinal chemists to learn to use the Cambridge Structural Database (CSD) because it provides an easy way to become familiar with 3D structures and conformations of molecules as well as providing insight into the interactions that molecules make with one another. It also uses experimental data so you don’t need to worry about force fields and electrostatic models. Here’s a post in which I used the CSD to gain some chemical insight that will give you a better idea of what I’m getting at.

One question posed in the Curious Wavefunction post was whether medicinal chemists should make dead compounds to test a model. My answer to the question is that compounds should be prioritized for synthesis on the basis of how informative they are likely to be or how active they are likely to be and, as pointed out by Curious Wavefunction, synthetic accessibility always needs to be accounted for. In hit-to-lead or early lead optimization, I’d certainly consider synthetic targets that were likely to be less active than the compounds that had already been synthesized but these would need to have the potential to provide information. You might ask how we should assess the potential of a compound to provide information and my response would be that it is not, in general, easy but this is what hypothesis-driven molecular design is all about. The further you get into lead optimization, the less likely it becomes that an inactive compound will be informative.

I realize that talking about hypothesis-driven molecular design and potential of synthetic targets to provide information may seem a bit abstract so I’ll finish this post with something a bit more practical. This goes back over a decade to a glycogen phosphorylase inhibitor project for diabetes (see articles 1 | 2 | 3 | 4 | 5 | 6 ). While lead discovery tends to be a relatively generic activity, the challenges in lead optimization tend to be more project-specific. We were optimizing some carboxylic acids (exemplified by the molecular structure in the figure below) and were interested in reducing the fraction bound to plasma protein which is often an issue for compounds that are predominantly anionic under physiological conditions.

I should point out that it wasn’t clear that reducing the bound fraction would have increased the free concentration (this point is discussed in more depth here) but hypothesis-driven design is more about asking good questions than making predictions. We wanted to increase the unbound fraction (F_u) but we also wanted to keep the compounds anionic which suggested replacing the carboxylic acid with a bioisostere (see 1 | 2 | 3 | 4 | 5 ) such as tetrazole. If you happen to be especially interested in the basis for the bioisosteric relationship between these molecular recognition elements, have a look at Fig 7 in this article but I’ll be focusing on finding out what effect the bioisosteric replacement will have on F_u. Tetrazoles are less lipophilic (logP values are 0.3 to 0.4 units lower) than the corresponding carboxylic acids so we might expect that replacing a carboxylic acid with tetrazole will result in an increase in F_u (this thinking is overly simplistic although saying why would take the blog post off on a tangent but I'm happy to pick this up in comments). We did what has become known as matched molecular pair analysis (MMPA; see 1 | 2 | 3 | 4 | 5 | 6 ) and searched the in-house plasma protein binding (PPB) database for pairs of compounds in which carboxylic acid was replaced by tetrazole while keeping the remainder of the molecular structure constant. We do MMPA because we believe that it is easier to predict differences in the values of properties or activity than it is to predict the values themselves directly from molecular structure. Computational chemists may recognize MMPA as a data analytic equivalent of free energy perturbation (FEP; see 1 | 2 ).
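The pair search described above can be sketched in a few lines of Python. This is a simplified illustration and not the in-house implementation: the records, compound identifiers and the hand-assigned "core" labels are all hypothetical, and in practice the constant part of each structure would be derived with a cheminformatics toolkit rather than typed in.

```python
from collections import defaultdict

# Hypothetical records: (compound id, core label, acidic group, measured value).
# The measured value stands in for a log-scale PPB measurement.
records = [
    ("cpd1", "coreA", "COOH", -1.2),
    ("cpd2", "coreA", "tetrazole", -1.5),
    ("cpd3", "coreB", "COOH", -0.8),
    ("cpd4", "coreC", "tetrazole", -2.0),
    ("cpd5", "coreD", "COOH", -1.9),
    ("cpd6", "coreD", "tetrazole", -2.0),
]


def find_matched_pairs(records, group_from="COOH", group_to="tetrazole"):
    """Pair compounds that share a core and differ only in the acidic group.

    Returns a list of (id_from, id_to, value_to - value_from) tuples.
    If a core has several compounds with the same group, the last one wins.
    """
    by_core = defaultdict(dict)
    for cpd_id, core, group, value in records:
        by_core[core][group] = (cpd_id, value)
    pairs = []
    for groups in by_core.values():
        if group_from in groups and group_to in groups:
            id_a, val_a = groups[group_from]
            id_b, val_b = groups[group_to]
            pairs.append((id_a, id_b, val_b - val_a))
    return pairs


pairs = find_matched_pairs(records)
for acid_id, tetrazole_id, delta in pairs:
    print(f"{acid_id} -> {tetrazole_id}: delta = {delta:+.2f}")
```

Compounds without a partner (cpd3 and cpd4 in this made-up set) simply drop out, which is why the number of usable pairs is usually much smaller than the number of compounds in the database.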

The original MMPA performed in 2003 suggested that tetrazoles would be more highly protein bound (i.e. lower F_u) than the corresponding carboxylic acids and on that basis we decided not to synthesize tetrazoles. Subsequent analysis of data for a larger number of matched molecular pairs that were available in 2008 arrived at the same conclusion with a higher degree of confidence (look at SE values) but it was the 2003 analysis that was used to make the project decisions. The standard deviation (SD) values of 0.23 and 0.20 are also informative because these suggest that the PPB assay is very reproducible even though it was not run in replicate (have a look at this article to see this point discussed in more depth).
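The summary statistics for a set of matched pairs are easy to compute, and the link between the SD and assay reproducibility can be made explicit. In the sketch below the differences are invented numbers (not the in-house data) and are assumed to be on a log scale. The reproducibility argument assumes that each measurement carries independent assay noise with standard deviation σ, so a difference of two measurements has noise variance 2σ² and the observed SD of the differences puts an upper limit of SD/√2 on σ.

```python
import math
import statistics


def summarize_pairs(deltas):
    """Return n, mean, SD, SE and an upper bound on single-measurement assay SD."""
    n = len(deltas)
    mean = statistics.mean(deltas)
    sd = statistics.stdev(deltas)  # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)         # standard error of the mean difference
    # Differences of two independent measurements have noise variance 2*sigma**2,
    # so sigma cannot exceed sd / sqrt(2) (any real structure-dependent variation
    # in the effect would make sigma smaller still).
    assay_sd_max = sd / math.sqrt(2)
    return {"n": n, "mean": mean, "sd": sd, "se": se, "assay_sd_max": assay_sd_max}


# Hypothetical differences (tetrazole minus carboxylic acid) for five matched pairs
deltas = [-0.5, -0.1, -0.4, -0.2, -0.3]
summary = summarize_pairs(deltas)
for key, value in summary.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

By the same reasoning, an SD of 0.20 for the differences would imply a single-measurement SD of no more than about 0.14, and the SE shrinks as more pairs become available, which is why the later analysis carries more confidence than the earlier one.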

You might ask what we would have done if we hadn’t been able to find any matched molecular pairs with which to test our hypothesis. Had we decided to synthesize a tetrazole then it would have been sensible to select a reference carboxylic acid for which F_u was close to the center of the quantifiable range so as to maximize the probability of the matching tetrazole being ‘in-range’.

This is probably a good place to leave things. Even if you don't agree with what I've said here, I hope this blog post has at least got you thinking that there may be more to pharmaceutical computational chemistry than just making predictions.