Molecular Design: HDMD


Thursday, 19 February 2015

There's more to molecular design than making predictions

So it arrived as a call to beware of von Neumann’s elephants and the session has already been most ably covered by The CuriousWavefunction (although I can’t resist suggesting that Ash's elephant may look more like a cyclic voltammogram than the pachyderm of the title). The session to which I refer was Derek Lowe’s much anticipated presentation to a ravening horde of computational chemists that was organized by Boston Area Group Modeling and Informatics (BAGIM). Derek Pipelined beforehand and I provided some helpful tips (comment 1) on how to keep the pack at bay. I have to admit that I meant to say that many continuum solvent models are charge-type symmetric (as opposed to asymmetric) and should point out that it is actually reality (e.g. DMSO) that is asymmetric with respect to charge type. At least no damage appears to have been done. As an aside, I was gratified by how enthusiastically the bait was taken (see comments 8 and 9) but that’s getting away from the aim of this post which is to explore the relationship between medicinal chemist and computational chemist.

Many in the drug discovery business seem to be of the opinion that the primary function of the pharmaceutical computational chemist is to predict activity (e.g. enzyme inhibition; hERG blockade) and properties (e.g. solubility; plasma protein binding) of compounds before they are synthesized. I agree that the goal of both computational chemists and medicinal chemists is identification of efficacious and safe compounds and accurate prediction would be of great value in enabling the objective to be achieved efficiently. However, useful prediction of the (in vivo) effects of an arbitrary compound directly from molecular structure is not something that is currently feasible nor does it look like it will become feasible for a long time. Tension is likely to develop between the computational chemist and the medicinal chemist when either or both believe that the primary function of the former is simply to build or use predictive (e.g. QSAR) models for activity and ADME behavior.

One way of looking at drug discovery is as a process of accumulating knowledge and perhaps the success of the drug discovery project team should be measured by how efficiently they acquire that knowledge. A project team that quickly and unequivocally invalidates a target has made a valuable contribution because the project can be put out of its misery and the team members can move on to projects with greater potential. Getting smarter (and more honest) about shutting down projects is one area in which the pharma/biotech industry can improve. Although there are a number of ways (e.g. virtual/directed screening; analysis of screening output) that the computational chemist can contribute during hit identification, I’d like to focus on how computational and medicinal chemists can work together once starting points for synthesis have been identified (i.e. during hit-to-lead and lead optimization phases of a project).

In drug discovery projects we typically accumulate knowledge by posing hypotheses (3-chloro will increase potency or a 2-methoxy will lock the molecule into its bound conformation) and that’s why I use the term hypothesis-driven molecular design (HDMD) that I wrote about in an earlier post. When protein structural information is available, the hypothesis often takes the form that making a new interaction will lead to an increase in affinity and the key is finding ways of forming interactions with optimal binding geometry without incurring conformational/steric energy penalties. The computational chemist is better placed than the medicinal chemist to assess interactions by these criteria while the medicinal chemist is better placed to assess the synthetic implications of forming the new interactions. However, either or both may have generated the ideas that the computational chemist assessed and many medicinal chemists have a strong grasp of molecular interactions and conformational preferences even if they are unfamiliar with the molecular modelling software used to assess these. I always encourage medicinal chemists to learn to use the Cambridge Structural Database (CSD) because it provides an easy way to become familiar with 3D structures and conformations of molecules as well as providing insight into the interactions that molecules make with one another. It also uses experimental data so you don’t need to worry about force fields and electrostatic models. Here’s a post in which I used the CSD to gain some chemical insight that will give you a better idea of what I’m getting at.

One question posed in the Curious Wavefunction post was whether medicinal chemists should make dead compounds to test a model. My answer to the question is that compounds should be prioritized for synthesis on the basis of how informative they are likely to be or how active they are likely to be and, as pointed out by Curious Wavefunction, synthetic accessibility always needs to be accounted for. In hit-to-lead or early lead optimization, I’d certainly consider synthetic targets that were likely to be less active than the compounds that had already been synthesized but these would need to have potential to provide information. You might ask how we should assess the potential of a compound to provide information and my response would be that it is not, in general, easy but this is what hypothesis-driven molecular design is all about. The further you get into lead optimization, the less likely it becomes that an inactive compound will be informative.

I realize that talking about hypothesis-driven molecular design and potential of synthetic targets to provide information may seem a bit abstract so I’ll finish this post with something a bit more practical. This goes back over a decade to a glycogen phosphorylase inhibitor project for diabetes (see articles 1 | 2 | 3 | 4 | 5 | 6 ). While lead discovery tends to be a relatively generic activity, the challenges in lead optimization tend to be more project-specific. We were optimizing some carboxylic acids (exemplified by the molecular structure in the figure below) and were interested in reducing the fraction bound to plasma protein which is often an issue for compounds that are predominantly anionic under physiological conditions.

I should point out that it wasn’t clear that reducing the bound fraction would have increased the free concentration (this point is discussed in more depth here) but hypothesis-driven design is more about asking good questions than making predictions. We wanted to increase the unbound fraction (F_u) but we also wanted to keep the compounds anionic which suggested replacing the carboxylic acid with a bioisostere (see 1 | 2 | 3 | 4 | 5 ) such as tetrazole. If you happen to be especially interested in the basis for the bioisosteric relationship between these molecular recognition elements, have a look at Fig 7 in this article but I’ll be focusing on finding out what effect the bioisosteric replacement will have on F_u. Tetrazoles are less lipophilic (logP values are 0.3 to 0.4 units lower) than the corresponding carboxylic acids so we might expect that replacing a carboxylic acid with tetrazole will result in an increase in F_u (this thinking is overly simplistic although saying why would take the blog post off on a tangent but I'm happy to pick this up in comments). We did what has become known as matched molecular pair analysis (MMPA; see 1 | 2 | 3 | 4 | 5 | 6 ) and searched the in-house plasma protein binding (PPB) database for pairs of compounds in which carboxylic acid was replaced by tetrazole while keeping the remainder of the molecular structure constant. We do MMPA because we believe that it is easier to predict differences in the values of properties or activity than it is to predict the values themselves directly from molecular structure. Computational chemists may recognize MMPA as a data analytic equivalent of free energy perturbation (FEP; see 1 | 2 ).
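For readers who like to see the bookkeeping, here is a minimal sketch of the pairing step in Python. It is not the software we used: it assumes each database record has already been reduced to a hypothetical "context" key (the rest of the molecule with the acidic group removed), a group label and a log-scale PPB measurement, and the records themselves are made up for illustration.

```python
from collections import defaultdict


def find_matched_pairs(records, group_a="COOH", group_b="tetrazole"):
    """Return (context, value_a, value_b) for every context measured with both groups.

    `records` is an iterable of (context, group, value) tuples, where `context`
    is a hypothetical key standing for the remainder of the molecular structure.
    """
    by_context = defaultdict(dict)
    for context, group, value in records:
        by_context[context][group] = value

    pairs = []
    for context, groups in sorted(by_context.items()):
        # Keep only contexts for which both members of the pair were measured.
        if group_a in groups and group_b in groups:
            pairs.append((context, groups[group_a], groups[group_b]))
    return pairs


# Made-up records: (context key, acidic group, log10 of unbound fraction)
records = [
    ("scaffold-1", "COOH", -1.2),
    ("scaffold-1", "tetrazole", -1.6),
    ("scaffold-2", "COOH", -0.8),
    ("scaffold-2", "tetrazole", -1.0),
    ("scaffold-3", "COOH", -1.5),  # no tetrazole partner, so no pair
]

pairs = find_matched_pairs(records)
# Difference for each pair: tetrazole value minus carboxylic acid value.
deltas = [value_b - value_a for _, value_a, value_b in pairs]
```

In practice the hard part is generating the context key from molecular structures, which needs a cheminformatics toolkit and is deliberately left out of this sketch.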

The original MMPA performed in 2003 suggested that tetrazoles would be more highly protein bound (i.e. lower F_u) than the corresponding carboxylic acids and on that basis we decided not to synthesize tetrazoles. Subsequent analysis of data for a larger number of matched molecular pairs that were available in 2008 arrived at the same conclusion with a higher degree of confidence (look at SE values) but it was the 2003 analysis that was used to make the project decisions. The standard deviation (SD) values of 0.23 and 0.20 are also informative because these suggest that the PPB assay is very reproducible even though it was not run in replicate (have a look at this article to see this point discussed in more depth).
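The summary statistics are simple enough to sketch in Python. The differences below are made-up numbers rather than our data, and the function names are my own. The reproducibility argument rests on an assumption that I should state: if the two measurements in each pair carry independent errors of equal variance, then the SD of the differences divided by the square root of 2 is an upper bound on the assay SD, because any genuine variation in the effect of the replacement only adds to the spread.

```python
import math
import statistics


def summarize_differences(deltas):
    """Mean, sample SD and standard error (SE) for matched-pair differences."""
    n = len(deltas)
    mean = statistics.mean(deltas)
    sd = statistics.stdev(deltas)  # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)         # SE of the mean shrinks as pairs are added
    return {"n": n, "mean": mean, "sd": sd, "se": se}


def assay_sd_upper_bound(sd_of_differences):
    """Upper bound on single-measurement SD, assuming independent, equal errors.

    Var(difference) = 2 * Var(assay) + Var(real effect) >= 2 * Var(assay)
    """
    return sd_of_differences / math.sqrt(2)


# Made-up differences (tetrazole minus carboxylic acid) on a log scale
example_deltas = [-0.4, -0.2, -0.5, -0.1]
summary = summarize_differences(example_deltas)

# An SD of 0.20 for the differences would bound the assay SD at about 0.14
bound = assay_sd_upper_bound(0.20)
```

This is also why more pairs give more confidence without changing the SD much: the SE falls with the square root of the number of pairs while the SD reflects the spread of individual differences.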

You might ask what we would have done if we hadn’t been able to find any matched molecular pairs with which to test our hypothesis. Had we decided to synthesize a tetrazole then it would have been sensible to select a reference carboxylic acid for which F_u was close to the center of the quantifiable range so as to maximize the probability of the matching tetrazole being ‘in-range’.
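Choosing the reference acid can be sketched in a few lines of Python. The compound names, values and quantifiable limits are all hypothetical, and I am assuming that the measurement is expressed on a log scale so that the 'center' of the range is simply the midpoint of the two limits.

```python
def pick_reference(candidates, lower, upper):
    """Return the name of the candidate whose value is closest to the range midpoint.

    `candidates` maps compound name to measured value; `lower` and `upper` are
    the assumed limits of the quantifiable range on the same scale.
    Returns None if no candidate lies within the range.
    """
    midpoint = (lower + upper) / 2
    in_range = {name: v for name, v in candidates.items() if lower <= v <= upper}
    if not in_range:
        return None
    return min(in_range, key=lambda name: abs(in_range[name] - midpoint))


# Made-up carboxylic acids with log10 unbound fraction values
acids = {"acid-A": -2.8, "acid-B": -1.4, "acid-C": -0.3}
reference = pick_reference(acids, lower=-3.0, upper=0.0)
```

With these made-up numbers the midpoint is -1.5 and acid-B is selected, which leaves roughly equal room for the matching tetrazole to move in either direction and still be quantifiable.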

This is probably a good place to leave things. Even if you don't agree with what I've said here, I hope this blog post has at least got you thinking that there may be more to pharmaceutical computational chemistry than just making predictions.

Sunday, 11 January 2015

New year, new blog name...

A new year and a new title for the blog which will now just be ‘Molecular Design’. I have a number of reasons for dropping fragment-based drug discovery (FBDD) from the title but first want to say a bit about molecular design because that may make those reasons clearer. Molecular design can be defined as control of the behavior of compounds and materials by manipulation of molecular properties. The use of the word ‘behavior’ in the definition is very deliberate because we design a compound or material to actually do something like bind to the active site of an enzyme or conduct electricity or absorb light of a particular wavelength. A few years ago, I noted that molecular design can be hypothesis-driven or prediction-driven. In making that observation, I was simply articulating something that many would have already been aware of rather than presenting radical new ideas. However, I was also making the point that it is important to articulate our assumptions in molecular design and be brutally honest about what we don’t know.

Hypothesis-driven molecular design (HDMD) can be thought of as a framework in which to establish what, in the interest of generality, I’ll call ‘structure-behavior relationships’ (SBRs) as efficiently as possible. When we use HDMD, we acknowledge that it is not generally possible to predict the behavior of compounds directly from molecular structure in the absence of measurements for structurally related compounds. There is an analogy between HDMD and statistical molecular design (SMD) in that both can be seen as ways of obtaining the information required for making predictions even though the underlying philosophies may differ somewhat. The key challenge for both HDMD and SMD is identifying the molecular properties that will have the greatest influence on the behavior of compounds and this is challenging because you need to do it without measured data. An in-depth understanding of molecular properties (e.g. conformations, ionization, tautomers, redox potential, metal complexation, UV/vis absorption) is important when doing HDMD because this enables you to pose informative hypotheses. In essence, HDMD is about asking good questions with informative compounds and relevant measurements and the key challenge is how to make the approach more systematic and objective. One key molecular property is something that I’ll call ‘interaction potential’ and this is important because the behavior of a compound is determined to a large extent by the interactions of its molecules with the environments (e.g. crystal lattice, buffered aqueous solution) in which they exist.

Since FBDD is being dropped from the blog title, I thought that I’d say a few words about where FBDD fits into the molecular design framework. I see FBDD as essentially a smart way to do structure-based design in that ligands are assembled from proven molecular recognition elements. The ability to characterize weak binding allows some design hypotheses to be tested without having to synthesize new compounds. It’s also worth remembering that FBDD has its origins in computational chemistry ( MCSS | Ludi | HOOK ) and that an approach to crystallographic mapping of protein surfaces was published before the original SAR by NMR article made its appearance. My own involvement with FBDD began in 1997 and I focused on screening library design right from the start. The screening library design techniques described in blog posts here ( 1 | 2 | 3 | 4 | 5) and a related journal article have actually been around for almost 20 years although I think they still have some relevance to modern FBDD even if they are getting a bit dated. If you’re interested, you can find a version of the SMARTS-based filtering software that I was using even before Zeneca became AstraZeneca. It’s called SSProFilter and you can find source code (it was built with the OEChem toolkit) in the supplemental information for our recent article on alkane/water logP. So why drop FBDD from the blog title? For me, molecular design has always been bigger than fragment-based molecular design and my involvement in FBDD projects has been minimal in recent years. FBDD is increasingly becoming mainstream and, in any case, dropping FBDD from the blog title certainly doesn’t prevent me from discussing fragment-based topics or even indulging in some screening library design should the opportunities arise.

As some readers will be aware, I have occasionally criticized some of the ways that things get done in drug discovery and so it’s probably a good idea to say something about the directions in which I think pharmaceutical design needs to head. I wrote a short Perspective for the JCAMD 25th anniversary issue three years ago and this still broadly represents my view of where the field should be going. Firstly, we need to acknowledge the state of predictive medicinal chemistry and accept that we will continue to need some measured data for the foreseeable future. This means that, right now, we need to think more about how to collect the most informative data as efficiently as possible and less about predicting pharmacokinetic profiles directly from molecular structure. Put another way, we need to think about drug discovery in a Design of Experiments framework. Secondly, we need to look at activity and properties in terms of relationships between structures because it’s often easier to predict differences in the values of a property than it is to predict the values themselves. Thirdly, we need to at least consider alternatives to octanol/water for partition coefficient measurement.

There are also directions in which I think we should not be going. Drug discovery scientists will be aware of an ever-expanding body of rules, guidelines and metrics (RGMs) that prompts analogy with The Second Law. Some of this can be traced to the success of the Rule of 5 and many have learned that it is a lot easier to discover metrics than it is to discover drugs. If you question a rule, the standard response will be, “it’s not a rule, it’s a guideline” and your counter-response should be, “you’re the one who called it a rule”. Should you challenge the quantitative basis of a metric, which by definition is supposed to measure something, it is likely that you will be told how useful it is. This defense is easily outflanked by devising a slightly different metric and asking whether it would be more or less useful. Another pattern that many drug discovery scientists will have recognized is that, in the RGM business, simplicity trumps relevance.

Let’s talk about guidelines for a bit. Drug discovery guidelines need to be based on observations of reality and that usually means trends in data. If you’re using guidelines, then you need to know the strength of the trend(s) on which the guidelines are based because this tells you how rigidly you should adhere to the guidelines. Conversely, if you’re recommending that people use the guidelines that you’re touting then it’s very naughty to make the relevant trends appear to be stronger than they actually are. There are no winners when data gets cooked and I think this is a good point at which to conclude the post.

So thanks for reading and I’ll try to whet your appetite by saying that the next blog post is going to be on PAINS. Happy new year!