So it arrived as a call to beware of von Neumann’s elephants and the session has already been most ably covered by The Curious Wavefunction (although I can’t resist suggesting that Ash's elephant may look
more like a cyclic voltammogram than the pachyderm of the title). The session to which I refer was Derek Lowe’s much anticipated
presentation to a ravening horde of computational chemists that was organized
by the Boston Area Group for Informatics and Modeling (BAGIM). Derek Pipelined beforehand and I provided
some helpful tips (comment 1) on how to keep the pack at bay. I have to admit
that I meant to say that many continuum solvent models are charge-type
symmetric (as opposed to asymmetric) and should point out that it is actually
reality (e.g. DMSO) that is asymmetric with respect to charge type. At least no damage appears to have been done. As an aside, I was gratified by how
enthusiastically the bait was taken (see comments 8 and 9) but that’s getting
away from the aim of this post which is to explore the relationship between medicinal chemist and computational chemist.
Many in the drug discovery business seem to be of the
opinion that the primary function of the pharmaceutical computational chemist
is to predict activity (e.g. enzyme inhibition; hERG blockade) and properties (e.g.
solubility; plasma protein binding) of compounds before they are synthesized. I agree that the goal of both computational
chemists and medicinal chemists is identification of efficacious and safe
compounds and accurate prediction would be of great value in enabling the
objective to be achieved efficiently.
However, useful prediction of the (in vivo) effects of an arbitrary
compound directly from molecular structure is not something that is currently
feasible nor does it look like it will become feasible for a long time. Tension is likely to develop between the
computational chemist and the medicinal chemist when either or both believe
that the primary function of the former is simply to build or use predictive
(e.g. QSAR) models for activity and ADME behavior.
One way of looking at Drug Discovery is as a process of
accumulating knowledge and perhaps the success of the drug discovery project
team should be measured by how efficiently they acquire that knowledge. A project team that quickly and unequivocally
invalidates a target has made a valuable contribution because the project can
be put out of its misery and the team members can move on to projects with greater potential. Getting smarter (and
more honest) about shutting down projects is one area in which the
pharma/biotech industry can improve.
Although there are a number of ways (e.g. virtual/directed screening;
analysis of screening output) that the computational chemist can contribute
during hit identification, I’d like to focus on how computational and medicinal
chemists can work together once starting points for synthesis have been
identified (i.e. during the hit-to-lead and lead optimization phases of a project).
In drug discovery projects we typically accumulate
knowledge by posing hypotheses (3-chloro will increase potency or a 2-methoxy
will lock the molecule into its bound conformation) and that’s why I use the
term hypothesis-driven molecular design (HDMD) that I wrote about in an earlier
post. When protein structural
information is available, the hypothesis often takes the form that making a new
interaction will lead to an increase in affinity and the key is finding ways of
forming interactions with optimal binding geometry without incurring
conformational/steric energy penalties.
The computational chemist is better placed than the medicinal chemist to
assess interactions by these criteria while the medicinal chemist is better
placed to assess the synthetic implications of forming the new interactions. However, either or both may have generated
the ideas that the computational chemist assessed and many medicinal chemists
have a strong grasp of molecular interactions and conformational preferences even
if they are unfamiliar with molecular modelling software used to assess these. I always encourage medicinal chemists to
learn to use the Cambridge Structural Database (CSD) because it provides an
easy way to become familiar with 3D structures and conformations of molecules
as well as providing insight into the interactions that molecules make with one
another. It also uses experimental data
so you don’t need to worry about force fields and electrostatic models. Here’s
a post in which I used the CSD to gain some chemical insight that will
give you a better idea of what I’m getting at.
One question posed in the Curious Wavefunction post was
whether medicinal chemists should make dead compounds to test a model. My answer to the question is that compounds
should be prioritized for synthesis on the basis of how informative they
are likely to be or how active they are likely to be and, as pointed out by Curious Wavefunction, synthetic
accessibility always needs to be accounted for. In hit-to-lead or early lead
optimization, I’d certainly consider synthetic targets that were likely to be
less active than the compounds that had already been synthesized but these
would need to have potential to provide information. You might ask how we should assess the
potential of a compound to provide information and my response would be that it
is not, in general, easy but this is what hypothesis-driven molecular design is
all about. The further you get into lead optimization, the less likely it becomes that an inactive compound will be informative.
I realize that talking about hypothesis-driven molecular
design and potential of synthetic targets to provide information may seem a bit
abstract so I’ll finish this post with something a bit more practical. This goes back over a decade to a glycogen phosphorylase inhibitor project for diabetes (see articles 1 | 2
| 3 | 4 | 5 | 6 ). While lead discovery tends to
be a relatively generic activity, the challenges in lead optimization tend to
be more project-specific. We were
optimizing some carboxylic acids (exemplified by the molecular structure in the
figure below) and were interested in reducing the fraction bound to plasma protein
which is often an issue for compounds that are predominantly anionic under
physiological conditions.
I should point out that it wasn’t clear that reducing the
bound fraction would have increased the free concentration (this point is discussed in more depth here) but
hypothesis-driven design is more about asking good questions than making predictions. We wanted to increase the unbound
fraction (Fu) but we also wanted to keep the compounds anionic which suggested
replacing the carboxylic acid with a bioisostere (see 1 | 2 | 3 | 4 | 5 ) such as tetrazole. If you happen to be especially interested in
the basis for the bioisosteric relationship between these molecular recognition
elements, have a look at Fig 7 in this article but I’ll be focusing on
finding out what effect the bioisosteric replacement will have on Fu. Tetrazoles are less lipophilic (logP values
are 0.3 to 0.4 units lower) than the corresponding carboxylic acids so we might expect that replacing a carboxylic acid with tetrazole will result in an
increase in Fu (this thinking is overly simplistic although saying why would take the blog post off on a
tangent but I'm happy to pick this up in comments). We did what has become known
as matched molecular pair analysis (MMPA; see
1 | 2 | 3 | 4 | 5 | 6 ) and searched the in-house plasma protein binding (PPB)
database for pairs of compounds in which carboxylic acid was replaced by
tetrazole while keeping the remainder of the molecular structure constant. We do MMPA because we believe that it is
easier to predict differences in the values of properties or activity than it
is to predict the values themselves directly from molecular structure. Computational chemists may recognize MMPA as
a data analytic equivalent of free energy perturbation (FEP; see 1 | 2 ).
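For readers who like to see the bookkeeping, the pair search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the scaffold keys and the log Fu values are all invented, and I assume that each compound has already been reduced to a key describing the remainder of the structure plus a label for the acidic group (in practice that fragmentation step would be done with cheminformatics software and is not shown here).

```python
from collections import defaultdict

# Hypothetical records: (remainder_key, acidic_group, log10 Fu).
# None of these values come from a real PPB database.
records = [
    ("scaffold_A", "COOH", -1.10),
    ("scaffold_A", "tetrazole", -1.45),
    ("scaffold_B", "COOH", -0.80),
    ("scaffold_B", "tetrazole", -1.05),
    ("scaffold_C", "COOH", -1.60),  # no tetrazole partner, so no pair
]


def matched_pair_differences(records, group_from="COOH", group_to="tetrazole"):
    """Return {remainder_key: logFu(group_to) - logFu(group_from)}.

    Only keys for which both groups have a measurement form a matched pair.
    """
    by_key = defaultdict(dict)
    for key, group, log_fu in records:
        by_key[key][group] = log_fu

    diffs = {}
    for key, groups in by_key.items():
        if group_from in groups and group_to in groups:
            diffs[key] = groups[group_to] - groups[group_from]
    return diffs


diffs = matched_pair_differences(records)
```

A negative difference in this sketch would mean that the tetrazole has the lower Fu of the pair. The point of working with differences is the one made above: the shared remainder of the structure cancels out, so only the effect of the replacement is left.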
The original MMPA performed in 2003 suggested that tetrazoles
would be more highly protein bound (i.e. lower Fu) than the corresponding
carboxylic acids and on that basis we decided not to synthesize tetrazoles.
Subsequent analysis of data for a larger number of matched molecular pairs that
were available in 2008 arrived at the same conclusion with a higher degree of
confidence (look at SE values) but it was the 2003 analysis that was used to make the
project decisions. The standard
deviation (SD) values of 0.23 and 0.20 are also informative because these suggest that the PPB assay is very reproducible even though it was not run in
replicate (have a look at this article to see this point discussed in more
depth).
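The summary statistics mentioned above (mean difference, SD and SE) are simple to compute, and a short Python sketch may help show why the SD says something about assay reproducibility. The differences in log Fu below are invented for illustration and are not the project data. The last step rests on an assumption that I should state loudly: if the two measurements in each pair carry independent errors of equal variance, then assay noise alone contributes twice that variance to the differences, so SD divided by the square root of 2 is an upper limit on the noise in a single measurement.

```python
import math
import statistics


def summarize_differences(diffs):
    """Summarize matched-pair differences: count, mean, sample SD and SE."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)        # standard error of the mean difference
    return {"n": n, "mean": mean, "sd": sd, "se": se}


# Hypothetical differences in log10 Fu (tetrazole minus carboxylic acid).
hypothetical_diffs = [-0.35, -0.25, -0.10, -0.50, -0.30]
summary = summarize_differences(hypothetical_diffs)

# Assumption: independent, equal-variance errors in both measurements of a pair.
# The SD of the differences also contains genuine variation in the effect of
# the replacement, so this is an upper limit rather than an estimate.
noise_upper_limit = summary["sd"] / math.sqrt(2)
```

The SE shrinks as more pairs become available, which is why a later analysis with more pairs can reach the same conclusion with more confidence, whereas the SD does not depend on the number of pairs in the same way.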
You might ask what we would have done if we hadn’t been able
to find any matched molecular pairs with which to test our hypothesis. Had we decided to synthesize a tetrazole then
it would have been sensible to select a reference carboxylic acid for which Fu
was close to the center of the quantifiable range so as to maximize the probability
of the matching tetrazole being ‘in-range’.
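The selection of a reference acid can be sketched as follows. The compound names, Fu values and assay limits are all hypothetical, and I am assuming that the center of the quantifiable range is taken on a logarithmic scale, which is a choice rather than something the assay dictates.

```python
import math


def pick_reference(acids, fu_min, fu_max):
    """Pick the acid whose Fu is closest (in log10 units) to the range center.

    acids: {name: Fu}; fu_min and fu_max: assumed quantifiable limits of the assay.
    """
    center = (math.log10(fu_min) + math.log10(fu_max)) / 2
    in_range = {name: fu for name, fu in acids.items() if fu_min <= fu <= fu_max}
    if not in_range:
        raise ValueError("no acid has Fu within the quantifiable range")
    return min(in_range, key=lambda name: abs(math.log10(in_range[name]) - center))


# Hypothetical carboxylic acids and hypothetical assay limits.
acids = {"acid_1": 0.002, "acid_2": 0.03, "acid_3": 0.2}
reference = pick_reference(acids, fu_min=0.001, fu_max=1.0)
```

Starting from the middle of the range leaves room for Fu to move in either direction, which matters when you don't know the sign of the effect in advance.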
This is probably a good place to leave things. Even if you don't agree with what I've said here, I hope this blog post has at least got you thinking that there may be more to pharmaceutical
computational chemistry than just making predictions.