I’ll be concluding the series of posts on ligand efficiency
metrics (LEMs) here, so it’s a good point at which to remind you that open access to the article on which these posts are based is likely to end on September 14 (tomorrow). At the risk of sounding like one of those tedious Twitter bores who thinks that you’re actually interested in their page load statistics, download now to avoid disappointment later. In this post, I’ll also be saying something about the implications of the LEM critique for FBDD, given that I’ve not said anything specific about FBDD in this blog for a long time.
LEMs have become so ingrained in the FBDD orthodoxy that to
criticize them could be seen as fundamentally ‘anti-fragment’ and even
heretical. I still certainly believe that
fragment-based approaches represent an effective (and efficient although not in the LEM sense) way to do drug
discovery. At the same time, I think that it will get more difficult, even risky, to tout fragment-based approaches primarily on the basis that fragment hits are typically more ligand-efficient than higher molecular weight starting points for synthesis. I do
hope that the critique will at least reassure drug discovery scientists that it
really is OK to ask questions (even of publications with eye-wateringly large
numbers of citations) and show that the ‘experts’ are not always right
(sometimes they’re not even wrong).
My view is that the LEM framework gets used as a crutch in FBDD so perhaps this is a good time for us to cast that crutch aside for a moment
and remind ourselves why we were using fragment-based approaches before LEMs
arrived on the scene. Fragment-based
approaches allow us to probe chemical space efficiently and it can be helpful
to think in terms of information gained per compound assayed or per unit of
synthetic effort consumed. Molecular interactions lie at the core of pharmaceutical molecular design and small,
structurally-prototypical probes allow these to be explored quantitatively
while minimizing the confounding effects of multiple protein-ligand
contacts. Using the language of
screening library design, we can say that fragments cover chemical
space more effectively than larger species and can even conjecture that fragments allow that
chemical space to be sampled at a more controllable resolution. Something I’d like you to think about is the idea of minimal steric footprint, which was mentioned in the fragment context in both the LEM (fragment linking) and Correlation Inflation (achieving axial substitution) critiques. This is also a
good point to remind readers that not all design in drug discovery is about
prediction. For example,
hypothesis-driven molecular design and statistical molecular design can be seen
as frameworks for establishing structure-activity relationships (SARs) as
efficiently as possible.
Despite criticizing the use of LEMs, I believe that we do need
to manage risk factors such as molecular size and lipophilicity when optimizing
lead series. It’s just that I don’t think the currently used LEMs provide a generally valid framework for doing this.
We often draw straight lines on plots of activity against risk factors
in order to manage the latter. For
example, we might hypothesize that 10 nM potency will be sufficient for in vivo
efficacy and so we could draw the pIC50 = 8 line to identify
the lowest molecular weight compounds above this line. Alternatively, we might try to construct a line that represents the ‘leading edge’ (the most potent compound for a particular value of the risk factor) of a plot of pIC50 against ClogP. When we draw these lines, we often make the
implicit assumption that every point on the line is in some way equivalent. For
example we might conjecture that points on the line represent compounds of
equal ‘quality’. We do this when we use
LEMs and assume that compounds above the line are better than those below it.
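To make this concrete, here’s a minimal sketch (Python with pandas; the compound values and column names are my own inventions, purely for illustration) of applying the pIC50 = 8 cut-off and pulling out a crude ‘leading edge’ from activity plotted against ClogP.

```python
# Hypothetical data purely for illustration; column names are my own choices.
import pandas as pd

df = pd.DataFrame({
    "compound": ["A", "B", "C", "D", "E"],
    "pIC50":    [8.3, 7.1, 8.9, 6.5, 8.1],
    "MW":       [310.0, 280.0, 452.0, 230.0, 365.0],
    "ClogP":    [2.1, 1.4, 3.8, 0.9, 2.9],
})

# Compounds above the pIC50 = 8 line, ranked so the lowest molecular weight comes first
above_line = df[df["pIC50"] >= 8.0].sort_values("MW")

# Crude 'leading edge': the most potent compound within each (1 log unit) ClogP bin
edge = df.loc[df.assign(bin=df["ClogP"].round(0)).groupby("bin")["pIC50"].idxmax()]

print(above_line[["compound", "MW", "pIC50"]])
print(edge[["compound", "ClogP", "pIC50"]])
```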
Let’s take a look at ligand efficiency (LE) in this context
and I’m going to define LE in terms of pIC50 for the purposes of
this discussion. We can think of LE in terms of a continuum of lines that intersect the activity axis at zero (where IC50 = 1 M).
At this point, I should stress that if the activities of a selection of
compounds just happen to lie on any one of these lines then I’d be happy to
treat those compounds as equivalent. Now let’s suppose that you’ve drawn a line that intersects the activity axis at zero. Next imagine that I draw a line that intersects the activity axis at 3 (where IC50 = 1 mM) and, just to make things interesting, I’m going to make sure that my line has a different slope to your line. There will be one point where we appear to
agree (just like −40° on
the Celsius and Fahrenheit temperature scales) but everywhere else we disagree.
Who is right? Almost certainly neither of us is right (plenty of lines to choose
from in that continuum) but in any case we simply don’t know because we’ve both
made completely arbitrary decisions with respect to the points on the activity
axis where we’ve chosen to anchor our respective lines. Here’s a figure that will give you a bit more of an idea of what I’m talking about.
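For those who prefer numbers to pictures, here’s a minimal sketch (Python; the two compounds and their values are invented purely for illustration) of how the choice of anchor point changes the verdict:

```python
# Efficiency as the slope of the line joining a compound to an anchor point on the
# activity axis. The compounds below are invented purely for illustration.
def efficiency(pIC50, heavy_atoms, anchor_pIC50):
    """Slope of the line from (0, anchor_pIC50) to (heavy_atoms, pIC50)."""
    return (pIC50 - anchor_pIC50) / heavy_atoms

compounds = {"fragment hit": (4.0, 12), "HTS hit": (7.5, 30)}  # (pIC50, heavy atoms)

for anchor in (0.0, 3.0):  # your anchor (IC50 = 1 M) versus mine (IC50 = 1 mM)
    ranked = sorted(compounds, key=lambda name: efficiency(*compounds[name], anchor),
                    reverse=True)
    print(f"anchor pIC50 = {anchor}: most 'efficient' first -> {ranked}")

# Anchored at pIC50 = 0 the fragment wins (0.33 versus 0.25 per heavy atom);
# anchored at pIC50 = 3 the ordering flips (0.08 versus 0.15 per heavy atom).
```

The point where the two lines cross is the only combination of molecular size and activity on which the two schemes agree.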
However, there may just be a way out of this sorry mess, and the first thing that has to go is the arbitrary assumption that all IC50 values tend to 1 M in the limit of zero molecular size. Arbitrary assumptions beget arbitrary decisions, regardless of how many grinning LeanSixSigma Master Black Belts your organization employs. One way out of the mire is to link efficiency with the response (slope) and agree that points lying on any straight line (of finite non-zero slope) when activity is plotted against risk factor represent compounds of equal efficiency. Right now this is only the case when that line just happens to intersect the activity axis at a point where IC50 = 1 M.
This is essentially what Mike Schultz was getting at in his series of ( 1 | 2 |
3 ) critiques of LE even if he
did get into a bit of a tangle by launching his blitzkrieg on a mathematical
validity front.
The next problem that we’ll need to deal with if we want to
rescue LE is deciding where we make our lines intersect the activity axis. When we calculate LE for a compound, we connect the point on the plot of activity against molecular size representing the compound to a point on the activity axis with a straight line. Then we calculate the slope of this line to
get LE but we still need to find an objective way to select the point on the
activity axis if we are to save LE. One way to do this is to let the available activity data locate the point for you by fitting a straight line to the data, although this won’t work if the correlation between risk factor and activity is very weak. If you can’t use the data to locate the
intercept then how confident do you feel about selecting an appropriate
intercept yourself? As an exercise, you
might like to take a look at Figure 1 in this article (which was reviewed earlier this year at Practical Fragments) and ask if the data set would have allowed you to locate the intercept
used in the analysis.
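If you do decide to let the data locate the intercept, the fit itself is trivial; the harder question is whether the correlation is strong enough for the fitted intercept to mean anything. Here’s a minimal sketch (Python with SciPy; the pIC50 and heavy atom values are invented) of the sort of check I have in mind:

```python
# Let the data locate the intercept: fit pIC50 against heavy atom count and look at
# both the fitted intercept and the strength of the correlation. Invented data.
import numpy as np
from scipy import stats

heavy_atoms = np.array([12, 15, 18, 22, 26, 30, 34])
pIC50 = np.array([4.1, 4.6, 5.2, 5.9, 6.3, 7.0, 7.4])

fit = stats.linregress(heavy_atoms, pIC50)
print(f"slope = {fit.slope:.3f}  intercept = {fit.intercept:.2f}  r = {fit.rvalue:.3f}")
print(f"standard error of intercept = {fit.intercept_stderr:.2f}")

# A weak correlation (or a wide interval around the intercept) means the data cannot
# locate the intercept for you, and any value chosen by hand remains an assumption.
```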
If you buy into the idea of using the data to locate the
intercept then you’ll need to be thinking about whether it is valid to mix
results from different assays. It may be that the same intercept is appropriate to all potency and affinity assays, but the validity of this assumption needs to be tested by analyzing real data before you go basing decisions on it. If you get
essentially the same intercept when you fit activity to molecular size for
results from a number of different assays then you can justify aggregating the data.
However, it is important to remember (as is the case with any data analysis
procedure) that the burden of proof is on the person aggregating the data to
show that it is actually valid to do so.
By now hopefully you’ve seen the connection between this
approach to repairing LE and the idea of using the residuals to measure the
extent to which the activity of a compound beats the trend in the data. In each case, we start by fitting a straight
line to the data without constraining either the slope or the intercept. In one case, we first subtract the value of this intercept from pIC50 before calculating LE from the difference in the normal manner, and the resulting metric can be regarded as measuring the extent to which activity beats the trend in the data. In the other case, the residuals come directly from the fit and there is no need to create new variables, such as the difference between pIC50 and the intercept, prior to analysis. The residuals have sign, which means that you can easily see whether or not the activity of a compound beats the trend in the data. With residuals there is no scaling of
uncertainty in assay measurement by molecular size (as is the case with
LE). Finally, residuals can still be
used if the trend in the data is non-linear (you just need to fit a curve
instead of a straight line). I have argued the case for using residuals to quantify the extent to which activity beats the trend in the data, but you can also generate a modified LEM from your fit of the activity to molecular size (or whatever property you think activity should be scaled by).
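Here’s a minimal sketch (Python with NumPy; invented data again) of both options side by side: the residuals from an unconstrained fit of pIC50 against heavy atom count, and a modified LE in which the fitted intercept, rather than zero, anchors the line.

```python
# Residuals versus a modified LE, both derived from the same unconstrained fit of
# pIC50 against heavy atom count. Data are invented purely for illustration.
import numpy as np

heavy_atoms = np.array([12.0, 16.0, 20.0, 25.0, 30.0, 36.0])
pIC50 = np.array([4.3, 4.8, 5.6, 6.0, 6.9, 7.3])

slope, intercept = np.polyfit(heavy_atoms, pIC50, 1)   # unconstrained straight line

residuals = pIC50 - (slope * heavy_atoms + intercept)  # signed: positive beats the trend
modified_LE = (pIC50 - intercept) / heavy_atoms        # line anchored at the fitted intercept
conventional_LE = pIC50 / heavy_atoms                  # line forced through pIC50 = 0

for ha, res, mle, le in zip(heavy_atoms, residuals, modified_LE, conventional_LE):
    print(f"HA = {ha:4.0f}   residual = {res:+.2f}   modified LE = {mle:.3f}   LE = {le:.3f}")
```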
It’s been a longer-winded post than I’d intended to write
and this is a good point at which to wrap up.
I could write the analogous post about LipE by substituting ‘slope’ for
‘intercept’ but I won’t because that would be tedious for all of us. I have argued that if you honestly want to normalize activity by risk factor then you should be using trends actually observed in the data rather than making assumptions that self-appointed 'experts' tell you to make. This means thinking (hopefully that's not too much to ask although sometimes I fear that it is) about what questions you'd like to ask and analyzing the data in a manner that is relevant to those questions.
That said, I rest my case.
3 comments:
As always an entertaining and thorough analysis. However, I would like to raise a few questions/challenges.
First, as you have frequently suggested using residuals, it might be instructive for you to present some examples of how to do this using actual (as opposed to model) data. Does using residuals actually yield superior results in the real world?
This leads to my next point, which is that researchers (perhaps especially computational chemists!) often treat experimental data with undeserved deference. As evidence, one could argue that the two lines in your figure are actually quite similar, and that the data available in many projects would not allow one to distinguish between the two. This is especially true where initial fragments may have low affinities and high error bars. Both lines show the same overall trend, and in many cases you don't need much greater accuracy than this.
This leads to my final point, which is that metrics such as ligand efficiency are typically most heavily used at the earliest stages of a program, before one has acquired enough data to use more sophisticated analyses (such as residuals). If you are trying to rapidly triage several hundred fragment hits on a new project, is ligand efficiency really such a bad place to start?
Part 1
“As always an entertaining and thorough analysis. However, I would like to raise a few questions/challenges.”
Thanks, Dan, drug discovery is really tough but I think that we can be serious without being solemn. I’ve tried to address your questions although we’ll probably need to go through a couple iterations to nail this properly so please let me know what needs expansion or clarification.
“First, as you have frequently suggested using residuals, it might be instructive for you to present some examples of how to do this using actual (as opposed to model) data. Does using residuals actually yield superior results in the real world?”
My first response to this would be to counter-challenge and ask if we have a way to objectively show that one set of results is superior to another when using real world data. However, I think we need to go right back to the introduction of LEMs to give your challenge the attention that it deserves. The term ‘normalization’ was used right from the start but nobody actually bothered to say what they meant by the term. What I think they mean is that they are trying to account for the ‘contribution’ that a risk factor makes to activity. If this is what you want to do then (I have argued that) you need to model the data. LEMs are often interpreted as ‘bang-for-buck’ and to quantify this we need to have an objective mechanism for setting price. Right now I have little or no real world data at my disposal but, at the same time, I have no idea how I would compare the performance of LEMs and residuals in an objective manner even if I did. If anybody does have ideas about how to do this, I’d be pleased to hear about them. I do have an example that I pulled from the Astex group efficiency article in one of my SlideShare presentations ( http://www.slideshare.net/pwkenny/data-analytic ) and it is possible that this will be of interest.
Part 2 (in response to Dan)
“This leads to my next point, which is that researchers (perhaps especially computational chemists!) often treat experimental data with undeserved deference. As evidence, one could argue that the two lines in your figure are actually quite similar, and that the data available in many projects would not allow one to distinguish between the two. This is especially true where initial fragments may have low affinities and high error bars. Both lines show the same overall trend, and in many cases you don't need much greater accuracy than this.”
Although always wary of experimental data, I am even more wary of predictions. I tend to worry more about the systematic error (e.g. assay interference; concentration of DMSO stock solutions) than the random error which is easier to get a handle on. I wouldn’t worry too much about the specific lines that I drew because I can construct other examples for which the differences will be larger. I’d constructed the graphic as a sort of reductio ad absurdum exercise and the point was more about the implications of the lines crossing. However, please let me know if you’d like to discuss this point in more detail.
“This leads to my final point, which is that metrics such as ligand efficiency are typically most heavily used at the earliest stages of a program, before one has acquired enough data to use more sophisticated analyses (such as residuals). If you are trying to rapidly triage several hundred fragment hits on a new project, is ligand efficiency really such a bad place to start?”
Some would argue that LEMs are (or should be) used at all stages of projects, although I don’t think we need to get into that particular debate to address your point about triaging hits. Using residuals is not sophisticated analysis. Software like JMP (and I think Spotfire as well) can update the data table with the residuals as soon as we fit activity to risk factor. The key point is that we are using the trend actually observed in the data to normalize it when we use residuals, although we could also use the intercept to re-define LE as described in the blog post. If I were triaging hits, I would still attempt to fit a straight line to the plot of activity against molecular size; one of the key messages in the LEM critique was to model the data with an open mind rather than distorting the analysis by making arbitrary assumptions. In the triage situation we often find that ranges in activity and/or molecular size are limited and, with uncertainty in activity, this limits the strength of correlations that we can observe in the data (and, by implication, the precision of our estimate of the intercept). However, in triaging hits we are doing more than just trying to score them. For example, we might try to identify structural series or perform neighborhood analysis.