Saturday, 13 September 2014

Saving Ligand Efficiency?

<< Previous |

I'll be concluding the series of posts on ligand efficiency metrics (LEMs) here so it's a good point at which to remind you that open access to the article on which these posts are based is likely to end on September 14 (tomorrow). At the risk of sounding like one of those tedious twitter bores who thinks that you're actually interested in their page load statistics, download now to avoid disappointment later. In this post, I'll also be saying something about the implications of the LEM critique for FBDD since I've not said anything specific about FBDD in this blog for a long time.

LEMs have become so ingrained in the FBDD orthodoxy that to criticize them could be seen as fundamentally 'anti-fragment' and even heretical.  I still certainly believe that fragment-based approaches represent an effective (and efficient, although not in the LEM sense) way to do drug discovery. At the same time, I think that it will get more difficult, even risky, to attempt to tout fragment-based approaches primarily on the basis that fragment hits are typically more ligand-efficient than higher molecular weight starting points for synthesis.  I do hope that the critique will at least reassure drug discovery scientists that it really is OK to ask questions (even of publications with eye-wateringly large numbers of citations) and show that the 'experts' are not always right (sometimes they're not even wrong).

My view is that the LEM framework gets used as a crutch in FBDD so perhaps this is a good time for us to cast that crutch aside for a moment and remind ourselves why we were using fragment-based approaches before LEMs arrived on the scene.  Fragment-based approaches allow us to probe chemical space efficiently and it can be helpful to think in terms of information gained per compound assayed or per unit of synthetic effort consumed.  Molecular interactions lie at the core of pharmaceutical molecular design and small, structurally-prototypical probes allow these to be explored quantitatively while minimizing the confounding effects of multiple protein-ligand contacts.  Using the language of screening library design, we can say that fragments cover chemical space more effectively than larger species and can even conjecture that fragments allow that chemical space to be sampled at a more controllable resolution.  Something I’d like you to think about is the idea of minimal steric footprint which was mentioned in the fragment context in both LEM (fragment linking) and Correlation Inflation (achieving axial substitution) critiques.  This is also a good point to remind readers that not all design in drug discovery is about prediction.  For example, hypothesis-driven molecular design and statistical molecular design can be seen as frameworks for establishing structure-activity relationships (SARs) as efficiently as possible.

Despite criticizing the use of LEMs, I believe that we do need to manage risk factors such as molecular size and lipophilicity when optimizing lead series. It's just that I don't think the currently used LEMs provide a generally valid framework for doing this.  We often draw straight lines on plots of activity against risk factors in order to manage the latter.  For example, we might hypothesize that 10 nM potency will be sufficient for in vivo efficacy and so we could draw the pIC50 = 8 line to identify the lowest molecular weight compounds above this line.   Alternatively, we might try to construct a line that represents the 'leading edge' (most potent compound for a particular value of the risk factor) of a plot of pIC50 against ClogP.  When we draw these lines, we often make the implicit assumption that every point on the line is in some way equivalent. For example, we might conjecture that points on the line represent compounds of equal 'quality'.  We do this when we use LEMs and assume that compounds above the line are better than those below it.

Let's take a look at ligand efficiency (LE) in this context and I'm going to define LE in terms of pIC50 for the purposes of this discussion. We can think of LE in terms of a continuum of lines that intersect the activity axis at zero (where IC50 = 1 M).  At this point, I should stress that if the activities of a selection of compounds just happen to lie on any one of these lines then I'd be happy to treat those compounds as equivalent.  Now let's suppose that you've drawn a line that intersects the activity axis at zero.  Now imagine that I draw a line that intersects the activity axis at 3 (where IC50 = 1 mM) and, just to make things interesting, I'm going to make sure that my line has a different slope to your line.  There will be one point where we appear to agree (just like −40° on the Celsius and Fahrenheit temperature scales) but everywhere else we disagree. Who is right? Almost certainly neither of us is right (plenty of lines to choose from in that continuum) but in any case we simply don't know because we've both made completely arbitrary decisions with respect to the points on the activity axis where we've chosen to anchor our respective lines.  Here's a figure that will give you a bit more of an idea of what I'm talking about.
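If you prefer numbers to pictures, here is a minimal sketch (Python, with made-up pIC50 values and heavy atom counts) of the disagreement described above: the same two compounds are ranked differently depending on whether the line is anchored at pIC50 = 0 or at pIC50 = 3.

    # Minimal sketch (hypothetical values) of how the choice of anchor point on the
    # activity axis changes which compound looks more 'efficient'.
    compounds = {
        "fragment": {"nha": 12, "pic50": 4.8},
        "lead":     {"nha": 30, "pic50": 8.0},
    }

    def efficiency(pic50, nha, anchor):
        """Slope of the line joining (0, anchor) to (nha, pic50)."""
        return (pic50 - anchor) / nha

    for anchor in (0.0, 3.0):   # your anchor (IC50 = 1 M) versus mine (IC50 = 1 mM)
        ranking = sorted(compounds,
                         key=lambda name: efficiency(compounds[name]["pic50"],
                                                     compounds[name]["nha"], anchor),
                         reverse=True)
        print(f"anchor = {anchor}: ranking = {ranking}")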


However, there may just be a way out of this sorry mess and the first thing that has to go is the arbitrary assumption that all IC50 values tend to 1 M in the limit of zero molecular size.  Arbitrary assumptions beget arbitrary decisions regardless of how many grinning LeanSixSigma Master Black Belts your organization employs.  One way out of the mire is to link efficiency with the response (slope) and agree that points lying on any straight line (of finite, non-zero slope) when activity is plotted against risk factor represent compounds of equal efficiency.  Right now this is only the case when that line just happens to intersect the activity axis at a point where IC50 = 1 M.  This is essentially what Mike Schultz was getting at in his series of (1 | 2 | 3) critiques of LE even if he did get into a bit of a tangle by launching his blitzkrieg on a mathematical validity front.
The next problem that we'll need to deal with if we want to rescue LE is deciding where we make our lines intersect the activity axis.  When we calculate LE for a compound, we connect the point on the activity versus molecular size plot representing the compound to a point on the activity axis with a straight line.  Then we calculate the slope of this line to get LE but we still need to find an objective way to select the point on the activity axis if we are to save LE.  One way to do this is to use the available activity data to locate the point for you by fitting a straight line to the data, although this won't work if the correlation between risk factor and activity is very weak.  If you can't use the data to locate the intercept then how confident do you feel about selecting an appropriate intercept yourself?  As an exercise, you might like to take a look at Figure 1 in this article (which was reviewed earlier this year at Practical Fragments) and ask if the data set would have allowed you to locate the intercept used in the analysis.
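As a minimal sketch (Python, with made-up data) of what 'letting the data locate the point' might look like, one could fit pIC50 to heavy atom count and read the intercept off the fit:

    import numpy as np

    # Minimal sketch (made-up data): let the data locate the intercept by fitting
    # pIC50 to heavy atom count, rather than assuming pIC50 = 0 at zero molecular size.
    nha   = np.array([10, 14, 18, 22, 26, 30])
    pic50 = np.array([4.1, 4.9, 5.4, 6.2, 6.6, 7.5])

    slope, intercept = np.polyfit(nha, pic50, 1)
    print(f"fitted slope = {slope:.3f} pIC50 units per heavy atom")
    print(f"fitted intercept = {intercept:.2f} (cf. the conventional assumption of 0)")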

If you buy into the idea of using the data to locate the intercept then you'll need to be thinking about whether it is valid to mix results from different assays.   It may be that the same intercept is appropriate for all potency and affinity assays but the validity of this assumption needs to be tested by analyzing real data before you go basing decisions on it.  If you get essentially the same intercept when you fit activity to molecular size for results from a number of different assays then you can justify aggregating the data. However, it is important to remember (as is the case with any data analysis procedure) that the burden of proof is on the person aggregating the data to show that it is actually valid to do so.

By now hopefully you've seen the connection between this approach to repairing LE and the idea of using the residuals to measure the extent to which the activity of a compound beats the trend in the data.  In each case, we start by fitting a straight line to the data without constraining either the slope or the intercept.  In one case we first subtract the value of the fitted intercept from pIC50 before calculating LE from the difference in the normal manner and the resulting metric can be regarded as measuring the extent to which activity beats the trend in the data.  In the other case, the residuals come directly from the fitting process and there is no need to create new variables, such as the difference between pIC50 and the intercept, prior to analysis. The residuals have sign, which means that you can easily see whether or not the activity of a compound beats the trend in the data.  With residuals there is no scaling of uncertainty in assay measurement by molecular size (as is the case with LE).  Finally, residuals can still be used if the trend in the data is non-linear (you just need to fit a curve instead of a straight line).  I have argued the case for using residuals to quantify the extent to which activity beats the trend in the data but you can also generate a modified LEM from your fit of activity to molecular size (or whatever property you think activity should be scaled by).
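To make the parallel concrete, here is a minimal sketch (Python, same made-up data as above) that computes both the residuals and an intercept-corrected LE from the same unconstrained fit:

    import numpy as np

    # Minimal sketch (made-up data): residuals from an unconstrained fit versus a
    # modified, intercept-corrected LE. Both use the trend actually observed in the data.
    nha   = np.array([10, 14, 18, 22, 26, 30])
    pic50 = np.array([4.1, 4.9, 5.4, 6.2, 6.6, 7.5])

    slope, intercept = np.polyfit(nha, pic50, 1)
    residuals = pic50 - (slope * nha + intercept)   # signed; positive means the compound beats the trend
    modified_le = (pic50 - intercept) / nha         # LE recalculated against the fitted intercept

    for n, r, le in zip(nha, residuals, modified_le):
        print(f"NHA = {n:2d}   residual = {r:+.2f}   modified LE = {le:.3f}")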

It’s been a longer-winded post than I’d intended to write and this is a good point at which to wrap up.  I could write the analogous post about LipE by substituting ‘slope’ for ‘intercept’ but I won’t because that would be tedious for all of us.  I have argued that if you honestly want to normalize activity by risk factor then you should be using trends actually observed in the data rather than making assumptions that self-appointed 'experts' tell you to make.   This means thinking (hopefully that's not too much to ask although sometimes I fear that it is) about what questions you'd like to ask and analyzing the data in a manner that is relevant to those questions. 

That said, I rest my case.

Thursday, 4 September 2014

Efficiency can also be lipophilic

<< Previous || Next >>
In the previous post, I questioned the validity of scaled ligand efficiency metrics (LEMs) such as LE.  However, LEMs can also be defined by subtracting the value of the risk factor from activity and this has been termed offsetting.  For example, you can subtract a measure of lipophilicity from pIC50 to give functions such as:

pIC50 – logP

pIC50 – ClogP

pIC50 – logD(pH)
As you will have gathered from the previous post, I am not a big fan of naming LEMs. The reason for this is that you often (usually?) can't tell from the name exactly what has been calculated and I think it would be a lot better if people were forced (journal editors, you can achieve something of lasting value here) to be explicit about the mathematical function(s) with which they normalize the activity of compounds.  In some ways, the problems are actually worse when activity is normalized by lipophilicity because a number of different measures of lipophilicity can be used and because differences between lipophilicity measures are not always well understood (even by LEM 'experts').  Does LLE mean 'ligand-lipophilicity efficiency' or 'lipophilic ligand efficiency'?  When informed that a compound has an LLE of 4, how can I tell whether it has been calculated using logD (please specify pH), logP or predicted logP (please specify prediction method since there are several to choose from)?

Have a look at this figure and observe how the three lines respond to the offsetting transformation (Y → Y − X).  The line with unit slope transforms to a line of zero slope (Y − X is independent of X) while the other two lines transform to lines of non-zero slope (Y − X is dependent on X).  This figure is analogous to the one in the previous post that showed how three parallel lines transformed under the scaling transformation.
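If you want to convince yourself numerically rather than graphically, here is a minimal sketch (Python, with arbitrary example lines) of the offsetting transformation:

    import numpy as np

    # Minimal sketch: apply the offsetting transformation (Y -> Y - X) to three lines
    # with different slopes. Only the line with unit slope becomes independent of X.
    x = np.linspace(1.0, 10.0, 5)
    for slope in (0.5, 1.0, 1.5):
        y = slope * x + 2.0
        offset = y - x
        flat = np.allclose(offset, offset[0])
        print(f"slope = {slope}: Y - X is", "independent of X" if flat else "dependent on X")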
It’s going to be helpful to generalize Lipophilic Efficiency (LipE) so let’s do that first:
LipEgen = pIC50 − (λ × ClogP)

Generalizing LipE in this manner shows us that LipE (λ = 1) is actually quite arbitrary (in much the same way that a standard or reference state is arbitrary) and one might ask whether λ = 0.5 might not be a better LEM.  Note that a similar criticism was made of the Solubility Forecast Index in the Correlation Inflation Perspective. One approach to validating an LEM would be to show that it actually predicted relevant behavior of compounds. In the case of LEMs based on lipophilicity, it would be necessary to show that the best predictions were observed for λ = 1.  Although one can think of LEMs as simple quantitative structure-activity relationships (QSARs), LEMs are rarely, if ever, validated in a way that QSAR practitioners would regard as valid.  Can anybody find a sentence in the pharmaceutical literature containing the words 'ligand', 'efficiency' and 'validated'?  Answers on a postcard...
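For what it's worth, here is a minimal sketch (Python, with a hypothetical compound) of the generalized metric; conventional LipE is simply the λ = 1 member of this family:

    # Minimal sketch (hypothetical pIC50 and ClogP): LipEgen = pIC50 - (lambda * ClogP)
    # evaluated for a few choices of lambda.
    pic50, clogp = 7.5, 3.2

    for lam in (0.5, 1.0, 1.5):
        lipe_gen = pic50 - lam * clogp
        print(f"lambda = {lam}: LipEgen = {lipe_gen:.2f}")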
Offset LEMs do differ from scaled LEMs and one might invoke a thermodynamic argument to justify the use of LipE as an LEM.  In a nutshell, it can be argued that LipE is a measure of the ease of moving a compound from a non-polar environment to its binding site in the protein.  There are two flaws in this argument, which were discussed in our LEM critique (open access for another 10 days or so).  Firstly, when a ligand binds in an ionized form, lipophilicity measures do not quantify the ease of moving the bound form from octanol to water because ionized forms of compounds do not usually partition into octanol to a significant extent. Secondly, octanol/water is just one of a number of partitioning systems and one needs to demonstrate that lipophilicity derived from it is optimal for the definition of an LEM.  The figure below shows how logP can differ when measured in alternative partitioning systems and you should be aware of an occasionally expressed misconception that the relevant logP values simply differ by a constant amount.



One solution to the problem is to model pIC50 as a function of your favored measure of lipophilicity and use the residuals to quantify the extent to which activity beats the trend in the data.  This is exactly what I suggested in the previous post as an alternative to scaling activity by risk factors such as molecular weight or heavy atoms and the approach can be seen as bringing these risk factors and lipophilicity into a common data-analytical framework.   Even if you don't like the idea of using the residuals, it is still useful to model the measured activity because a slope of unity helps to validate LipE (assuming that you're using ClogP to model activity).   Even if the slope of the line of fit differs from unity, you can set λ to its value to create a lipophilic efficiency metric that has been tuned to the data set that you wish to analyze.
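Here is a minimal sketch (Python, made-up data) of that suggestion: fit pIC50 to ClogP, see how far the slope is from unity, and either use the residuals or set λ to the fitted slope.

    import numpy as np

    # Minimal sketch (made-up data): model pIC50 as a function of ClogP, then use the
    # residuals, or tune lambda to the fitted slope, instead of assuming lambda = 1.
    clogp = np.array([1.2, 1.8, 2.5, 3.1, 3.8, 4.4])
    pic50 = np.array([5.0, 5.5, 6.3, 6.6, 7.4, 7.7])

    slope, intercept = np.polyfit(clogp, pic50, 1)
    residuals = pic50 - (slope * clogp + intercept)   # extent to which each compound beats the trend
    tuned_lipe = pic50 - slope * clogp                # lambda set to the observed slope

    print(f"fitted slope = {slope:.2f} (conventional LipE assumes 1.0)")
    print("residuals: ", np.round(residuals, 2))
    print("tuned LipE:", np.round(tuned_lipe, 2))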

This is a good point at which to wrap up.  As noted (and reiterated) in the LEM critique, when you use LEMs, you're making assumptions about trends in data and your perception of the system is distorted when these assumptions break down.  Modelling the data by fitting activity to risk factor allows you to use the trends actually observed in the data to normalize activity.  That's just about all I want to say for now and please don't get me started on LELP.

Saturday, 30 August 2014

Ligand efficiency metrics considered harmful

Next >>
It has been a while since I did a proper blog post. Some of you may have encountered a Perspective article entitled 'Ligand efficiency metrics considered harmful' and I'll post on it because the journal has made the article open access until Sept 14.  The Perspective has already been reviewed by Practical Fragments and highlighted in an F1000 review. Some of the points discussed in the article were actually raised last year in Efficient Voodoo Thermodynamics, Wrong Kind of Free Energy and ClogPalk: a method for predicting alkane/water partition coefficients.  There has been recent debate about the validity of ligand efficiency (LE) which is summarized in a blog post (make sure to look at the comments as well).  However, I believe that both sides missed the essential point, which is that the choice (conventionally 1 M) of concentration that is used to define the standard state is entirely arbitrary.

In this blog post, I’ll focus on what I call ‘scaled’ ligand efficiency metrics (LEMs). Scaling means that a measure of activity or affinity is divided by a risk factor such as molecular weight (MW) or heavy (i.e. non-hydrogen) atoms (HA).   For example, LE can be calculated by dividing the standard free energy of binding (ΔG°) by HA:
LE = (−1/HA) × RT × loge(Kd/C°)
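To make the role of C° explicit, here is a minimal sketch (Python, with a hypothetical Kd and heavy atom count) in which the dissociation constant is divided by the standard concentration before the logarithm is taken:

    import math

    # Minimal sketch (hypothetical values): LE from Kd, with Kd divided explicitly by
    # the standard concentration so that the logarithm is taken of a dimensionless number.
    R = 0.001987        # gas constant in kcal/(mol*K)
    T = 298.0           # temperature in K
    C_standard = 1.0    # standard concentration in M (the conventional, arbitrary choice)

    kd = 1.0e-8         # dissociation constant in M
    ha = 25             # heavy atom count

    le = (-1.0 / ha) * R * T * math.log(kd / C_standard)
    print(f"LE = {le:.2f} kcal/mol per heavy atom")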

Now you'll notice that I've written ΔG° in terms of the dissociation constant (Kd) and the standard concentration (C°) and I articulated why this is important early last year.  The logarithm function is only defined for numbers (i.e. dimensionless quantities) and, inconveniently, Kd has units of concentration.  This means that LE is a function of both Kd and C° and I'm going to redefine LE a bit first to make the problem easier to see.   I'll use IC50 instead of Kd to define a new LEM which I'm not going to name (in the LEM literature names and definitions keep changing so it's much better to simply state the relevant formula for whatever is actually used):

(−1/NHA) × log10(IC50/Cref)

I have a number of reasons for defining the metric in this manner and the most important of these is that the new metric is dimensionless.  Note how I use the number of heavy atoms (NHA) rather than HA to define the metric.  Just in case you're wondering what the difference is, HA for ethanol is 3 heavy atoms while NHA is 3 (lacking units of heavy atoms). [In the original post I'd counted 2 for this molecule but if you check the comments you'll see that this howler was picked up by an alert reader and it has now been corrected.]  Apologies for being pedantic (some might even call me a units Nazi) but if people had paid more attention to units, we'd never have got into this sorry mess in the first place.  The other point to note is that I've not converted IC50 to units of energy, mainly for the reason that it is incorrect to do so because an IC50 is not a thermodynamic quantity. However, there are other reasons for not introducing units of energy. Often units of energy go AWOL when values of LE are presented and there is no way of knowing whether the original units were kcal/mol or kJ/mol. Even when energy units are stated explicitly, this might be masking a situation in which affinity and potency measurements have been combined in a single analysis.  Of course, one can be cynical and suggest that the main reason for introducing energy units is to make biochemical measurements appear to be more physical.

So let's get back to that new metric and you'll have noticed a quantity in the defining equation that I've called Cref (reference concentration).  This is similar to the standard concentration in that the choice of its value is completely arbitrary but it is also different because it has no thermodynamic significance. You need to use it when defining the new LEM because, at the risk of appearing repetitive, you can't calculate a logarithm for a quantity that has units.  Another way of thinking about Cref is as an arbitrary unit of concentration that we're using to analyze some potency measurements.  Something that is really, really important and fundamental in science is that your perception of a system should not change when you change the units of the quantities that describe the system.  If it does then, in the words of Pauli, you are "not even wrong" and you should avoid claiming penetrating insight so as to prevent embarrassment later. So let's take a look at how changing Cref affects our perception of ligand efficiency.  The table below is essentially the same as the one given in the Perspective article (the only difference is that I'm using IC50 in the blog post rather than Kd).  I should also point out that Huan-Xiang Zhou and Mike Gilson made a similar criticism of LE back in 2009 (although we cited their article in the context of standard states, we failed to notice the critique at the end of their article and there really is no excuse for having missed it).  When a reference concentration of 1 M is used, the three compounds are all equally ligand efficient according to the new LEM.  If Cref = 0.1 M, the compounds appear to become more ligand efficient as molecular size increases but the opposite behavior is observed for Cref = 10 M.
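Here is a minimal sketch (Python, with made-up IC50 values chosen so that the three compounds are equally efficient at Cref = 1 M) that reproduces this behavior:

    import math

    # Minimal sketch (made-up values): the unnamed metric (-1/NHA) * log10(IC50/Cref)
    # for three compounds, evaluated at three different reference concentrations.
    compounds = [(10, 1.0e-3), (20, 1.0e-6), (30, 1.0e-9)]   # (NHA, IC50 in M)

    for c_ref in (0.1, 1.0, 10.0):
        values = [(-1.0 / nha) * math.log10(ic50 / c_ref) for nha, ic50 in compounds]
        print(f"Cref = {c_ref:>4} M:", [round(v, 3) for v in values])

With these numbers all three compounds score 0.3 at Cref = 1 M, the largest compound scores highest at Cref = 0.1 M and the smallest scores highest at Cref = 10 M.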


Here's a figure that shows the problem from a different angle. See how the three parallel lines respond to the scaling transformation (Y → Y/X).  The line that passes through the origin transforms to a line of zero slope (Y/X is independent of X) while the other two lines transform to curves (Y/X is dependent on X).  This graphic has important implications for fit quality (FQ) because it shows that some of the size dependency of LE is a consequence of the (arbitrary) choice of a value (usually 1 M) for Cref or C°.
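Here is a minimal sketch (Python, with arbitrary example lines) of the same scaling transformation in numbers:

    import numpy as np

    # Minimal sketch: apply the scaling transformation (Y -> Y/X) to three parallel lines.
    # Only the line through the origin gives a Y/X that is independent of X.
    x = np.linspace(1.0, 10.0, 5)
    for intercept in (-2.0, 0.0, 2.0):
        y = 0.3 * x + intercept
        ratio = y / x
        flat = np.allclose(ratio, ratio[0])
        print(f"intercept = {intercept:+.0f}: Y/X is", "independent of X" if flat else "dependent on X")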


A common response that I've encountered when raising these points is that LE is still useful.  My counter-response is that Religion is also useful (e.g. as a means for pastors to fleece their flocks efficiently) but that, by itself, neither makes it correct nor ensures that it is being used correctly (if that is even possible for Religion).  One occasionally expressed opinion is that, provided you use a single value for Cref or C°, the resulting LE values will be consistent.  However, consistency is no guarantee of correctness and we need to remember that we create metrics for the purpose of measuring things with them.  When you advocate use of a metric in drug discovery, the burden of proof is on you to demonstrate that the metric has a sound scientific basis and actually measures what it is supposed to measure.
The Perspective is critical of the LEMs used in drug discovery but it does suggest alternatives that do not suffer from the same deficiencies.  This is a good point to admit that it took me a while to figure out what was wrong with LE and I’ll point you towards a blog post from over five years ago that will give you an idea about how my position on LEMs has evolved.  It is often stated that LEMs normalize activity with respect to risk factor although it is rarely, if ever, stated explicitly what is meant by the term ‘normalize’.   One way of thinking about normalization is as a way to account for the contribution of a risk factor, such as molecular size or lipophilicity, to activity.  You can do this by modelling the activity as a function of risk factor and using the residual as a measure of activity that has been corrected for the contribution made by the risk factor in question.  You can also think of an LEM as a measure of the extent to which the activity of a compound beats a trend (e.g. linear response of activity to HA). If you’re going to do this then why not use the trend actually observed in the data rather than some arbitrarily assumed trend?

Although I still prefer to use residuals to quantify the extent to which activity beats a trend, the process of modelling activity measurements hints at a way that LE might be rehabilitated to some extent.  When you fit a line to activity (or affinity) measurements, you are effectively determining the value of Cref (or C°) that will make the line of fit pass through the origin and which you can then use to redefine activity (or affinity) for the purpose of calculating LE. I would argue that the residual is a better measure of the extent to which the activity of a compound beats the trend in the data because the residual has sign and the uncertainty in it does not depend explicitly on the value of a scaling variable such as HA. However, LE defined in this manner can at least be claimed to take account of the observed trend in the activity data. Something that you might want to think about in this context is whether or not you'd expect the same parameters (slope and intercept) if you were to fit activity measured against different targets to MW or HA.  My reason for bringing this up is that it has implications for the validity of mixing results from different assays in LE-based analyses so see what you think.


I’ll wrap up by directing you to a presentation that I've been doing lately and it includes material from the earlier correlation inflation study.  A point that I make when presenting this material is that, if we do bad science and bad data analysis, people can be forgiven for thinking that the difficulties in drug discovery may actually be of our own making.

Wednesday, 4 June 2014

Ligand efficiency metric teaser

Here's a bit of fun. See how these parallel lines transform when you divide Y by X (this is how you calculate ligand efficiency).


Here's some more fun. See how these lines transform when you subtract X from Y (this is how you calculate lipophilic efficiency).


Would you consider activities lying on a straight line when plotted against molecular size or lipophilicity to represent compounds with the same efficiency?