Saturday 30 August 2014

Ligand efficiency metrics considered harmful

It has been a while since I did a proper blog post. Some of you may have encountered a Perspective article entitled ‘Ligand efficiency metrics considered harmful’ and I’ll post on it because the journal has made the article open access until Sept 14. The Perspective has already been reviewed by Practical Fragments and highlighted in an F1000 review. Some of the points discussed in the article were actually raised last year in Efficient Voodoo Thermodynamics, Wrong Kind of Free Energy and ClogPalk: a method for predicting alkane/water partition coefficients. There has been recent debate about the validity of ligand efficiency (LE), which is summarized in a blog post (make sure to look at the comments as well). However, I believe that both sides missed the essential point, which is that the choice of concentration (conventionally 1 M) used to define the standard state is entirely arbitrary.

In this blog post, I’ll focus on what I call ‘scaled’ ligand efficiency metrics (LEMs). Scaling means that a measure of activity or affinity is divided by a risk factor such as molecular weight (MW) or the number of heavy (i.e. non-hydrogen) atoms (HA). For example, LE can be calculated by dividing the standard free energy of binding (ΔG°) by HA:
LE = (1/HA) × RT ln(Kd/C°)

Now you’ll notice that I’ve written ΔG° in terms of the dissociation constant (Kd) and the standard concentration (C°), and I articulated why this is important early last year. The logarithm function is only defined for pure numbers (i.e. dimensionless quantities) and, inconveniently, Kd has units of concentration. This means that LE is a function of both Kd and C°, and I’m going to first redefine LE slightly to make the problem easier to see. I'll use IC50 instead of Kd to define a new LEM which I’m not going to name (in the LEM literature names and definitions keep changing, so it is much better simply to state the relevant formula for whatever is actually used):
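The description that follows (a dimensionless metric built from IC50, NHA and Cref) is consistent with a definition of roughly the form below. Treat it as an assumed reconstruction for illustration and not as a quotation of the original formula; in particular, the base-10 logarithm, the leading minus sign and the symbol LEM are assumptions.

```latex
\mathrm{LEM} \;=\; -\,\frac{1}{N_{\mathrm{HA}}}\,
  \log_{10}\!\left(\frac{\mathrm{IC}_{50}}{C_{\mathrm{ref}}}\right)
```

The ratio inside the logarithm is a pure number, so the metric carries no units, but its value still depends on whatever value is chosen for Cref.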


I have a number of reasons for defining the metric in this manner and the most important of these is that the new metric is dimensionless. Note how I use the number of heavy atoms (NHA) rather than HA to define the metric. Just in case you’re wondering what the difference is, HA for ethanol is 3 heavy atoms while NHA is 3 (a pure number, lacking units of heavy atoms). [In the original post I'd counted 2 for this molecule but if you check the comments you'll see that this howler was picked up by an alert reader and it has now been corrected.] Apologies for being pedantic (some might even call me a units Nazi) but if people had paid more attention to units, we’d never have got into this sorry mess in the first place. The other point to note is that I’ve not converted IC50 to units of energy, mainly because it is incorrect to do so: an IC50 is not a thermodynamic quantity. However, there are other reasons for not introducing units of energy. Units of energy often go AWOL when values of LE are presented and there is then no way of knowing whether the original units were kcal/mol or kJ/mol. Even when energy units are stated explicitly, this might be masking a situation in which affinity and potency measurements have been combined in a single analysis. Of course, one can be cynical and suggest that the main reason for introducing energy units is to make biochemical measurements appear to be more physical.

So let’s get back to that new metric. You’ll have noticed a quantity in the defining equation that I’ve called Cref (reference concentration). This is similar to the standard concentration in that the choice of its value is completely arbitrary, but it is also different because it has no thermodynamic significance. You need to use it when defining the new LEM because, at the risk of appearing repetitive, you can’t calculate a logarithm for a quantity that has units. Another way of thinking about Cref is as an arbitrary unit of concentration that we’re using to analyze some potency measurements. Something that is really, really important and fundamental in science is that your perception of a system should not change when you change the units of the quantities that describe the system. If it does then, in the words of Pauli, you are “not even wrong” and you should avoid claiming penetrating insight so as to prevent embarrassment later. So let’s take a look at how changing Cref affects our perception of ligand efficiency. The table below is essentially the same as what was given in the Perspective article (the only difference is that I’m using IC50 in the blog post rather than Kd). I should also point out that Huan-Xiang Zhou and Mike Gilson made a similar criticism of LE back in 2009 (although we cited their article in the context of standard states, we failed to notice the critique at the end of their article and there really is no excuse for having missed it). When a reference concentration of 1 M is used, the three compounds are all equally ligand efficient according to the new LEM. If Cref = 0.1 M, the compounds appear to become more ligand efficient as molecular size increases, but the opposite behavior is observed for Cref = 10 M.
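The effect of the reference concentration can be reproduced with a few lines of Python. This is a minimal sketch: the functional form (negative log10 of IC50/Cref, divided by NHA) and the three compounds are assumptions chosen to match the behaviour described above, not values copied from the table in the Perspective.

```python
import math


def scaled_potency(ic50_molar, nha, cref_molar):
    """Assumed metric: -(1/NHA) * log10(IC50 / Cref). Dimensionless."""
    return -math.log10(ic50_molar / cref_molar) / nha


# Hypothetical compounds as (NHA, IC50 in mol/L); illustrative values only.
compounds = [(10, 1e-3), (20, 1e-6), (30, 1e-9)]


def metric_table(cref_values=(0.1, 1.0, 10.0)):
    """Map each Cref (mol/L) to the metric values for the three compounds."""
    return {
        cref: [round(scaled_potency(ic50, nha, cref), 3) for nha, ic50 in compounds]
        for cref in cref_values
    }


table = metric_table()
for cref, values in table.items():
    print(f"Cref = {cref:>4} M: {values}")
```

Only the unit of concentration changes between the three rows, yet the ranking of the compounds by "efficiency" goes from increasing with size, to flat, to decreasing with size.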

Here’s a figure that shows the problem from a different angle. See how the three parallel lines respond to the scaling transformation (Y → Y/X). The line that passes through the origin transforms to a line of zero slope (Y/X is independent of X) while the other two lines transform to curves (Y/X is dependent on X). This graphic has important implications for fit quality (FQ) because it shows that some of the size dependency of LE is a consequence of the (arbitrary) choice of a value (usually 1 M) for Cref or C°.
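The scaling transformation can also be checked numerically. The sketch below uses three hypothetical parallel lines (slope 0.3 with intercepts of -1, 0 and +1, which is what a tenfold change in Cref in either direction does to a log-scaled activity) and divides each Y by X.

```python
def scale_line(slope, intercept, xs):
    """Return Y/X for points on the line Y = slope * X + intercept."""
    return [(slope * x + intercept) / x for x in xs]


# Hypothetical X values (e.g. heavy atom counts) and intercepts.
xs = [10, 20, 30, 40]
scaled = {c: scale_line(0.3, c, xs) for c in (-1.0, 0.0, 1.0)}

for intercept, values in scaled.items():
    print(f"intercept {intercept:+.0f}: {[round(v, 3) for v in values]}")
```

Algebraically Y/X = slope + intercept/X, so only the line with zero intercept gives a scaled value that is independent of X.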

A common response that I’ve encountered when raising these points is that LE is still useful. My counter-response is that Religion is also useful (e.g. as a means for pastors to fleece their flocks efficiently) but that, by itself, neither makes it correct nor ensures that it is being used correctly (if that is even possible for Religion). One occasionally expressed opinion is that, provided you use a single value for Cref or C°, the resulting LE values will be consistent. However, consistency is no guarantee of correctness and we need to remember that we create metrics for the purpose of measuring things with them. When you advocate use of a metric in drug discovery, the burden of proof is on you to demonstrate that the metric has a sound scientific basis and actually measures what it is supposed to measure.
The Perspective is critical of the LEMs used in drug discovery but it does suggest alternatives that do not suffer from the same deficiencies. This is a good point to admit that it took me a while to figure out what was wrong with LE and I’ll point you towards a blog post from over five years ago that will give you an idea about how my position on LEMs has evolved. It is often stated that LEMs normalize activity with respect to a risk factor although it is rarely, if ever, stated explicitly what is meant by the term ‘normalize’. One way of thinking about normalization is as a way to account for the contribution of a risk factor, such as molecular size or lipophilicity, to activity. You can do this by modelling the activity as a function of the risk factor and using the residual as a measure of activity that has been corrected for the contribution made by the risk factor in question. You can also think of an LEM as a measure of the extent to which the activity of a compound beats a trend (e.g. linear response of activity to HA). If you’re going to do this then why not use the trend actually observed in the data rather than some arbitrarily assumed trend?
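Using residuals in this way takes very little code. The sketch below fits a straight line by ordinary least squares and reports how far each compound sits above or below the fitted trend. The data are hypothetical and the function names are illustrative; a linear model is assumed only for simplicity.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Observed minus fitted activity; positive means the compound beats the trend."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: heavy atom counts and pIC50 values for one assay.
nha = [10, 15, 20, 25, 30]
pic50 = [4.1, 5.2, 5.8, 7.1, 7.6]

res = residuals(nha, pic50)
for n, r in zip(nha, res):
    print(f"NHA = {n}: residual = {r:+.2f}")
```

A nice property of the residual is that shifting every pIC50 by a constant, which is all that a change of concentration unit does, moves the intercept and leaves the residuals untouched.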

Although I still prefer to use residuals to quantify the extent to which activity beats a trend, the process of modelling activity measurements hints at a way that LE might be rehabilitated to some extent. When you fit a line to activity (or affinity) measurements, you are effectively determining the value of Cref (or C°) that will make the line of fit pass through the origin and which you can then use to redefine activity (or affinity) for the purpose of calculating LE. I would argue that the residual is a better measure of the extent to which the activity of a compound beats the trend in the data because the residual has sign and the uncertainty in it does not depend explicitly on the value of a scaling variable such as HA. However, LE defined in this manner can at least be claimed to take account of the observed trend in the activity data. Something that you might want to think about in this context is whether or not you'd expect the same parameters (slope and intercept) if you were to fit activity measured against different targets to MW or HA. My reason for bringing this up is that it has implications for the validity of mixing results from different assays in LE-based analyses, so see what you think.
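To make the link between the fitted intercept and Cref concrete: if pIC50 (defined relative to 1 M) is fitted as intercept + slope × NHA, then choosing Cref = 10^(-intercept) M moves the fitted line onto the origin. The sketch below assumes that log10-based form of the metric and uses a hypothetical fit (intercept 2.0, slope 0.2) with compounds that lie exactly on the line.

```python
import math


def cref_from_intercept(intercept):
    """Cref (mol/L) that makes the fitted line pass through the origin.

    Assumes pIC50 = -log10(IC50 / 1 M) was fitted as intercept + slope * NHA.
    """
    return 10.0 ** (-intercept)


def scaled_metric(ic50_molar, nha, cref_molar):
    """Assumed metric: -(1/NHA) * log10(IC50 / Cref)."""
    return -math.log10(ic50_molar / cref_molar) / nha


# Hypothetical fitted parameters and compounds lying exactly on the fitted line.
intercept, slope = 2.0, 0.2
compounds = [(nha, 10.0 ** -(intercept + slope * nha)) for nha in (10, 20, 30)]

cref = cref_from_intercept(intercept)
rescaled = [scaled_metric(ic50, nha, cref) for nha, ic50 in compounds]
conventional = [scaled_metric(ic50, nha, 1.0) for nha, ic50 in compounds]

print("Cref from fit (M):", cref)
print("metric with fitted Cref:", [round(v, 3) for v in rescaled])
print("metric with Cref = 1 M: ", [round(v, 3) for v in conventional])
```

With the fitted Cref, every compound on the trend line gets the same value (the slope), whereas the conventional 1 M choice makes the smaller compounds look more efficient in this example.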

I’ll wrap up by directing you to a presentation that I've been giving lately, which includes material from the earlier correlation inflation study. A point that I make when presenting this material is that, if we do bad science and bad data analysis, people can be forgiven for thinking that the difficulties in drug discovery may actually be of our own making.


Unknown said...

Hi Peter,
as ever I have enjoyed your thoughtful discussion of metrics. Does a pickelhauben hat help the thought process?
As a fellow fan of ethanol, didn't you mean that HA (or NHA) is 3? Perhaps you have been at the embalming fluid again?
On a more serious point, isn't Ki (a constant) a better value than IC50? As you know, IC50 is dependent on substrate (its Km and concentration), so it may not be appropriate to compare between assays.

Peter Kenny said...

Hi John, Many thanks for flagging up this howler, which would be like Ho Chi Minh saying that NHA for formaldehyde is 3 (ignoring for a moment that formaldehyde is predominantly hydrated in aqueous solution). I will fix the error and make appropriate reference to your comment.

I would agree that Ki (or Kd) is more useful than IC50 when interpreting results from different assays even though neither Ki nor Kd is necessarily more predictive of effects in cells or in vivo. However, Ki and Kd will respond differently to molecular size for different targets (and even different structural series against the same target). This has implications if one accepts that one should use the observed response to molecular size to normalize activity rather than some arbitrarily assumed response. In any case, I would argue the burden of proof is firmly on those mixing assay results to demonstrate that it is acceptable to do so.

IC50 values can also depend on the concentration of protein and this does not always seem to have been appreciated when considering maximal affinity of ligands. Even with a 0.1 nM affinity ligand, you still need a ligand concentration of (at least) 5 nM for 50% inhibition of an enzyme that is at 10 nM in assay buffer. This is the tight binding situation and it's important that people proposing explanations for a plateau in activity corresponding to larger molecules are aware that the limits about which they pontificate may reflect assay configuration rather than molecular recognition.
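The arithmetic behind the 5 nM figure can be sketched as follows, assuming simple 1:1 binding, that 50% inhibition corresponds to 50% of the enzyme being occupied, and no competition from substrate. The function names are illustrative.

```python
import math


def fraction_bound(enzyme_total, ligand_total, kd):
    """Fraction of enzyme occupied for 1:1 binding (exact quadratic solution).

    All three arguments must be in the same concentration unit (nM here).
    """
    b = enzyme_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * enzyme_total * ligand_total)) / 2.0
    return complex_conc / enzyme_total


def ligand_for_half_occupancy(enzyme_total, kd):
    """Total ligand needed to occupy half the enzyme: E/2 bound plus Kd free."""
    return enzyme_total / 2.0 + kd


enzyme_nm, kd_nm = 10.0, 0.1
needed_nm = ligand_for_half_occupancy(enzyme_nm, kd_nm)
print("total ligand for 50% occupancy (nM):", needed_nm)
print("check, fraction bound:", round(fraction_bound(enzyme_nm, needed_nm, kd_nm), 6))
```

Half of the 10 nM enzyme has to be occupied, which by itself consumes 5 nM of ligand, so the apparent IC50 cannot fall below about 5 nM however small Kd becomes.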