Tuesday, 22 September 2015

Ligand efficiency metrics: why all the fuss?

I guess it had to happen.  Our gentle critique of ligand efficiency metrics (LEMs) has finally triggered a response in the form of a journal article (R2015) in which it is dismissed as "noisy arguments" and I have to admit, as Denis Healey might have observed, that this is like being savaged by a dead sheep.  The R2015 defense of LEMs follows on from an earlier one (M2014) in which the LEM Politburo were particularly well represented and it is instructive to compare the two articles.  The M2014 study was a response to Mike Schultz’s assertion ( 1 | 2 | 3 ) that ligand efficiency (LE) was mathematically invalid and its authors correctly argued that LE was a mathematically valid expression, although they did inflict a degree of collateral damage on themselves by using a mathematically invalid formula for LE (their knowledge of the logarithm function did not extend to its inability to take an argument that has units).  In contrast, the R2015 study doesn’t actually address the specific criticisms that were made of LE and can be described crudely as an appeal to the herding instinct of drug discovery practitioners.  It actually generates a fair amount of its own noise by asserting that, "Ligand efficiency validated fragment-based design..." and fails to demonstrate any understanding of the points made in our critique.  The other difference between the two studies is that the LEM Politburo is rather less well represented in the R2015 study and one can only speculate as to whether any were invited to join this Quixotic assault on windmills that can evade lances by the simple (and valid) expedient of changing their units.  One can also speculate about what the responses to such an invitation might have been (got a lot on right now… would love to have helped… but don’t worry… go for it… you’ll be just fine… we’ll be right behind you if it gets ugly), just as we might also speculate as to whether Marshal Bosquet might have summed up the R2015 study as, "C'est magnifique, mais ce n'est pas la guerre" ("It is magnificent, but it is not war").

It’s now time to say something about our critique of LE metrics and I’ll also refer readers to a three-part series ( 1 | 2 | 3 ) of blog posts and the ‘Ligand efficiency: nice concept, shame about the metrics’ presentation.  In a nutshell, the problem is that LE doesn’t actually measure what the LEM Politburo would like you to believe that it measures and it is not unreasonable to ask whether it actually measures anything at all. Specifically, LE provides a view of chemico-biological space that changes with the concentration units in which measures of affinity and inhibitory activity are usually expressed. A view of a system that changes with the units in which the quantities that describe the system are expressed might have prompted Pauli to exclaim that it was "not even wrong" and, given that LE is touted as a metric for decision-making, this is a serious deficiency that is worth making a fuss about.  If you want to base design decisions on such a metric (or planetary alignments for that matter) then by all means do so. I believe the relevant term is 'consenting adults'.

Now you might be thinking that it's a little unwise to express such heretical thoughts since this might cause the auto-da-fé to be kindled and I have to admit that, at times, the R2015 study does read rather like the reaction of the Vatican’s Miracle Validation Department to pointed questions about a canonization decision.  Our original criticism of LE was on thermodynamic grounds and this seemed reasonable given that LE was originally defined in thermodynamic terms. As an aside, anybody thinking of invoking Thermodynamics when proposing a new metric should consider the advice from Tom Lehrer in 'Be Prepared' that goes, "don't write naughty words on walls if you can't spell". Although the R2015 study complains that the thermodynamic arguments used in our critique were complex, the thermodynamic basis of our criticism was both simple and fundamental.  What we showed is that our perception of ligand efficiency changes when we change the value of the concentration used to define the standard state for ΔG° and, given that the standard concentration is really just a unit, that is a very serious criticism indeed.  It is quite telling that the R2015 study doesn't actually say what the criticism was, and simply dismissing criticism as "noisy arguments" without counter-argument is effectively to run the white flag up the pole.  I would argue that if there were a compelling counter-argument to our criticism of LE then it's a pretty safe assumption that the R2015 study would have used it.  This is a good time to remind people that if you're going to call something a metric then you're claiming that it measures something and, if somebody calls your bluff, you'll need more than social media 'likes' to back your claims.

All that said, I realize that many in the drug discovery field find Thermodynamics a little scary so I'll try to present the case against LE in non-thermodynamic terms. I suggest that we use IC50, which is definitely not a thermodynamic quantity, and I'll define something called generalized ligand efficiency (GLE) to illustrate the problems with LE:

$$\mathrm{GLE} = -\frac{1}{N_{\mathrm{HA}}}\log_{10}\left(\frac{\mathrm{IC}_{50}}{C_{\mathrm{ref}}}\right)$$

In defining generalized ligand efficiency, I've dumped the energy units, used the number of heavy atoms (which means that we don't have to include a 'per heavy atom' when we quote values) and changed the base of the logarithm to something a bit more user-friendly.  Given that LE advocates often tout the simplicity of LE, I just can't figure out why they are so keen on the energy units and natural logarithms, especially when they so frequently discard the energy units when they quote values of LE.  One really important point that I'd like you to notice is that IC50 has been divided by an arbitrary concentration unit that I'll call Cref (for reference concentration). The reason for doing this is that you can't calculate a logarithm for a quantity with units and, when you say "IC50 in molar units", you are actually talking about the ratio of IC50 to a 1 M concentration.  Units are really important in science but they are also arbitrary, in the sense that coming to different conclusions using different units is more likely to indicate an error in the 'not even wrong' category than penetrating insight. Put another way, if you concluded that Anne-Marie was taller than Béatrice when measured in inches, how would you react when told that Béatrice was taller than Anne-Marie when measured in centimetres?  Now, before we move on, just take another look at the formula for GLE and please remember that Cref is an arbitrary unit of concentration and that a metric is no more than a fancy measuring tape.
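
To make the role of Cref explicit, the GLE formula can be split into the value obtained with the usual 1 M unit plus a correction term. This is a short rearrangement of the definition of GLE, offered as a sketch rather than as a formula taken from our critique:

$$\mathrm{GLE}(C_{\mathrm{ref}}) = -\frac{1}{N_{\mathrm{HA}}}\left[\log_{10}\left(\frac{\mathrm{IC}_{50}}{1\,\mathrm{M}}\right) - \log_{10}\left(\frac{C_{\mathrm{ref}}}{1\,\mathrm{M}}\right)\right] = \mathrm{GLE}(1\,\mathrm{M}) + \frac{1}{N_{\mathrm{HA}}}\log_{10}\left(\frac{C_{\mathrm{ref}}}{1\,\mathrm{M}}\right)$$

The correction term is divided by the number of heavy atoms, so changing Cref shifts the GLE of a small molecule by more than it shifts the GLE of a large one. That size-dependent shift is what allows a ranking by GLE to change when nothing but the concentration unit has changed.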

Ligand efficiency calculations almost invariably involve selection of 1 M as the concentration unit, although those using LE (and quite possibly those who introduced the metrics) are usually unaware that this choice has been made. If you base a metric on a specific unit then you either need to demonstrate that the choice of unit is irrelevant or you need to justify your choice of unit. GLE allows us to explore the consequences of arbitrarily selecting a concentration unit. As a potential user of metrics, you should be extremely wary of any metric if those advocating its use evade their responsibility to be open about the assumptions made in defining the metric and the consequences of making those assumptions. Let's take a look at the consequences of tying LE to the 1 M concentration unit and I'll ask you to take a look at the table below, which shows GLE values calculated for a fragment hit, a lead and an optimized clinical candidate.  When we calculate GLE using 1 M as the concentration unit, all three compounds appear to be equally ligand-efficient, but that picture changes when we choose another concentration unit.  If we choose 0.1 M as the concentration unit, we now discover that the optimized clinical candidate is more ligand-efficient than the fragment hit. However, if we choose a different concentration unit (10 M), we find that the fragment hit is now more ligand-efficient than the optimized clinical candidate. All we need to do now is name our fragment hit 'Anne-Marie' and our optimized clinical candidate 'Béatrice' and it will be painfully apparent that something has gone horribly awry with our perception of reality.
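
The same reversal can be reproduced in a few lines of Python. The numbers below are invented for illustration and are not the values in the table: a hypothetical fragment hit (10 heavy atoms, IC50 = 1 mM), lead (20 heavy atoms, 1 μM) and clinical candidate (30 heavy atoms, 1 nM), chosen so that all three tie when Cref = 1 M. The `gle` function is simply the GLE formula written as code.

```python
import math

# Invented example compounds: (number of heavy atoms, IC50 in molar).
# Chosen so that all three have the same GLE when C_ref = 1 M.
compounds = {
    "fragment hit": (10, 1e-3),        # 1 mM
    "lead": (20, 1e-6),                # 1 uM
    "clinical candidate": (30, 1e-9),  # 1 nM
}

def gle(n_heavy, ic50_molar, c_ref_molar):
    """Generalized ligand efficiency: -(1/N_HA) * log10(IC50 / C_ref)."""
    return -math.log10(ic50_molar / c_ref_molar) / n_heavy

# Same compounds, same IC50 values; only the reference concentration changes.
for c_ref in (0.1, 1.0, 10.0):
    row = ", ".join(
        f"{name} {gle(n, ic50, c_ref):.3f}" for name, (n, ic50) in compounds.items()
    )
    print(f"C_ref = {c_ref:>4} M: {row}")
```

With these numbers the three compounds tie at 0.300 when Cref = 1 M. At 0.1 M the candidate (0.267) beats the fragment (0.200), and at 10 M the fragment (0.400) beats the candidate (0.333).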
One should be very worried about basing decisions on a metric that tells you different things when you express quantities in different units, because 10 nM is still 10 nM whether you call it 10000 pM, 10⁻² μM or 10⁻⁸ M.  This is a good time to ask readers if they can remember LE being used to prioritize one compound (or even a series) over another in a project situation.  Would the decision have been any different had you expressed IC50 in different units when you calculated LE?   Was Anne-Marie really more beautiful than Béatrice?   The other point that I'd like you to think about is the extent to which improperly defined metrics can invalidate the concept that they are intended to apply. I do believe that a valid concept of ligand efficiency can be retrieved from the wreckage of metrics, and this is actually discussed both in the conclusion of our LEM critique and in my recent presentation. But enough of that, because I'd like to let the R2015 study say a few words.
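
The link between "the unit you write the number in" and Cref can be made concrete. Taking the logarithm of the bare number obtained by writing IC50 in some unit is numerically the same as choosing Cref = 1 of that unit. The sketch below assumes a small, hypothetical lookup table of unit sizes (`UNIT_IN_MOLAR`) and a hypothetical helper `log10_in_unit`, and applies them to the same 10 nM compound.

```python
import math

# Size of one unit in molar (an illustrative lookup table, not from the post).
UNIT_IN_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def log10_in_unit(ic50_molar, unit):
    """log10 of the bare number obtained by writing IC50 in `unit`.

    Numerically this is log10(IC50 / C_ref) with C_ref = 1 `unit`.
    """
    number = ic50_molar / UNIT_IN_MOLAR[unit]
    return math.log10(number)

ic50_molar = 10e-9  # the same 10 nM compound throughout

for unit in UNIT_IN_MOLAR:
    number = ic50_molar / UNIT_IN_MOLAR[unit]
    print(f"IC50 = {number:g} {unit:>2}  ->  log10 = {log10_in_unit(ic50_molar, unit):+.2f}")
```

The concentration is identical on every line, yet the logarithm runs from -8.00 (M) through -5.00 (mM), -2.00 (μM) and +1.00 (nM) to +4.00 (pM). Any metric built on that logarithm inherits the choice of unit.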

The headline message for the R2015 study asserts, “The best argument for ligand efficiency is simply the assertion that on average smaller drugs have lower cost of goods, tend to be more soluble and bioavailable, and have fewer metabolic liabilities”. I do agree that excessive molecular size is undesirable (we applied the term 'pharmaceutical risk factor' to molecular size and lipophilicity in our critique) but one is still left with the problem of choosing a concentration unit or, at least, justifying the arbitrary choice of 1 M.  As such, it is not accurate to claim that the undesirability of excessive molecular size is the best argument for ligand efficiency, because the statement also implicitly claims justification for the arbitrary choice of 1 M as the concentration unit.  I have no problem normalizing potency with respect to molecular size, but doing so properly requires analyzing the data appropriately rather than simply pulling a concentration unit out of a hat and noisily declaring it to be the Absolute Truth in the manner in which one might assert the age of the Earth to be 6000 years.

I agree 100% with the statement, "heavy atoms are the currency we spend to achieve high-affinity binding, and all else being equal it is better to spend fewer for a given potency".  When we progress from a fragment hit to an optimized clinical candidate we spend by adding molecular size, and our purchase is an increase in potency that is conveniently measured by the log of the ratio of the IC50 values of the hit and the optimized compound. If you divide this quantity by the difference in heavy atoms, you'll get a good measure of how good the deal was. Crucially, your perception of the deal does not depend on the concentration unit in which you express the IC50 values, and this is how group efficiency works. It's a different story if you use LE, and I'll direct you back to the table above so that you can convince yourself of this.
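
Here is a sketch of the "how good was the deal" calculation, using the same invented hit (10 heavy atoms, 1 mM) and optimized compound (30 heavy atoms, 1 nM) as in the earlier example. The function name `potency_gain_per_atom` is hypothetical, and it works in log10 units (as for GLE) rather than the energy units in which group efficiency is usually quoted. Rebuilding the same quantity from GLE values shows that Cref cancels in the difference, while the individual GLE values still move with Cref.

```python
import math

def gle(n_heavy, ic50_molar, c_ref_molar):
    """Generalized ligand efficiency: -(1/N_HA) * log10(IC50 / C_ref)."""
    return -math.log10(ic50_molar / c_ref_molar) / n_heavy

def potency_gain_per_atom(n_hit, ic50_hit, n_opt, ic50_opt):
    """log10 potency gained per heavy atom added going from hit to optimized compound.

    Only the ratio of the two IC50 values enters, so no reference concentration is needed.
    """
    return math.log10(ic50_hit / ic50_opt) / (n_opt - n_hit)

# Invented hit (10 heavy atoms, 1 mM) and optimized compound (30 heavy atoms, 1 nM).
print("deal:", round(potency_gain_per_atom(10, 1e-3, 30, 1e-9), 3))

for c_ref in (0.1, 1.0, 10.0):
    # N_HA * GLE = -log10(IC50 / C_ref), so C_ref cancels in this difference...
    via_gle = (30 * gle(30, 1e-9, c_ref) - 10 * gle(10, 1e-3, c_ref)) / (30 - 10)
    # ...whereas the GLE values themselves depend on C_ref.
    print(f"C_ref = {c_ref:>4} M: deal {via_gle:.3f}, "
          f"GLE hit {gle(10, 1e-3, c_ref):.3f}, GLE optimized {gle(30, 1e-9, c_ref):.3f}")
```

The deal comes out at 0.300 log units per heavy atom for every Cref, even though the GLE of the hit and of the optimized compound change from one line to the next.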

I noted earlier that the R2015 study fails to state what criticisms have been made of LE and simply dismisses them in generic terms as fuss and noise.  I don't know whether this reflects a failure to understand the criticisms or a 'head in the sand' reaction to a reality that is just too awful to contemplate.  Either way, the failure to address specific criticisms made of LE represents a serious deficiency in the R2015 study and it will leave some readers with the impression that the criticisms are justified even if they have not actually read our critique. The R2015 study asserts, "There is no need to become overly concerned with noisy arguments for or against ligand efficiency metrics being exchanged in the literature" and, given the analogy drawn in R2015 between LE and fuel economy, I couldn't help thinking of a second-hand car dealer trying to allay a fussy customer's concerns about a knocking noise coming from the engine compartment of a potential purchase.

So now it's time to wrap up and I hope that you have learned something from the post even if you have found it to be tedious (I do have to admit that metrics bore me shitless).  Why all the fuss?  I'll start by saying that drug discovery is really difficult and making decisions using metrics that alter your perception when different units are used is unlikely to make drug discovery less difficult. If we do bad analysis and use inappropriate metrics then those who fund drug discovery may conclude that the difficulties we face are of our own making.  Those who advocate the use of LE are rarely (if ever) open about the fact that LE metrics are almost invariably tied to a 1 M reference concentration and that changing this reference alters our perception of efficiency. We need to remember that the purpose of a metric is to measure and not to tell us what we want to hear. I don't deny that LE is widely used in drug discovery and that is precisely why we need to make a fuss about its flaws. Furthermore, there are ways to normalize activity with respect to molecular size that do not suffer from LE's deficiencies.  I'll even admit that LE is useful since the metric has proven to be a powerful propaganda tool for FBDD. So useful, in fact, that the R2015 study uses the success of FBDD as propaganda for LE. Is the tail wagging the dog or have the inmates finally taken over the asylum?




2 comments:

Lewis Vidler said...

This is a nice post. I have to admit I used to be a firm believer in LE, having used it in publications etc. Having spent far more time thinking about the problem, I am now less of a fan and believe that each fragment/molecule needs to be considered carefully on its own merits. If two fragments occupy different parts of the binding site, is it fair to consider any kind of size-weighted metric? From my perspective, each of these may be able to tell you something about how to find a potent and selective molecule for that target, and discarding one early on in the process will be detrimental to subsequent progress.

Peter Kenny said...

Hi Lewis, thanks for your comment. Your point about treating each fragment hit on its merits is very much on target and the early stages of a fragment-based lead discovery project are likely to involve assaying analogs of hits identified in the primary screen. If you’re lucky, it may be possible to discern differences in response to an increase in molecular size for different fragments from this exercise (a steeper response to molecular size or lipophilicity for one fragment may trump the higher affinity of another). Screening can be thought of as an exercise in assembling SAR and we should attempt to do this as efficiently (e.g. information gained/number of compounds assayed) as possible. A lower-affinity fragment may provide more information than one with a higher affinity, and your example of fragments binding at different locations is particularly appropriate. As an aside, fragment linking/merging is an area where standard states need to be considered. There may still be value in trying to establish a generic affinity threshold for fragment hits as a function of molecular size but, if so, we should ask why the relevant reference line should be forced through a point corresponding to 1 M affinity and zero molecular size.
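
Where that point comes from follows directly from the GLE definition given in the post (this is a rearrangement, not an additional result). A line of constant GLE on a plot of -log10(IC50/Cref) against the number of heavy atoms is

$$-\log_{10}\left(\frac{\mathrm{IC}_{50}}{C_{\mathrm{ref}}}\right) = \mathrm{GLE}\times N_{\mathrm{HA}}$$

which passes through -log10(IC50/Cref) = 0, i.e. IC50 = Cref, at NHA = 0. With the usual Cref = 1 M, every constant-LE threshold line is therefore pinned at 1 M affinity and zero molecular size; choosing a different Cref moves the pivot of the line, not the data.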

Something that does what LE is claimed to do is more likely to have value in a project where there is a greater range in molecular size for compounds. This is precisely where LE’s ‘units problem’ becomes a big deal. In our critique of LEMs, we argued that one should plot (and model) affinity (or potency) as a function of molecular size and use the observed magnitude of the response as an indication of how well we are doing. The residuals quantify the extent to which the activity of individual compounds beats the trend observed for all compounds and can be thought of as representing a type of ligand efficiency. Analyzing affinity (or potency) data in this manner effectively normalizes it with respect to the trend actually observed in the data as opposed to a trend that somebody thinks is in the data.
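
A minimal sketch of the residual approach, under two loudly stated assumptions: the model is an ordinary least-squares straight line of -log10(IC50/Cref) against heavy-atom count (the comment above does not specify a model), and the compound series is invented. The function name `fit_residuals` is hypothetical. Because changing Cref adds the same constant to every y value, the fitted intercept absorbs it and both the slope and the residuals are unchanged.

```python
import math

# Invented series for illustration: (number of heavy atoms, IC50 in molar).
series = [(12, 8e-5), (15, 1.3e-5), (18, 6e-6), (22, 5e-7), (25, 4e-7), (30, 1.2e-8)]

def fit_residuals(points, c_ref_molar=1.0):
    """Least-squares line of -log10(IC50 / C_ref) on heavy atoms.

    Returns (slope, residuals); a positive residual means the compound
    beats the trend observed for the whole series.
    """
    xs = [n for n, _ in points]
    ys = [-math.log10(ic50 / c_ref_molar) for _, ic50 in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, [y - (slope * x + intercept) for x, y in zip(xs, ys)]

slope, res = fit_residuals(series)
print(f"response: {slope:.3f} log units per heavy atom")
for (n, ic50), r in zip(series, res):
    print(f"{n:>3} heavy atoms, IC50 {ic50:.1e} M: residual {r:+.2f}")

# Repeating the fit with C_ref = 10 M shifts every y value by +1; only the intercept moves.
slope_10, res_10 = fit_residuals(series, c_ref_molar=10.0)
print("same slope and residuals with C_ref = 10 M:",
      math.isclose(slope, slope_10)
      and all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(res, res_10)))
```

The slope plays the role of "the observed magnitude of the response", and each residual is a Cref-independent measure of how far a compound sits above or below the trend in the data.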