In this post, I'll be responding to a couple of articles in the literature that cited our gentle critique of ligand efficiency metrics (LEMs). The critique has also been distilled into a harangue, which readers may find a bit more digestible than the article. As we all know, Ligand Efficiency (LE), the original LEM, was introduced to normalize affinity with respect to molecular size. Before getting started, I'd like to ask you, the reader, to ask yourself exactly what you take the term 'normalize' to mean.
The first article which I'll call L2016 states:
Optimisation frequently increases molecular size, and on average there is a trade-off between potency and size gains, leading to little or no gain in LE [42,52] but an increase in SILE [52]. This, and the nonlinear dependence of LE on heavy atom count, together with thermodynamic considerations, has led some authors to question the validity of LE [76,77], while others support its use [52,78,79].
This statement is misleading because the "thermodynamic considerations" are that our perception of efficiency changes when we change the concentration units in which affinity and potency are expressed. As such, LE is a physicochemically meaningless quantity and, in any case, references 52 and 78 precede our challenge to the thermodynamic validity of LE (although not an equivalent challenge in 2009). Reference 78 uses a mathematically invalid formula for LE when claiming to have shown that LE is mathematically valid and reference 79 creates much noise while evading the challenge. I have responded to reference 79 (aka the 'sound and fury article') in two blog posts ( 1 | 2 ).
This is a good place for a graphic to break up the text a bit, and I'll use the table (pulled from an earlier post) that shows how our perception of ligand efficiency changes with the concentration units used to define affinity. I've used base 10 logarithms and dispensed with energy units (which are often discarded) to redefine LE as generalized LE (GLE) so that we can explore the effect of changing the concentration unit (which I've called a reference concentration). Please take special note of how a change in concentration unit can change your perception of efficiency for the three compounds. Do you think it makes sense to try to 'correct' LE for the effects of molecular size?
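To make the unit-dependence concrete, here's a minimal Python sketch of GLE for three hypothetical compounds (the Kd values and heavy atom counts are invented for illustration, not taken from the table). Watch how the efficiency ranking flips when the reference concentration is changed:

```python
import math

def gle(kd_molar, heavy_atoms, c_ref=1.0):
    """Generalized ligand efficiency: -log10(Kd/Cref) per heavy atom.
    c_ref is the reference concentration in mol/L; 1 M is the
    conventional (but entirely arbitrary) choice."""
    return -math.log10(kd_molar / c_ref) / heavy_atoms

# Three hypothetical compounds: (Kd in M, heavy atom count)
compounds = {"A": (1e-4, 10), "B": (1e-6, 20), "C": (1e-9, 35)}

for c_ref in (1.0, 1e-3, 1e-6):  # 1 M, 1 mM, 1 uM
    ranking = sorted(compounds,
                     key=lambda name: gle(*compounds[name], c_ref),
                     reverse=True)
    print(f"Cref = {c_ref:g} M: {' > '.join(ranking)}")
```

With a 1 M reference the small compound A looks most efficient; switch the reference to 1 mM or 1 µM and the ranking reverses to C > B > A, even though nothing about the compounds themselves has changed.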
Another article also cites our LEM critique. Let's take a look at how the study, which I'll call M2016, responds to our criticism of LE (reference 69 in this study):
The appeal of LE and GE is in the convenience and rapidity with which
these factors can be assessed during lead optimization, but the simplistic nature
of these metrics requires an understanding of, and appreciation for, their
inherent limitations when interpreting data.[67,68,69,70] The
relevance of LE as a metric has been challenged based on the lack of direct proportionality
to molecular size and an inconsistency of the magnitude of effect between
homologous series, both attributed to a fundamental invalidity underlying its
mathematical derivation.[65,67] These criticisms have stimulated considerable discussion
and provoked discourse that attempts to moderate the perspective and provide guidance
on how to use LE and GE as rule-of-thumb metrics in lead optimization.[68,69,70]
To be blunt, I don't think that the M2016 study does actually respond to our criticism of LE as a metric which is that our perception of efficiency changes when we change the concentration unit with which we specify affinity or potency. This is an alarming characteristic for something that is presented as a tool for decision making and, if it were a navigational instrument, we'd be talking about fundamental design flaws rather than "limitations". The choice of 1 M is entirely arbitrary and selecting a particular concentration unit for calculation of LE places the burden of proof on those making the selection to demonstrate that this particular concentration unit is indeed the one that is most fit for purpose.
The other class of LEM that is commonly encountered is exemplified by what is probably best termed lipophilic efficiency (LipE). Although the term LLE is more often used, there appears to be some confusion as to whether this should be taken to mean ligand-lipophilicity efficiency or lipophilic ligand efficiency so it's probably safest to use LipE. Let's see what the M2016 study has to say about LipE:
LLE is an offsetting metric that reflects the difference in the affinity of a drug for its target versus water compared to the distribution of the drug between octanol and water, which is a measure of nonspecific lipophilic association.[69,12]
If I knew very little about LEMs, I would find this sentence a bit confusing although I think that it is essentially correct. We used (and possibly even introduced) the term 'offset' in the LEM context to describe metrics that are defined by subtracting risk factor from affinity (or potency). This is in contrast to LE and its variations which are defined by dividing affinity (or potency) by molecular size and can be described as scaled. There is still an arbitrary aspect to LipE in that we could ask whether (pIC50 − 0.5 × logP) might not be a better metric than (pIC50 − logP). Unlike LE, however, LipE is a quantity that actually has some physicochemical meaning, provided that the compound in question binds to its target in an uncharged form. Specifically, LipE can be considered to quantify the ease (or difficulty) of moving the compound from octanol to its binding site in the target as shown in the figure below:
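The difference between offset and scaled metrics can be made concrete with a small Python sketch (the two compounds and all their numbers are invented for illustration). Re-expressing potency in a different concentration unit adds the same constant to every pIC50, which leaves the ranking by an offset metric like LipE unchanged but can flip the ranking by a scaled metric like LE:

```python
def lipe(pic50, logp):
    """Offset metric: risk factor subtracted from potency."""
    return pic50 - logp

def gle(pic50, heavy_atoms):
    """Scaled metric: potency divided by molecular size
    (base-10 logs, energy units discarded)."""
    return pic50 / heavy_atoms

# Two hypothetical compounds (values invented for illustration)
X = {"pIC50": 9.0, "logP": 4.0, "HA": 20}
Y = {"pIC50": 5.0, "logP": 1.0, "HA": 10}

for shift in (0.0, -4.0):  # -4 corresponds to a 10000-fold change of unit
    x_wins_lipe = lipe(X["pIC50"] + shift, X["logP"]) > lipe(Y["pIC50"] + shift, Y["logP"])
    x_wins_gle = gle(X["pIC50"] + shift, X["HA"]) > gle(Y["pIC50"] + shift, Y["HA"])
    print(f"shift {shift:+.0f}: X beats Y on LipE? {x_wins_lipe}; on GLE? {x_wins_gle}")
```

LipE ranks X above Y under both unit conventions; GLE ranks Y above X under one convention and X above Y under the other.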
Let's see what M2016 study has to say:
However, care needs to be exercised in applying this metric since it is dependent on the ionization state of a molecule, and either Log P or Log D should be used when appropriate.
This statement fails to acknowledge a third option: there may be situations in which neither logP nor logD is appropriate for defining LipE. One such situation is when the compound binds to its target in a charged form. When this is the case, neither logP nor logD quantifies the ease (or difficulty) of moving the bound form of the compound from octanol to water. As an aside, using logD to quantify compound quality suggests that increasing the extent of ionization will lead to better compounds, and I hope that readers will see that this is a strategy that is likely to end in tears.
Let's take a look at LEMs from the perspective of folk who are working on lead optimization projects or doing hit-to-lead work. Merely questioning the value of LEMs is likely to incur the wrath of Mothers Against Molecular Obesity (MAMO), so I'll stress that I'm not denying that excessive lipophilicity and molecular size are undesirable. We even called them "risk factors" in our LEM critique. That said, in the compound quality and drug-likeness literature, it is much more common to read that X and Y are correlated, associated or linked than to actually be shown how strong the correlation, association or linkage is. When you do get shown the relationship between X and Y, it's usually all smoke and mirrors (e.g. graphics colored in lurid, traffic light hues). When reading M2016, you might be asking why we can't see the relationship between PFI and aqueous solubility presented more directly (or even why iPFI is preferred over PFI for hERG and promiscuity). A plot of one against the other, perhaps even a correlation coefficient? Is it really too much to ask?
The reason for the smoke and mirrors is that the correlations are probably weak. Does this mean that we don't need to worry about risk factors like molecular size and lipophilicity? No, it most definitely does not! "You speak in more riddles than a Lean Six Sigma belt", I hear you say, "and you tell us that the correlations with the risk factors have been smoked and mirrored and yet we still need to worry about the risk factors". Patience, dear reader, because the apparent paradox can be resolved once you realize that some much stronger local correlations may be lurking beneath an anemic global correlation. What this means is that potencies of compounds in different projects (and different chemical series in the same project) may respond differently to risk factors like lipophilicity and molecular size. You need to start thinking of each LO project as special (although 'unique' might be a better term because 'special projects' were what used to happen to senior managers at ICI before they were put out to pasture).
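Here's a minimal, purely illustrative Python sketch of how an anemic global correlation can hide strong local ones. Two hypothetical series (all data invented) respond steeply to logP but in opposite directions, so pooling them yields a correlation of essentially zero:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Series 1: potency rises steeply with lipophilicity (invented data)
logp1, pic50_1 = [0.0, 0.5, 1.0, 1.5, 2.0], [4.0, 5.0, 6.0, 7.0, 8.0]
# Series 2: potency falls steeply with lipophilicity (invented data)
logp2, pic50_2 = [3.0, 3.5, 4.0, 4.5, 5.0], [8.0, 7.0, 6.0, 5.0, 4.0]

print(pearson(logp1, pic50_1))                    # strong positive locally
print(pearson(logp2, pic50_2))                    # strong negative locally
print(pearson(logp1 + logp2, pic50_1 + pic50_2))  # pooled: essentially zero
```

A project-level analysis that only ever looks at the pooled data would conclude that lipophilicity doesn't matter, when in fact it matters a great deal within each series.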
Another view of LEMs is that they represent reference lines. For example, we can plot potency against molecular size and draw a line with positive slope from a point corresponding to a 1 M IC50 on the potency axis and say that all points on the line correspond to the same LE. Analogously, we can draw a line of unit slope on a plot of pIC50 against logP and say that all points on the line correspond to the same LipE. You might be thinking that these reference lines are a bit arbitrary and you'd be thinking along the right lines. The intercept on the potency axis is entirely arbitrary and that was the basis of our criticism of LE. A stronger case can be made for considering a line of unit slope on a plot of pIC50 against logP to represent constant LipE but only if the compounds bind in uncharged forms to their target.
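The reference-line view can be sketched in a few lines of Python (the slopes and intercepts here are arbitrary choices, made for illustration): points generated on a line through the 1 M intercept all share the same GLE, and points on a unit-slope line in the (logP, pIC50) plane all share the same LipE.

```python
# Constant-GLE reference line: pIC50 = slope * HA, passing through the
# 1 M intercept (pIC50 = 0 at HA = 0); the slope of 0.3 is arbitrary
le_line = [(ha, 0.3 * ha) for ha in (10, 20, 30, 40)]
gle_values = [pic50 / ha for ha, pic50 in le_line]

# Constant-LipE reference line: pIC50 = logP + 4 (unit slope, with an
# arbitrarily chosen intercept of 4)
lipe_line = [(logp, logp + 4.0) for logp in (0.5, 1.5, 3.0, 4.5)]
lipe_values = [pic50 - logp for logp, pic50 in lipe_line]

print(gle_values)   # all approximately 0.3
print(lipe_values)  # all 4.0
```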
Let's get back to that project you're working on and let's suppose that you want to manage risk factors like lipophilicity and molecular size. Before you calculate all those LEMs for your project compounds, I'd like you to plot pIC50 against molecular size (it actually doesn't matter too much what measure of molecular size you use). What you now have in front of you is the response of potency to molecular size. Do you see any correlation between pIC50 and molecular size? Why not try fitting a straight line to your data to get an idea of the strength of the correlation? The points that lie above the line of fit beat the trend in the data and the points that lie below the line are beaten by the trend. The residual for a point is simply the distance above the line for that point and its value tells you how much the activity that it represents beats the trend in the data. Are there structural features that might explain why some points are relatively distant from the line that you've fit? In case you hadn't realized it, you've just normalized your data. Vorsprung durch Technik! Here's a graphic to give you an idea of how this might work.
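That normalization can be sketched with ordinary least squares on invented project data (the heavy atom counts and pIC50 values below are hypothetical). The residuals are the normalized activities: positive residuals beat the trend observed in your own data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical project compounds: heavy atom counts and pIC50 values
ha = [18, 22, 25, 28, 30, 33]
pic50 = [5.0, 5.8, 6.9, 6.6, 7.8, 7.9]

a, b = fit_line(ha, pic50)
# Residual = observed - trend; its sign says whether a compound beats
# the trend and its magnitude says by how much
residuals = [y - (a + b * x) for x, y in zip(ha, pic50)]
for x, y, r in zip(ha, pic50, residuals):
    print(f"HA {x:2d}  pIC50 {y:.1f}  residual {r:+.2f}")
```

The residuals sum to zero by construction, so the trend you're normalizing against is the one actually observed in the data rather than a line imposed from outside; the compound with the largest positive residual is the one whose structure most deserves a closer look.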
The relationship between affinity and molecular size shown in the plot above is likely to be a lot tighter than what you'll see for a typical project. In the early stages of a project, the range in activity for the project compounds will often be too narrow for the response of activity to risk factor to be discerned. You can make assumptions about the response of affinity (or potency) to risk factor (e.g. that LipE will remain constant during optimization) in order to forecast outcome but it's really important to continually monitor the response of activity to risk factor to check that your assumptions still hold. If affinity (or potency) is strongly correlated with risk factor then you want the response to risk factor to be as steep as possible. Could this be something to think about when trying to prioritize between series?
So it's been a long post and there are only so many metrics that one can take in a day. If you want to base your decisions on metrics that cause your perception to change with units then as consenting adults you are free to do so (just as you are free to use astrological charts or to seek the face of a deity in clouds). A former prime minister of India drank a glass of his own urine every day and lived to 98. Who would have predicted that? Our LEM critique was entitled 'Ligand efficiency metrics considered harmful' and now I need to say why. When doing property-based design, it is vital to get as full an understanding as possible of the response of affinity (or potency) to each of the properties in which you're interested. If exploring the relationship between X and Y, it is generally best to analyse the data as directly as possible and to keep X and Y separate (as opposed to looking at the response of a function of Y and X to X). When you use LEMs you're also making assumptions about the response of Y to X and you need to ask yourself whether that's a sensible way to explore the response of Y to X. If you want to normalize potency by risk factor, would you prefer to use the trend that you've actually observed in your data or an arbitrary trend that 'experts' recommend on the basis that it's "simple"?
Next week, PAINS...