Molecular Design: The boys who cried wolf

So it's back to blogging and it's taken a bit longer to get into it this year since I had to finish a few things before leaving Brazil. This is a long post so make sure to have some strong coffee to hand.

This post features an article, 'Molecular Property Design: Does Everyone Get It?', by two unwitting 'collaborators' in our correlation inflation Perspective. There are, however, a number of things that the authors of this piece just don't 'get', which makes their choice of title particularly unfortunate. The first thing that they don't 'get' is that doing questionable data analysis in the past means that people in the present are less likely to heed your warnings about the decline in quality of compounds in today's pipelines. As has been pointed out more than once by this blog, rules/guidelines in drug discovery are typically based on trends observed in measured data and the strength of the trend tells you how rigidly you should adhere to the rule/guideline. Correlation inflation (see also voodoo correlations) is a serious problem in drug discovery because it causes drug discovery scientists to give more weight to rules/guidelines (and 'expert' opinion) than is justified by the data. In drug discovery, we need to make a distinction between what we believe and what we know. If we can't (or won't) make this distinction then those who fund our activities may conclude that the difficulties that we face are actually of our own making and that's something else that the authors of the featured article just don't seem to 'get'. "Views obtained from senior medicinal chemistry leaders..." does come across as arm-waving and I'm surprised that the editor and reviewers (if there were any) let them get away with it.

If you're familiar with the correlation inflation problem, you'll know that one of the authors of the featured article did some averaging of groups of data points prior to analysis which was presented in support of an assertion that, "Lipophilicity plays a dominant role in promoting binding to unwanted drug targets". This may indeed be the case but it is not correct to suggest that the analysis supports this opinion because the reported correlations are between promiscuity and median lipophilicity rather than lipophilicity itself. The author concedes that the analysis has been criticized but does not make any attempt to rebut the criticism. Readers can draw their own conclusions from the lack of rebuttal.
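For readers who have not seen correlation inflation in action, here is a minimal sketch using synthetic data. It is not a reconstruction of the published analysis: the 'promiscuity' values, the slope of 0.2, the noise level and the unit-wide bins are all assumptions made purely for illustration. The point is only that a weak trend in individual compounds can look very strong once groups of data points are reduced to medians.

```python
import random
from statistics import mean, median

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)
n = 2000

# Hypothetical data: a weak dependence on lipophilicity buried in noise.
logp = [random.uniform(0.0, 6.0) for _ in range(n)]
promiscuity = [0.2 * x + random.gauss(0.0, 1.0) for x in logp]

# Correlation for the individual compounds.
r_raw = pearson_r(logp, promiscuity)

# Group compounds into unit-wide lipophilicity bins (an arbitrary choice)
# and keep only the median of each variable within each bin.
bins = {}
for x, y in zip(logp, promiscuity):
    bins.setdefault(min(int(x), 5), []).append((x, y))

bin_x = [median(x for x, _ in pts) for _, pts in sorted(bins.items())]
bin_y = [median(y for _, y in pts) for _, pts in sorted(bins.items())]

# Correlation for the six pairs of medians.
r_binned = pearson_r(bin_x, bin_y)

print(f"r (individual compounds): {r_raw:.2f}")
print(f"r (bin medians):          {r_binned:.2f}")
```

The second correlation coefficient describes the medians and tells you very little about how well lipophilicity predicts the behavior of any individual compound.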

The other author of the featured article also 'contributed' to our correlation inflation study although it would be stretching it to term that contribution as 'data analysis'. The approach used there was to first bin the data and then to plot bar charts which were compared visually. You might wonder how a bar chart of binned data can be used to quantify the strength of a trend and, if attempting to do this, keep your arms loose because you'll be waving them a lot. Here are a couple of examples of how the approach is applied:

The clearer stepped differentiation within the bands is apparent when log D_pH7.4 rather than log P is used, which reflects the considerable contribution of ionization to solubility.

This graded bar graph (Figure 9) can be compared with that shown in Figure 6b to show an increase in resolution when considering binned SFI versus binned c log D_pH7.4 alone.

This second approach to data 'analysis' is actually more relevant than the first to this blog post because it is used as 'support' ('a crutch' might be a more appropriate term) for SFI (Solubility Forecast Index), which is the old name for PFI (Property Forecast Index) which the featured article touts as a metric. If you're thinking that it's rather strange to 'convert' one form of continuous data (e.g. measured logD) into another form of continuous data (values of metrics) by first making it categorical and turning it into pictures, you might not be alone. What 'senior medicinal chemistry leaders' would make of such data 'analysis' is open to speculation.

But enough of voodoo correlations and 'pictorial' data analysis because I should make some general comments on property-based design. Here's a figure that provides an admittedly abstract view of property-based design.

One challenge for drug-likeness advocates analyzing large, structurally heterogeneous data sets is to make the results of analysis relevant to the medicinal chemists working on one or two series in a specific lead optimization project. Affinity (for association with both therapeutic target and antitargets) and free concentration at site of action are the key determinants of drug action. In general, the response of activity to lipophilicity depends on chemotype and, in the case of affinity, also on the relevant protein target (or antitarget). If you're going to tell medicinal chemists how to do their jobs then you can't really afford to have any data-analytic skeletons rattling around in the closet and that's something else that the authors of the featured article just don't 'get'.

The featured article asserts:

The principle of minimal hydrophobicity, proposed by Hansch and colleagues in 1987 states that “without convincing evidence to the contrary, drugs should be made as hydrophilic as possible without loss of efficacy.” This hypothesis is surviving the test of time and has been quantified as lipophilic ligand efficiency (LLE or LipE).

A couple of points need to be made here. Firstly, when Hansch et al refer to 'hydrophobicity', they mean octanol/water logP (as opposed to logD). Secondly, the observation that excessive lipophilicity is a bad thing doesn't actually justify using LLE/LipE in lead optimization. The principle proposed by Hansch et al suggests that a metric of the following functional form may be useful for normalization of activity with respect to lipophilicity:

pIC₅₀ - (λ × logP)

However, the principle does not tell us what value of λ is most appropriate (or indeed whether a single value of λ is appropriate for all situations). The 'sound and fury' article reviewed in an earlier post makes a similar error with ligand efficiency.

So it's now time to take a look at PFI and the featured article asserts:

The likelihood of meeting multiple criteria, a typical requirement for a candidate drug, increases substantially with ‘low fat, low flat’ molecules where PFI is <7, versus >7. In considering a portfolio of drug candidates, the probabilistic argument hypothesizes that successful outcomes will increase as the portfolio’s balance of biological and physicochemical properties becomes more similar to that of marketed drugs.

The first thing that a potential user of PFI should be asking him/herself is where this magic value of 7 comes from since the featured article does imply that the likelihood of good things will increase substantially when PFI is reduced from 7.1 to 6.9. Potential users also need to ask whether this step jump in likelihood is backed by statistical analysis of experimental data or by 'clearer stepped variation' in pictures created using an arbitrary binning scheme. It's also worth remembering that thresholds used to apply guidelines often reflect the binning schemes used to convert continuous data to categorical data and the correlation inflation Perspective discusses the 4/400 rule in this context. Something that molecular property design 'experts' really do need to 'get' is that simple yes/no guidelines are of limited use in practical lead optimization even when these are backed by competent analysis of relevant experimental data. Molecular property 'experts' also need to 'get' that measured lipophilicity is not actually a molecular property.

PFI is defined as the sum of chromatographic logD (at pH 7.4) and the number of aromatic rings:

PFI = Chrom logD_pH7.4 + # Ar rings

Now suppose that you're a medicinal chemist in a department where the head of medicinal chemistry has decreed that 80% of compounds synthesized by departmental personnel must have PFI less than 7. When senior medicinal chemistry leaders set targets like these, the primary objective (i.e. the topic of your annual review) is to meet them. Delivering clinical candidates is a secondary objective since these will surely materialize in the pipeline as if by magic provided that the compound quality targets are met.

There is a difference between logD (whether measured by shake-flask or chromatographically) and logP, and it is one that compound quality advocates need to 'get'. When we measure lipophilicity, we determine logD rather than logP and so it is not generally valid to invoke Hansch's principle of minimal hydrophobicity (which is based on logP) when using logD. If the compound in question is not significantly ionized under experimental conditions (pH) then logP and logD will be identical. However, this is not the case when ionization is significant, as is usually the case for amines and carboxylic acids at a physiological pH like 7.4. If ionization is significant then logD will typically be lower than logP and we sometimes assume that only the neutral form of the compound partitions into the organic phase for the purposes of prediction or interpretation of logD values. If this is indeed the case we can write logD as a function of logP and the fraction of compound existing in neutral form(s):

log D(pH) = log P + log F_neut(pH)

Ionized forms can sometimes partition into the organic phase although measuring the extent to which this happens is not easy and the effective partition coefficient for a charged entity depends on whatever counter ion is present (and its concentration).
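To put some numbers on the relationship between logD and logP, here is a minimal sketch that assumes a single ionizable group, the Henderson-Hasselbalch equation for the neutral fraction and, as above, that only the neutral form partitions into the organic phase. The function names and the example pKa and logP values are mine and are purely illustrative.

```python
import math

def fraction_neutral(pka, ph, is_base):
    """Fraction of a monoprotic acid or base present in neutral form
    at the given pH (Henderson-Hasselbalch)."""
    exponent = (pka - ph) if is_base else (ph - pka)
    return 1.0 / (1.0 + 10.0 ** exponent)

def log_d(log_p, pka, ph=7.4, is_base=True):
    """logD(pH) = logP + log10(F_neut(pH)), assuming that only the
    neutral form partitions into the organic phase."""
    return log_p + math.log10(fraction_neutral(pka, ph, is_base))

# Hypothetical amine: logP = 3.0, pKa = 9.4 (two units above pH 7.4).
amine_log_d = log_d(3.0, 9.4, ph=7.4, is_base=True)

# Hypothetical carboxylic acid: logP = 3.0, pKa = 4.4.
acid_log_d = log_d(3.0, 4.4, ph=7.4, is_base=False)

print(f"amine logD(7.4): {amine_log_d:.2f}")
print(f"acid  logD(7.4): {acid_log_d:.2f}")
```

Under these assumptions, each unit of separation between pKa and pH (in the direction of ionization) costs roughly one unit of logD once the compound is predominantly ionized, which is why the two can differ by a couple of units for typical amines and carboxylic acids.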

So let's get back to the problem of reducing logD so our medicinal chemist can achieve those targets and get an A+ rating in the annual review. Two easy ways to lower logD are to add ionizable groups (if compound is neutral) and to increase extent of ionization (if compound already has ionizable groups). Increasing the extent of ionization will generally be expected to increase aqueous solubility but I hope readers can see why we wouldn't expect this to help when a compound binds in an ionized form to an antitarget such as hERG (see here for a more complete discussion of this point). Now I'd like you to take a close look at Figure 2(a) in the featured article. You'll notice that the profiles for the last two entries (hERG and promiscuity) have actually been generated using intrinsic PFI (iPFI) rather than PFI itself and you may be wondering what iPFI is and why it was used instead of PFI. In answer to the first question, iPFI is calculated using logP rather than logD:

iPFI = logP + # Ar rings

This definition of iPFI is not quite complete because the authors of the featured article don't actually say what they mean by logP. Is it actually obtained directly from experimental measurements (e.g. logD/pH profile) or is it calculated (in which case it should be stated which method was used for the calculation)?

Some medicinal chemists reading this will be asking what iPFI was even doing in the article in the first place and my response would be, as I say frequently in Brazil, 'boa pergunta'. My guess is that using PFI rather than iPFI for the hERG row of Figure 2(a) would have the effect of shifting the cells in this row one or two cells to the left (based on the assumption that logP will be 1 to 2 units greater than logD at pH 7.4). Such a shift would make compounds with PFI less than 7 look 'dirtier' than the PFI advocates would like you to think.
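The size of this effect is easy to see with a hypothetical compound. The sketch below simply applies the two definitions given above; the property values are invented for illustration and, because the featured article does not say how logP was obtained, the logP value is an assumption in more ways than one.

```python
def pfi(chrom_log_d, n_aromatic_rings):
    """Property Forecast Index: chromatographic logD (pH 7.4) + # Ar rings."""
    return chrom_log_d + n_aromatic_rings

def ipfi(log_p, n_aromatic_rings):
    """Intrinsic PFI: logP + # Ar rings."""
    return log_p + n_aromatic_rings

# Hypothetical amine with three aromatic rings, logD(7.4) = 3.0 and
# logP = 4.5 (i.e. logP is 1.5 units greater than logD at pH 7.4).
compound_pfi = pfi(3.0, 3)    # 6.0
compound_ipfi = ipfi(4.5, 3)  # 7.5

threshold = 7.0
print(f"PFI  = {compound_pfi:.1f}, below threshold: {compound_pfi < threshold}")
print(f"iPFI = {compound_ipfi:.1f}, below threshold: {compound_ipfi < threshold}")
```

The same compound sits comfortably on the 'good' side of 7 when scored by PFI and on the 'bad' side when scored by iPFI, so which of the two is used for a given row of a figure matters.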

There is another term in PFI and that's the number of aromatic rings (# Ar rings) which is meant to measure how 'flat' a molecular structure is. That it might do but then again it might not because two 'flat' aromatic rings will look a lot less flat when linked by a sulfonyl group and their rigidity could prove to be a liability when trying to pack them into a crystal lattice. However, number of aromatic rings will also quantify molecular size (especially in typical Pharma compound collections) and this is something my friends at Practical Fragments have also noted. Molecular size had been recognized as a pharmaceutical risk factor for at least a decade before people started to tout PFI (or SFI) as a compound quality metric and we can legitimately ask whether or not using a more conventional measure of molecular size (e.g. molecular weight, number of non-hydrogen atoms or molecular volume) would have resulted in a more predictive (or useful) metric.

So let's assume for a moment that you're a medicinal chemist in a place where the 'senior medicinal chemistry leaders' actually believe that optimizing PFI is useful. In case you don't know, jobs for medicinal chemists don't exactly grow on trees these days and so it makes a lot of sense to adopt an appropriately genuflectory attitude to the prevailing 'wisdom' of your 'leaders'. The problem is that your 1 nM enzyme inhibitor with the encouraging pharmacokinetic profile has a PFI of 8 and your lily-livered manager is taking some flak from the Compound Repository Advisory Panel for having permitted you to make it in the first place. Fear not, because you have two benzene rings at the periphery of the molecular structure which will make the synthesis relatively easy. Basically you need to think of a metric like PFI as a Gordian knot that needs to be cut efficiently and you can do this either by eliminating rings or by eliminating aromaticity. Substitution of benzoquinone (either isomer) or cyclopentadiene for the offending benzene rings will have the desired effect.

It's been a long post and I really do need to start wrapping things up. One common reaction when you criticize drug discovery metrics is the straw man defense in which your criticism is interpreted as an assertion that one doesn't need to worry about physicochemical properties. In other words, this is precisely the sort of deviant behavior that MAMO (Mothers Against Molecular Obesity) have been trying to warn about. To the straw men, I will say that we described lipophilicity and molecular size as pharmaceutical risk factors in our critique of ligand efficiency metrics. In that critique, we also explain what it means to normalize activity with respect to a risk factor and that's something that not even the NRDD ligand efficiency metric review does. There's a bit more to defining a compound quality metric than dreaming up arbitrary functions of molecular size and lipophilicity and that's something else that the authors of the featured article just don't seem to 'get'. When you use PFI you're assuming that a one unit decrease in chromatographic logD is equivalent to eliminating an aromatic ring (or the aromaticity of a ring) from the molecular structure.

The essence of my criticism of metrics is that the assumptions encoded by the metrics are rarely (if ever) justified by analysis of relevant measured data. A plot of pIC₅₀ against the relevant property for your project compounds is a good starting point for property-based design and it allows you to use the actual trend observed in your data for normalization of activity values (see the conclusion to our ligand efficiency metric critique for a more detailed discussion of this). If you want to base your decisions on 'clearer stepped differentiation' in pictures or on the blessing of 'senior medicinal chemistry leaders', as a consenting adult, you are free to do so.
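As a rough illustration of what using the observed trend might look like, here is a minimal sketch that fits a straight line to pIC₅₀ against logP by ordinary least squares and scores each compound by its residual from that line. The five data points are invented, the choice of a linear model is an assumption, and the function names are mine. For comparison, the sketch also computes LLE, which amounts to assuming a slope of exactly one.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y against x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def trend_residuals(xs, ys):
    """Activity in excess of what the fitted trend predicts."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical project compounds.
log_p = [1.0, 2.0, 3.0, 4.0, 5.0]
pic50 = [5.2, 5.9, 6.3, 7.4, 7.2]

slope, intercept = fit_line(log_p, pic50)
resid = trend_residuals(log_p, pic50)

# LLE/LipE for comparison: equivalent to fixing the slope at 1.
lle = [p - x for p, x in zip(pic50, log_p)]

print(f"observed slope: {slope:.2f}")
for x, r, l in zip(log_p, resid, lle):
    print(f"logP {x:.1f}  residual {r:+.2f}  LLE {l:.2f}")
```

A positive residual means that a compound is more potent than the trend in your own data would lead you to expect for its lipophilicity, and the two scores will generally rank compounds differently whenever the observed slope is not one.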

Monday, 29 February 2016
