Thursday, 19 February 2015

There's more to molecular design than making predictions

So it arrived as a call to beware of von Neumann’s elephants and the session has already been most ably covered by The Curious Wavefunction (although I can’t resist suggesting that Ash's elephant may look more like a cyclic voltammogram than the pachyderm of the title).  The session to which I refer was Derek Lowe’s much anticipated presentation to a ravening horde of computational chemists that was organized by Boston Area Group Modeling and Informatics (BAGIM).  Derek Pipelined beforehand and I provided some helpful tips (comment 1) on how to keep the pack at bay. I have to admit that I meant to say that many continuum solvent models are charge-type symmetric (as opposed to asymmetric) and to point out that it is actually reality (e.g. DMSO) that is asymmetric with respect to charge type.  At least no damage appears to have been done.  As an aside, I was gratified by how enthusiastically the bait was taken (see comments 8 and 9) but that’s getting away from the aim of this post, which is to explore the relationship between medicinal chemist and computational chemist.

Many in the drug discovery business seem to be of the opinion that the primary function of the pharmaceutical computational chemist is to predict the activity (e.g. enzyme inhibition; hERG blockade) and properties (e.g. solubility; plasma protein binding) of compounds before they are synthesized.  I agree that the goal of both computational chemists and medicinal chemists is the identification of efficacious and safe compounds and accurate prediction would be of great value in enabling that objective to be achieved efficiently.  However, useful prediction of the (in vivo) effects of an arbitrary compound directly from molecular structure is not something that is currently feasible, nor does it look like it will become feasible for a long time.  Tension is likely to develop between the computational chemist and the medicinal chemist when either or both believe that the primary function of the former is simply to build or use predictive (e.g. QSAR) models for activity and ADME behavior.

One way of looking at drug discovery is as a process of accumulating knowledge and perhaps the success of the drug discovery project team should be measured by how efficiently they acquire that knowledge.  A project team that quickly and unequivocally invalidates a target has made a valuable contribution because the project can be put out of its misery and the team members can move on to projects with greater potential.  Getting smarter (and more honest) about shutting down projects is one area in which the pharma/biotech industry can improve.  Although there are a number of ways (e.g. virtual/directed screening; analysis of screening output) that the computational chemist can contribute during hit identification, I’d like to focus on how computational and medicinal chemists can work together once starting points for synthesis have been identified (i.e. during the hit-to-lead and lead optimization phases of a project).

In drug discovery projects we typically accumulate knowledge by posing hypotheses (e.g. a 3-chloro substituent will increase potency or a 2-methoxy will lock the molecule into its bound conformation) and that’s why I use the term hypothesis-driven molecular design (HDMD) that I wrote about in an earlier post.  When protein structural information is available, the hypothesis often takes the form that making a new interaction will lead to an increase in affinity and the key is finding ways of forming interactions with optimal binding geometry without incurring conformational/steric energy penalties.  The computational chemist is better placed than the medicinal chemist to assess interactions by these criteria while the medicinal chemist is better placed to assess the synthetic implications of forming the new interactions.  However, either or both may have generated the ideas that the computational chemist assessed and many medicinal chemists have a strong grasp of molecular interactions and conformational preferences even if they are unfamiliar with the molecular modelling software used to assess these.  I always encourage medicinal chemists to learn to use the Cambridge Structural Database (CSD) because it provides an easy way to become familiar with 3D structures and conformations of molecules as well as providing insight into the interactions that molecules make with one another.  It also uses experimental data so you don’t need to worry about force fields and electrostatic models. Here’s a post in which I used the CSD to gain some chemical insight that will give you a better idea of what I’m getting at.

One question posed in the Curious Wavefunction post was whether medicinal chemists should make dead compounds to test a model.  My answer to the question is that compounds should be prioritized for synthesis on the basis of how informative they are likely to be or how active they are likely to be and, as pointed out by Curious Wavefunction, synthetic accessibility always needs to be accounted for. In hit-to-lead or early lead optimization, I’d certainly consider synthetic targets that were likely to be less active than the compounds that had already been synthesized but these would need to have the potential to provide information.  You might ask how we should assess the potential of a compound to provide information and my response would be that it is not, in general, easy but this is what hypothesis-driven molecular design is all about.  The further you get into lead optimization, the less likely it becomes that an inactive compound will be informative.

I realize that talking about hypothesis-driven molecular design and potential of synthetic targets to provide information may seem a bit abstract so I’ll finish this post with something a bit more practical.  This goes back over a decade to a glycogen phosphorylase inhibitor project for diabetes (see articles 1 | 2 | 3 | 4 | 5 | 6 ).  While lead discovery tends to be a relatively generic activity, the challenges in lead optimization tend to be more project-specific.  We were optimizing some carboxylic acids (exemplified by the molecular structure in the figure below) and were interested in reducing the fraction bound to plasma protein which is often an issue for compounds that are predominantly anionic under physiological conditions. 


I should point out that it wasn’t clear that reducing the bound fraction would have increased the free concentration (this point is discussed in more depth here) but hypothesis-driven design is more about asking good questions than making predictions. We wanted to increase the unbound fraction (Fu) but we also wanted to keep the compounds anionic, which suggested replacing the carboxylic acid with a bioisostere (see 1 | 2 | 3 | 4 | 5 ) such as tetrazole.  If you happen to be especially interested in the basis for the bioisosteric relationship between these molecular recognition elements, have a look at Fig 7 in this article but I’ll be focusing on finding out what effect the bioisosteric replacement will have on Fu.  Tetrazoles are less lipophilic (logP values are 0.3 to 0.4 units lower) than the corresponding carboxylic acids so we might expect that replacing a carboxylic acid with tetrazole will result in an increase in Fu (this thinking is overly simplistic although saying why would take the blog post off on a tangent but I'm happy to pick this up in comments).  We did what has become known as matched molecular pair analysis (MMPA; see  1 | 2 | 3 | 4 | 5 | 6 ) and searched the in-house plasma protein binding (PPB) database for pairs of compounds in which carboxylic acid was replaced by tetrazole while keeping the remainder of the molecular structure constant.  We do MMPA because we believe that it is easier to predict differences in the values of properties or activity than it is to predict the values themselves directly from molecular structure.  Computational chemists may recognize MMPA as a data analytic equivalent of free energy perturbation (FEP; see 1 | 2 ).
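If you want to see what this sort of matched pair search looks like in practice, here is a minimal sketch using RDKit (the in-house analysis predates RDKit and used different tooling, the helper function is mine and the SMILES and logFu values below are made up purely for illustration, not the PPB data discussed in this post). Compounds are grouped by the ‘core’ that remains when the acid or tetrazole is replaced by a dummy atom and the property difference is then calculated for each acid/tetrazole pair:

from rdkit import Chem
from rdkit.Chem import AllChem
from statistics import mean

# SMARTS for the two molecular recognition elements being swapped
ACID = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")   # carboxylic acid
TETRAZOLE = Chem.MolFromSmarts("c1nnn[nH]1")    # 1H-tetrazol-5-yl
DUMMY = Chem.MolFromSmiles("*")

def core_and_group(smiles):
    """Replace the acid or tetrazole with a dummy atom; compounds sharing
    the resulting core SMILES form a matched molecular pair."""
    mol = Chem.MolFromSmiles(smiles)
    for query, group in ((ACID, "acid"), (TETRAZOLE, "tetrazole")):
        if mol.HasSubstructMatch(query):
            core = AllChem.ReplaceSubstructs(mol, query, DUMMY, replaceAll=True)[0]
            Chem.SanitizeMol(core)
            return Chem.MolToSmiles(core), group
    return None, None

# hypothetical (SMILES, logFu) measurements, purely illustrative
data = [("OC(=O)c1ccc(Cl)cc1", -1.10), ("OC(=O)c1ccccc1F", -0.85),
        ("Clc1ccc(cc1)-c1nnn[nH]1", -1.45), ("Fc1ccccc1-c1nnn[nH]1", -1.18)]

pairs = {}
for smi, logfu in data:
    core, group = core_and_group(smi)
    if core is not None:
        pairs.setdefault(core, {})[group] = logfu

deltas = [d["tetrazole"] - d["acid"] for d in pairs.values()
          if "acid" in d and "tetrazole" in d]
print(len(deltas), "pairs, mean delta logFu =", round(mean(deltas), 2))

The mean, standard deviation and standard error of those differences are what you would then use to decide whether the replacement looks worth pursuing.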

The original MMPA performed in 2003 suggested that tetrazoles would be more highly protein bound (i.e. lower Fu) than the corresponding carboxylic acids and on that basis we decided not to synthesize tetrazoles. Subsequent analysis of data for a larger number of matched molecular pairs that were available in 2008 arrived at the same conclusion with a higher degree of confidence (look at the SE values) but it was the 2003 analysis that was used to make the project decisions.  The standard deviation (SD) values of 0.23 and 0.20 are also informative because these suggest that the PPB assay is very reproducible even though it was not run in replicate (have a look at this article to see this point discussed in more depth).

You might ask what we would have done if we hadn’t been able to find any matched molecular pairs with which to test our hypothesis.  Had we decided to synthesize a tetrazole then it would have been sensible to select a reference carboxylic acid for which Fu was close to the center of the quantifiable range so as to maximize the probability of the matching tetrazole being ‘in-range’.

This is probably a good place to leave things.  Even if you don't agree with what I've said here,  I hope this blog post has at least got you thinking that there may be more to pharmaceutical computational chemistry than just making predictions.

Sunday, 25 January 2015

Molecular complexity in KL (Jan 2014)

I was in Kuala Lumpur about this time last year and visited International Medical University where I delivered a harangue.  It was a very enjoyable day and the lunch was excellent (as is inevitable in Malaysia where it seems impossible to find even mediocre food).  We discussed molecular complexity at lunch and, since a picture says a thousand words, I put the place mat to good use.

Wednesday, 21 January 2015

It's a rhodanine... fetch the ducking stool


So I promised to do a blog post on rhodanines...

This isn’t really a post on rhodanines or even PAINS.  It’s actually a post on how we make decisions in drug discovery.  More specifically, the post is about how we use data analysis to inform decisions in drug discovery. It was prompted by a Practical Fragments post which I found to be a rather vapid rant that left me with the impression that a bandwagon had been leapt upon with little idea of whence it came or whither it was going.   I commented and suggested that it might be an idea to present some evidence in support of the opinions presented there and my bigger criticism is of the reluctance to provide that evidence.  Opinions are like currencies and to declare one’s opinion to be above question is to risk sending it the way of the Papiermark.

However, the purpose of this post is not to chastise my friends at Practical Fragments although I do hope that they will take it as constructive feedback that I found their post to fall short of the high standards that the drug discovery community has come to expect of PracticalFragments.   I’ll start by saying a bit about PAINS which is an acronym for Pan Assay INterference compoundS and it is probably fair to say that rhodanines are regarded as the prototypical PAINS class.  The hydrogen molecule of PAINS even?  It’s also worth stating that the observation of assay interference does not imply that a compound in question is actually interacting with a protein and I’ll point you towards a useful article on how to quantify assay interference (and even correct for it when it is not too severe).  A corollary of this is that we can’t infer promiscuity (as defined by interacting with many proteins) or reactivity (e.g. with thiols) simply from the observation of a high hit rate.  Before I get into the discussion, I’d like you to think about one question.  What evidence do you think would be sufficient for you to declare the results of a study to be invalid simply on the basis of a substructure being present in the molecular structure of compound(s) that featured in that study?

The term PAINS was introduced in a 2010 JMC article about which I have already blogged.  The article presents a number of substructural filters which are intended to identify compounds that are likely to cause problems when screened and these filters are based on analysis of the results from six high throughput screening (HTS) campaigns.  I believe that the filters are useful and of general interest to the medicinal chemistry community but I would be wary of invoking them when describing somebody’s work as crap or asserting that the literature was being polluted by the offending structures.  One reason for this is that the PAINS study is not reproducible and this limits the scope for using it as a stick with which to beat those who have the temerity to use PAINS in their research.  My basis for asserting that the study is not reproducible is that chemical structures and assay results are not disclosed for the PAINS and neither are the targets for three of the assays used in the analysis.  There are also the questions of why the output from only six HTS campaigns was used in the analysis and how these six were chosen from the 40+ HTS campaigns that had been run.  Given that all six campaigns were directed at protein-protein interactions employing AlphaScreen technology, I would also question the use of the term ‘Pan’ in this context.  It’s also worth remembering that sampling bias is an issue even with large data sets.  For example, one (highly cited) study asserts that pharmacological promiscuity decreases with molecular weight while another (even more highly cited) study asserts that the opposite trend applies.

This is probably a good point for me to state that I’m certainly not saying that PAINS compounds are ‘nice’ in the context of screening (or in any other context).  I’ve not worked up HTS output for a few years now and I can’t say that I miss it.  Generally, I would be wary of any compound whose chemical structure suggested that it would be electrophilic or nucleophilic under assay conditions or that it would absorb strongly in the uv/visible region or have ‘accessible’ redox chemistry. My own experience with problem compounds was that they didn’t usually reveal their nasty sides by hitting large numbers of assays.  For example, the SAR for a series might be ‘flat’ or certain compounds might be observed to hit mechanistically related assays (e.g. cysteine protease and a tyrosine phosphatase).  When analyzing HTS results the problem is not so much deciding that a compound looks ‘funky’ but more in getting hard evidence that allows you to apply the molecular captive bolt with a clear conscience (as opposed to “I didn’t like that compound” or “it was an ugly brute so I put it out of its misery” or “it went off while I was cleaning it”).


This is a good point to talk about rhodanines in a bit more detail and introduce the concept of substructural context which may be unfamiliar to some readers and I'll direct you to the figure above.  Substructural context becomes particularly important if you’re extrapolating bad behavior observed for one or two compounds to all compounds in which a substructure is present. Have a look at the four structures in the figure and think about what they might be saying to you (if they could talk).  Structure 1 is rhodanine itself but a lot of rhodanine derivatives have an exocyclic double bond as is the case for structures 2 to 4.  The rhodanine ring is usually electron-withdrawing which means that a rhodanine with an exocyclic double bond can function as a Michael acceptor and nucleophilic species like thiols can add across the exocyclic double bond.  I pulled structure 3 from the PAINS article and it is also known as WEHI-76490 and I’ve taken the double bond stereochemistry to be as indicated in the article.  Structure 3 has a styryl substituent on the exocyclic double bond which means that it is a diene and has sigmatropic options that are not available to the other structures.  Structure 4, like rhodanine itself, lacks a substituent on the ring nitrogen and this is why I qualified ‘electron-withdrawing’ with ‘usually’ three sentences previously.  I managed to find a pKa of 5.6 for 4 and this means that we’d expect the compound to be predominantly deprotonated at neutral pH (bear in mind that some assays are run at low pH).  Any ideas about how deprotonation of a rhodanine like 4 would affect its ability to function as a Michael acceptor?  As an aside, I would still worry about a rhodanine that was likely to deprotonate under assay conditions but that would be going off on a bit of a tangent.
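As a quick sanity check of that statement, here’s the Henderson-Hasselbalch arithmetic (a minimal sketch; the pKa of 5.6 is the literature value mentioned above and the pH values are just illustrative):

def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the acid present as the anion."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

print(fraction_deprotonated(5.6, 7.4))   # ~0.98 at physiological pH
print(fraction_deprotonated(5.6, 5.0))   # ~0.20 if the assay runs at low pH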


Now is a good time to take a look at how some of the substructural context of rhodanines was captured in the PAINS paper and we need to go into the supplemental information to do this.  Please take a look at the table above.  I’ve reconstituted a couple of rows from the relevant table in the supplemental material that is provided with the PAINS article.  You’ll notice that there are two rhodanine substructural definitions, only one of which has the exocyclic double bond that would allow it to function as a Michael acceptor.  The first substructure matches the rhodanine definitions for the 2006 BMS screening deck filters although the 2007 Abbott rules for compound reactivity to protein thiols allow the exocyclic double bond to be to any atom.  Do you think that the 60 compounds, matching the first substructure, that fail to hit a single assay should be regarded as PAINS?  What about the 39 compounds that hit a single assay?  You’ll also notice that the enrichment (defined as the ratio of the number of compounds hitting two to six assays to the number of compounds hitting no assays) is actually greater for the substructure lacking the exocyclic double bond.  Do you think that it would be appropriate to invoke the BMS filters or Abbott rules as additional evidence for bad behavior by compounds in the second class?  As an aside it is worth remembering that forming a covalent bond with a target is a perfectly valid way to modulate its activity although there are some other things that you need to be thinking about.
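To make the idea of substructural context a bit more concrete, here is a minimal RDKit sketch with two simplified rhodanine patterns (these are illustrative approximations of mine, not the published PAINS, BMS or Abbott definitions, and the test SMILES are generic examples rather than structures from the figure):

from rdkit import Chem

rhodanine_core = Chem.MolFromSmarts("O=C1CSC(=S)N1")      # no exocyclic C=C required
rhodanine_ene = Chem.MolFromSmarts("O=C1C(=C)SC(=S)N1")   # 5-ylidene, potential Michael acceptor

for smiles in ("O=C1CSC(=S)N1",                 # rhodanine itself
               "O=C1C(=Cc2ccccc2)SC(=S)N1"):    # a 5-benzylidene rhodanine
    mol = Chem.MolFromSmiles(smiles)
    print(smiles,
          mol.HasSubstructMatch(rhodanine_core),
          mol.HasSubstructMatch(rhodanine_ene))

Both molecules match the permissive pattern but only the benzylidene matches the pattern that requires the exocyclic double bond, which is exactly the distinction the two rows of the table are making.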

I should point out that the PAINS filters do provide a richer characterization of substructure than what I have summarized here.  If doing HTS, I would certainly (especially if using AlphaScreen) take note if any hits were flagged up as PAINS but I would not summarily dismiss somebody's work as crap simply on the basis that they were doing assays on compounds that incorporated a rhodanine scaffold.  If I was serious about critiquing a study, I’d look at some of the more specific substructural definitions for rhodanines and try to link these to individual structures in the study.  However, there are limits to how far you can go with this and, depending on the circumstances, there are a number of ways that the authors of a critiqued study might counter-attack.  If they’d not used AlphaScreen and were not studying protein-protein interactions, they could argue irrelevance on the grounds that the applicability domain of the PAINS analysis is restricted to AlphaScreen being used to study protein-protein interactions.  They could also get the gloves off and state that six screens, the targets for three of which were not disclosed, are not sufficient for this sort of analysis and that the chemical structures of the offending compounds were not provided.  If electing to attack on the grounds that this is the best form of defense, they might also point out that the source(s) for the compounds were not disclosed and it is not clear how compounds were stored, how long they spent in DMSO prior to assay and exactly what structure/purity checks were made.

However, I created this defense scenario for a reason and that reason is not that I like rhodanines (I most certainly don’t).   Had it been done differently, the PAINS analysis could have been a much more effective (and heavier) stick with which to beat those who dare to transgress against molecular good taste and decency.   Two things needed to be done to achieve this.  Firstly, using results from a larger number of screens with different screening technologies would have gone a long way to countering applicability domain and sampling bias arguments.  Secondly, disclosing the chemical structures and assay results for the PAINS would make it a lot easier to critique compounds in literature studies since these could be linked by molecular similarity (or even direct match) to the actual ‘assay fingerprints’ without having to worry about the subtleties (and uncertainties) of substructural context. This is what Open Science is about.

So this is probably a good place to leave things.  Even if you don't agree with what I've said,  I hope that this blog post will have at least got you thinking about some things that you might not usually think about. Also have another think about that question I posed earlier. What evidence do you think would be sufficient for you to declare the results of a study to be invalid simply on the basis of a substructure being present in the molecular structure of compound(s) that featured in that study?


Sunday, 11 January 2015

New year, new blog name...

A new year and a new title for the blog which will now just be ‘Molecular Design’.  I have a number of reasons for dropping fragment-based drug discovery (FBDD) from the title but first want to say a bit about molecular design because that may make those reasons clearer.  Molecular design can be defined as control of the behavior of compounds and materials by manipulation of molecular properties.  The use of the word ‘behavior’ in the definition is very deliberate because we design a compound or material to actually do something like bind to the active site of an enzyme or conduct electricity or absorb light of a particular wavelength.   A few years ago, I noted that molecular design can be hypothesis-driven or prediction-driven. In making that observation, I was simply articulating something that many would have already been aware of rather than presenting radical new ideas.  However, I was also making the point that it is important to articulate our assumptions in molecular design and be brutally honest about what we don’t know. 

Hypothesis-driven molecular design (HDMD) can be thought of as a framework in which to establish what, in the interest of generality, I’ll call ‘structure-behavior relationships’ (SBRs) as efficiently as possible.  When we use HDMD, we acknowledge that it is not generally possible to predict the behavior of compounds directly from molecular structure in the absence of measurements for structurally related compounds.  There is an analogy between HDMD and statistical molecular design (SMD) in that both can be seen as ways of obtaining the information required for making predictions even though the underlying philosophies may differ somewhat.  The key challenge for both HDMD and SMD is identifying the molecular properties that will have the greatest influence on the behavior of compounds and this is challenging because you need to do it without measured data.  An in-depth understanding of molecular properties (e.g. conformations, ionization, tautomers, redox potential, metal complexation, uv/vis absorption) is important when doing HDMD because this enables you to pose informative hypotheses.  In essence, HDMD is about asking good questions with informative compounds and relevant measurements and the key challenge is how to make the approach more systematic and objective.  One key molecular property is something that I’ll call ‘interaction potential’ and this is important because the behavior of a compound is determined to a large extent by the interactions of its molecules with the environments (e.g. crystal lattice, buffered aqueous solution) in which they exist.

Since FBDD is being dropped from the blog title, I thought that I’d say a few words about where FBDD fits into the molecular design framework.  I see FBDD as essentially a smart way to do structure-based design in that ligands are assembled from proven molecular recognition elements. The ability to characterize weak binding allows some design hypotheses to be tested without having to synthesize new compounds.  It’s also worth remembering that FBDD has its origins in computational chemistry ( MCSS | Ludi | HOOK ) and that an approach to crystallographic mapping of protein surfaces was published before the original SAR by NMR article made its appearance.  My own involvement with FBDD began in 1997 and I focused on screening library design right from the start.  The screening library design techniques described in blog posts here ( 1 | 2 | 3 | 4 | 5) and a related journal article have actually been around for almost 20 years although I think they still have some relevance to modern FBDD even if they are getting a bit dated.  If you’re interested, you can find a version of the SMARTS-based filtering software that I was using even before Zeneca became AstraZeneca.  It’s called SSProFilter and you can find source code (it was built with the OEChem toolkit) in the supplemental information for our recent article on alkane/water logP.  So why drop FBDD from the blog title? For me, molecular design has always been bigger than fragment-based molecular design and my involvement in FBDD projects has been minimal in recent years.  FBDD is increasingly becoming mainstream and, in any case, dropping FBDD from the blog title certainly doesn’t prevent me from discussing fragment-based topics or even indulging in some screening library design should the opportunities arise.
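For readers who have never seen SMARTS-based filtering in action, here is a minimal sketch of the general idea using RDKit (this is not SSProFilter, which was built with OEChem and whose actual SMARTS patterns are in the supplemental information; the two patterns and the tiny ‘library’ below are hypothetical and purely illustrative):

from rdkit import Chem

# hypothetical filter patterns, not SSProFilter's published definitions
FILTERS = [Chem.MolFromSmarts(s) for s in ("[N+](=O)[O-]",   # nitro group
                                           "C(=O)Cl")]       # acid chloride

def passes_filters(smiles):
    """Keep a compound only if it matches none of the filter patterns."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and not any(mol.HasSubstructMatch(q) for q in FILTERS)

library = ["O=[N+]([O-])c1ccccc1", "OCCc1ccccc1", "O=C(Cl)c1ccccc1"]
print([smi for smi in library if passes_filters(smi)])   # only the alcohol survives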

As some readers will be aware, I have occasionally criticized some of the ways that things get done in drug discovery and so it’s probably a good idea to say something about the directions in which I think pharmaceutical design needs to head.  I wrote a short Perspective for the JCAMD 25th anniversary issue three years ago and this still broadly represents my view of where the field should be going.  Firstly we need to acknowledge the state of predictive medicinal chemistry and accept that we will continue to need some measured data for the foreseeable future.  This means that, right now, we need to think more about how to collect the most informative data as efficiently as possible and less about predicting pharmacokinetic profiles directly from molecular structure.  Put another way, we need to think about drug discovery in a Design of Experiments framework.  Secondly, we need to look at activity and properties in terms of relationships between structures because it’s often easier to predict differences in the values of a property than it is to predict the values themselves.  Thirdly, we need to at least consider alternatives to octanol/water for partition coefficient measurement.

There are also directions in which I think we should not be going.  Drug discovery scientists will be aware of an ever-expanding body of rules, guidelines and metrics (RGMs) that prompts analogy with The Second Law.  Some of this can be traced to the success of the Rule of 5 and many have learned that it is a lot easier to discover metrics than it is to discover drugs.  If you question a rule, the standard response will be, “it’s not a rule, it’s a guideline” and your counter-response should be, “you’re the one who called it a rule”. Should you challenge the quantitative basis of a metric, which by definition is supposed to measure something, it is likely that you will be told how useful it is.  This defense is easily outflanked by devising a slightly different metric and asking whether it would be more or less useful.  Another pattern that many drug discovery scientists will have recognized is that, in the RGM business, simplicity trumps relevance.

Let’s talk about guidelines for a bit.  Drug discovery guidelines need to be based on observations of reality and that usually means trends in data.  If you’re using guidelines, then you need to know the strength of the trend(s) on which the guidelines are based because this tells you how rigidly you should adhere to the guidelines.  Conversely, if you’re recommending that people use the guidelines that you’re touting then it’s very naughty to make the relevant trends appear to be stronger than they actually are.   There are no winners when data gets cooked and I think this is a good point at which to conclude the post.

So thanks for reading and I’ll try to whet your appetite by saying that the next blog post is going to be on PAINS. Happy new year!

Saturday, 13 September 2014

Saving Ligand Efficiency?

<< Previous |

I’ll be concluding the series of posts on ligand efficiency metrics (LEMs) here so it’s a good point at which to remind you that the open access for the article on which these posts are based is likely to stop on September 14 (tomorrow). At the risk of sounding like one of those tedious twitter bores who think that you’re actually interested in their page load statistics, download now to avoid disappointment later. In this post, I’ll also be saying something about the implications of the LEM critique for FBDD and I’ve not said anything specific about FBDD in this blog for a long time.

LEMs have become so ingrained in the FBDD orthodoxy that to criticize them could be seen as fundamentally ‘anti-fragment’ and even heretical.  I still certainly believe that fragment-based approaches represent an effective (and efficient although not in the LEM sense) way to do drug discovery. At the same time, I think that it will get more difficult, even risky, to attempt to tout fragment-based approaches primarily on the basis that fragment hits are typically more ligand-efficient than higher molecular weight starting points for synthesis.  I do hope that the critique will at least reassure drug discovery scientists that it really is OK to ask questions (even of publications with eye-wateringly large numbers of citations) and show that the ‘experts’ are not always right (sometimes they’re not even wrong).

My view is that the LEM framework gets used as a crutch in FBDD so perhaps this is a good time for us to cast that crutch aside for a moment and remind ourselves why we were using fragment-based approaches before LEMs arrived on the scene.  Fragment-based approaches allow us to probe chemical space efficiently and it can be helpful to think in terms of information gained per compound assayed or per unit of synthetic effort consumed.  Molecular interactions lie at the core of pharmaceutical molecular design and small, structurally-prototypical probes allow these to be explored quantitatively while minimizing the confounding effects of multiple protein-ligand contacts.  Using the language of screening library design, we can say that fragments cover chemical space more effectively than larger species and can even conjecture that fragments allow that chemical space to be sampled at a more controllable resolution.  Something I’d like you to think about is the idea of minimal steric footprint which was mentioned in the fragment context in both LEM (fragment linking) and Correlation Inflation (achieving axial substitution) critiques.  This is also a good point to remind readers that not all design in drug discovery is about prediction.  For example, hypothesis-driven molecular design and statistical molecular design can be seen as frameworks for establishing structure-activity relationships (SARs) as efficiently as possible.

Despite criticizing the use of LEMs, I believe that we do need to manage risk factors such as molecular size and lipophilicity when optimizing lead series. It’s just that I don’t think that the currently used LEMs provide a generally valid framework for doing this.  We often draw straight lines on plots of activity against risk factors in order to manage the latter.  For example, we might hypothesize that 10 nM potency will be sufficient for in vivo efficacy and so we could draw the pIC50 = 8 line to identify the lowest molecular weight compounds above this line.  Alternatively we might try to construct a line that represents the ‘leading edge’ (most potent compound for a particular value of the risk factor) of a plot of pIC50 against ClogP.  When we draw these lines, we often make the implicit assumption that every point on the line is in some way equivalent. For example we might conjecture that points on the line represent compounds of equal ‘quality’.  We do this when we use LEMs and assume that compounds above the line are better than those below it.

Let’s take a look at ligand efficiency (LE) in this context and I’m going to define LE in terms of pIC50 for the purposes of this discussion. We can think of LE in terms of a continuum of lines that intersect the activity axis at zero (where IC50 = 1 M).  At this point, I should stress that if the activities of a selection of compounds just happen to lie on any one of these lines then I’d be happy to treat those compounds as equivalent.  Now let’s suppose that you’ve drawn a line that intersects the activity axis at zero.  Now imagine that I draw a line that intersects the activity axis at 3 (where IC50 = 1 mM) and, just to make things interesting, I’m going to make sure that my line has a different slope to your line.  There will be one point where we appear to agree (just like −40° on the Celsius and Fahrenheit temperature scales) but everywhere else we disagree. Who is right? Almost certainly neither of us is right (plenty of lines to choose from in that continuum) but in any case we simply don’t know because we’ve both made completely arbitrary decisions with respect to the points on the activity axis where we’ve chosen to anchor our respective lines.  Here’s a figure that will give you a bit more of an idea of what I'm talking about.

 

However, there may just be a way out of this sorry mess and the first thing that has to go is that arbitrary assumption that all IC50 values tend to 1 M in the limit of zero molecular size.  Arbitrary assumptions beget arbitrary decisions regardless of how many grinning LeanSixSigma Master Black Belts your organization employs.  One way out of the mire is to link efficiency to the response (slope) and agree that points lying on any straight line (of finite non-zero slope) when activity is plotted against risk factor represent compounds of equal efficiency.  Right now this is only the case when that line just happens to intersect the activity axis at a point where IC50 = 1 M.  This is essentially what Mike Schultz was getting at in his series of (  1  | 2  |  3  ) critiques of LE even if he did get into a bit of a tangle by launching his blitzkrieg on a mathematical validity front. 
The next problem that we’ll need to deal with if we want to rescue LE is deciding where we make our lines intersect the activity axis.  When we calculate LE for a compound, we connect the point on the activity versus molecular size plot representing the compound to a point on the activity axis with a straight line.  Then we calculate the slope of this line to get LE but we still need to find an objective way to select the point on the activity axis if we are to save LE.  One way to do this is to use the available activity data to locate the point for you by fitting a straight line to the data although this won’t work if the correlation between risk factor and activity is very weak.  If you can’t use the data to locate the intercept then how confident do you feel about selecting an appropriate intercept yourself?  As an exercise, you might like to take a look at Figure 1 in this article (which was reviewed earlier this year at Practical Fragments) and ask if the data set would have allowed you to locate the intercept used in the analysis.

If you buy into the idea of using the data to locate the intercept then you’ll need to be thinking about whether it is valid to mix results from different assays.  It may be that the same intercept is appropriate for all potency and affinity assays but the validity of this assumption needs to be tested by analyzing real data before you go basing decisions on it.  If you get essentially the same intercept when you fit activity to molecular size for results from a number of different assays then you can justify aggregating the data. However, it is important to remember (as is the case with any data analysis procedure) that the burden of proof is on the person aggregating the data to show that it is actually valid to do so. 

By now hopefully you’ve seen the connection between this approach to repairing LE and the idea of using the residuals to measure the extent to which the activity of a compound beats the trend in the data.  In each case, we start by fitting a straight line to the data without constraining either the slope or the intercept.  In one case we first subtract the value of this intercept from pIC50 before calculating LE from the difference in the normal manner and the resulting metric can be regarded as measuring the extent to which activity beats the trend in the data.  The residuals come directly from the fitting process and there is no need to create new variables, such as the difference between pIC50 and the intercept, prior to analysis. The residuals have sign which means that you can easily see whether or not the activity of a compound beats the trend in the data.  With residuals there is no scaling of uncertainty in assay measurement by molecular size (as is the case with LE).  Finally, residuals can still be used if the trend in the data is non-linear (you just need to fit a curve instead of a straight line).  I have argued the case for using residuals to quantify the extent to which activity beats the trend in the data but you can also generate a modified LEM from your fit of the activity to molecular size (or whatever property you think activity should be scaled by).
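Here is a minimal sketch of what that looks like in practice (the pIC50 and heavy atom values are made up purely for illustration):

import numpy as np

# hypothetical activity (pIC50) and heavy atom counts for a handful of compounds
nha = np.array([12.0, 18.0, 24.0, 30.0, 36.0])
pic50 = np.array([5.1, 6.0, 6.4, 7.2, 7.5])

slope, intercept = np.polyfit(nha, pic50, 1)    # unconstrained straight-line fit
residuals = pic50 - (slope * nha + intercept)   # positive means the compound beats the trend

# an LE-like metric anchored at the fitted intercept rather than at an assumed 1 M
le_refit = (pic50 - intercept) / nha

print(round(slope, 3), round(intercept, 2))
print(np.round(residuals, 2))
print(np.round(le_refit, 3))

You can rank either by the residuals or by le_refit; the point is that the anchor has come from the data rather than from an assumed standard concentration.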

It’s been a longer-winded post than I’d intended to write and this is a good point at which to wrap up.  I could write the analogous post about LipE by substituting ‘slope’ for ‘intercept’ but I won’t because that would be tedious for all of us.  I have argued that if you honestly want to normalize activity by risk factor then you should be using trends actually observed in the data rather than making assumptions that self-appointed 'experts' tell you to make.   This means thinking (hopefully that's not too much to ask although sometimes I fear that it is) about what questions you'd like to ask and analyzing the data in a manner that is relevant to those questions. 

That said, I rest my case.

Thursday, 4 September 2014

Efficiency can also be lipophilic

<< Previous || Next >>
In the previous post, I questioned the validity of scaled ligand efficiency metrics (LEMs) such as LE.  However, LEMs can also be defined by subtracting the value of the risk factor from activity and this has been termed offsetting.  For example, you can subtract a measure of lipophilicity from pIC50 to give functions such as:

pIC50 – logP

pIC50 – ClogP

pIC50 – logD(pH)
As you will have gathered from the previous post, I am not a big fan of naming LEMs. The reason for this is that you often (usually?) can’t tell from the definition exactly what has been calculated and I think it would be a lot better if people were forced (journal editors, you can achieve something of lasting value here) to be explicit about the mathematical function(s) with which they normalize the activity of compounds.  In some ways, the problems are actually worse when activity is normalized by lipophilicity because a number of different measures of lipophilicity can be used and because differences between lipophilicity measures are not always well understood (even by LEM ‘experts’).  Does LLE mean ‘ligand-lipophilicity efficiency’ or 'lipophilic ligand efficiency’?  When informed that a compound has an LLE of 4, how can I tell whether it has been calculated using logD (please specify pH), logP or predicted logP (please specify prediction method since there are several to choose from)?

Have a look at this figure and observe how three lines respond to the offsetting transformation (Y => Y − X).  The line with unit slope transforms to a line of zero slope (Y − X is independent of X) while the other two lines transform to lines of non-zero slope (Y − X is dependent on X).  This figure is analogous to the one in the previous post that showed how three parallel lines transformed under the scaling transformation. 
 
It’s going to be helpful to generalize Lipophilic Efficiency (LipE) so let’s do that first:
LipEgen = pIC50 – (λ × ClogP)

Generalizing LipE in this manner shows us that LipE (λ = 1) is actually quite arbitrary (in much the same way that a standard or reference state is arbitrary) and one might ask whether λ = 0.5 might not be a better LEM.  Note that a similar criticism was made of the Solubility Forecast Index in the Correlation Inflation Perspective. One approach to validating an LEM would be to show that it actually predicted relevant behavior of compounds. In the case of LEMs based on lipophilicity, it would be necessary to show that the best predictions were observed for λ = 1.  Although one can think of LEMs as simple quantitative structure-activity relationships (QSARs), LEMs are rarely, if ever, validated in a way that QSAR practitioners would regard as valid.  Can anybody find a sentence in the pharmaceutical literature containing the words ‘ligand’, ‘efficiency’ and ‘validated’?  Answers on a postcard...
Offset LEMs do differ from scaled LEMs and one might invoke a thermodynamic argument to justify the use of LipE as an LEM.  In a nutshell it can be argued that LipE is a measure of the ease of moving a compound from a non-polar environment to its binding site in the protein.  There are two flaws in this argument which were discussed in our LEM critique which will be open access for another 10 days or so.  Firstly, when a ligand binds in an ionized form, lipophilicity measures do not quantify the ease of moving the bound form from octanol to water because ionized forms of compounds do not usually partition into octanol to a significant extent. Secondly, octanol/water is just one of a number of partitioning systems and one needs to demonstrate that lipophilicity derived from it is optimal for the definition of an LEM.  The figure below shows how logP can differ when measured in alternative partitioning systems and you should be aware of an occasionally expressed misconception that the relevant logP values simply differ by a constant amount.



One solution to the problem is to model pIC50 as a function of your favored measure of lipophilicity and use the residuals to quantify the extent to which activity beats the trend in the data.  This is exactly what I suggested in the previous post as an alternative to scaling activity by risk factors such as molecular weight or heavy atoms and the approach can be seen as bringing these risk factors and lipophilicity into a common data-analytical framework.  Even if you don’t like the idea of using the residuals, it is still useful to model the measured activity because a slope of unity helps to validate LipE (assuming that you’re using ClogP to model activity).  Even if the slope of the line of fit differs from unity, you can set λ to its value to create a lipophilic efficiency metric that has been tuned to the data set that you wish to analyze.
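As with the molecular size example in the previous post, here’s a minimal sketch of what tuning the metric to the data might look like (the pIC50 and ClogP values are made up purely for illustration):

import numpy as np

clogp = np.array([1.2, 2.0, 2.9, 3.5, 4.1])
pic50 = np.array([5.0, 5.9, 6.3, 7.0, 7.6])

lam, intercept = np.polyfit(clogp, pic50, 1)   # lam plays the role of lambda in LipEgen
lipe_tuned = pic50 - lam * clogp               # classic LipE corresponds to setting lam = 1
residuals = pic50 - (lam * clogp + intercept)  # my preferred measure of beating the trend

print(round(lam, 2))
print(np.round(lipe_tuned, 2))
print(np.round(residuals, 2))

Note that lipe_tuned and the residuals differ only by the constant intercept, so they rank compounds identically; the difference only matters if you want the numbers themselves to mean something.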

This is a good point at which to wrap up.  As noted (and reiterated) in the LEM critique, when you use LEMs, you're making assumptions about trends in data and your perception of the system is distorted when these assumptions break down.  Modelling the data by fitting activity to risk factor allows you to use the trends actually observed in the data to normalize activity.  That’s just about all I want to say for now and please don’t get me started on LELP.

Saturday, 30 August 2014

Ligand efficiency metrics considered harmful

Next >>
It has been a while since I did a proper blog post. Some of you may have encountered a Perspective article entitled, ‘Ligand efficiency metrics considered harmful’ and I’ll post on it because the journal has made the article open access until Sept 14.  The Perspective has already been reviewed by Practical Fragments and highlighted in a F1000 review. Some of the points discussed in the article were actually raised last year in Efficient Voodoo Thermodynamics, Wrong Kind of Free Energy and ClogPalk : a method for predicting alkane/water partition coefficients.  There has been recent debate about the validity of ligand efficiency (LE) which is summarized in a blog post (make sure to look at the comments as well).  However, I believe that both sides missed the essential point which is that the choice (conventionally 1 M) of concentration that is used to define the standard state is entirely arbitrary.  

In this blog post, I’ll focus on what I call ‘scaled’ ligand efficiency metrics (LEMs). Scaling means that a measure of activity or affinity is divided by a risk factor such as molecular weight (MW) or heavy (i.e. non-hydrogen) atoms (HA).   For example, LE can be calculated by dividing the standard free energy of binding (ΔG°) by HA:
LE  =  −(1/HA) × RT × loge(Kd/C°)

Now you’ll notice that I’ve written ΔG° in terms of the dissociation constant (Kd) and the standard concentration (C°) and I articulated why this is important early last year.  The logarithm function is only defined for numbers (i.e. dimensionless quantities) and, inconveniently, Kd has units of concentration.  This means that LE is a function of both Kd and C° and I’m going to first redefine LE a bit to make the problem a bit easier to see.   I'll use IC50 instead of Kd to define a new LEM which I’m not going to name (in the LEM literature names and definitions keep changing so much better to simply state the relevant formula for whatever is actually used):

(−1/NHA) × log10(IC50/Cref)

I have a number of reasons for defining the metric in this manner and the most important of these is that the new metric is dimensionless.  Note how I use the number of heavy atoms (NHA) rather than HA to define the metric.  Just in case you’re wondering what the difference is, HA for ethanol is 3 heavy atoms while NHA is 3 (lacking units of heavy atoms). [In the original post I'd counted 2 for this molecule but if you check the comments you'll see that this howler was picked up by an alert reader and it has now been corrected.]  Apologies for being pedantic (some might even call me a units Nazi) but if people had paid more attention to units, we’d never have got into this sorry mess in the first place.  The other point to note is that I’ve not converted IC50 to units of energy, mainly for the reason that it is incorrect to do so because an IC50 is not a thermodynamic quantity. However, there are other reasons for not introducing units of energy. Often units of energy go AWOL when values of LE are presented and there is no way of knowing whether the original units were kcal/mol or kJ/mol. Even when energy units are stated explicitly, this might be masking a situation in which affinity and potency measurements have been combined in a single analysis.  Of course, one can be cynical and suggest that the main reason for introducing energy units is to make biochemical measurements appear to be more physical. 

So let’s get back to that new metric and you’ll have noticed a quantity in the defining equation that I’ve called Cref (reference concentration).  This is similar to the standard concentration in that the choice of its value is completely arbitrary but it is also different because it has no thermodynamic significance. You need to use it when defining the new LEM because, at the risk of appearing repetitive, you can’t calculate a logarithm for a quantity that has units.  Another way of thinking about Cref is as an arbitrary unit of concentration that we’re using to analyze some potency measurements.  Something that is really, really important and fundamental in science is that your perception of a system should not change when you change the units of the quantities that describe the system.  If it does then, in the words of Pauli, you are “not even wrong” and you should avoid claiming penetrating insight so as to prevent embarrassment later. So let’s take a look at how changing Cref affects our perception of ligand efficiency.  The table below is essentially the same as what was given in the Perspective article (the only difference is that I’m using IC50 in the blog post rather than Kd).  I also should point out that Huan-Xiang Zhou and Mike Gilson made a similar criticism of LE back in 2009 (although we cited their article in the context of standard states, we failed to notice the critique at the end of their article and there really is no excuse for having missed it).  When a reference concentration of 1 M is used, the three compounds are all equally ligand efficient according to the new LEM.  If Cref = 0.1 M, the compounds appear to become more ligand efficient as molecular size increases but the opposite behavior is observed for Cref = 10 M.
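The arithmetic is easy to reproduce. Here’s a minimal sketch using hypothetical pIC50 and heavy atom values chosen so that the metric is constant when Cref = 1 M (they are not the values from the Perspective, but they show the same behavior):

import numpy as np

pic50 = np.array([3.0, 6.0, 9.0])      # IC50 of 1 mM, 1 uM and 1 nM
nha = np.array([10.0, 20.0, 30.0])     # heavy atom counts (dimensionless)

for cref in (0.1, 1.0, 10.0):          # reference concentration in M
    metric = (pic50 + np.log10(cref)) / nha   # same as (-1/NHA)*log10(IC50/Cref)
    print(cref, np.round(metric, 3))

# Cref = 1 M: all three compounds look equally efficient; Cref = 0.1 M: the
# larger compounds look better; Cref = 10 M: the smaller compounds look better.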


Here’s a figure that shows the problem from a different angle. See how the three parallel lines respond to the scaling transformation (Y => Y/X).  The line that passes through the origin transforms to a line of zero slope (Y/X is independent of X) while the other two lines transform to curves (Y/X is dependent on X).  This graphic has important implications for fit quality (FQ) because it shows that some of the size dependency of LE is a consequence of the (arbitrary) choice of a value (usually 1 M) for Cref or C°.

 

A common response that I’ve encountered when raising these points is that LE is still useful.  My counter-response is that Religion is also useful (e.g. as a means for pastors to fleece their flocks efficiently) but that, by itself, neither makes it correct nor ensures that it is being used correctly (if that is even possible for Religion).  One occasionally expressed opinion is that, provided you use a single value for Cref or C°, the resulting LE values will be consistent.  However, consistency is no guarantee of correctness and we need to remember that we create metrics for the purpose of measuring things with them.  When you advocate use of a metric in drug discovery, the burden of proof is on you to demonstrate that the metric has a sound scientific basis and actually measures what it is supposed to measure.
The Perspective is critical of the LEMs used in drug discovery but it does suggest alternatives that do not suffer from the same deficiencies.  This is a good point to admit that it took me a while to figure out what was wrong with LE and I’ll point you towards a blog post from over five years ago that will give you an idea about how my position on LEMs has evolved.  It is often stated that LEMs normalize activity with respect to risk factor although it is rarely, if ever, stated explicitly what is meant by the term ‘normalize’.   One way of thinking about normalization is as a way to account for the contribution of a risk factor, such as molecular size or lipophilicity, to activity.  You can do this by modelling the activity as a function of risk factor and using the residual as a measure of activity that has been corrected for the contribution made by the risk factor in question.  You can also think of an LEM as a measure of the extent to which the activity of a compound beats a trend (e.g. linear response of activity to HA). If you’re going to do this then why not use the trend actually observed in the data rather than some arbitrarily assumed trend?

Although I still prefer to use residuals to quantify the extent to which activity beats a trend, the process of modelling activity measurements hints at a way that LE might be rehabilitated to some extent.  When you fit a line to activity (or affinity) measurements, you are effectively determining the value of Cref (or C°) that will make the line of fit pass through the origin and which you can then use to redefine activity (or affinity) for the purpose of calculating LE. I would argue that the residual is a better measure of the extent to which the activity of a compound beats the trend in the data because the residual has sign and the uncertainty in it does not depend explicitly on the value of a scaling variable such as HA. However, LE defined in this manner can at least be claimed to take account of the observed trend in the activity data. Something that you might want to think about in this context is whether or not you'd expect the same parameters (slope and intercept) if you were to fit activity measured against different targets to MW or HA.  My reason for bringing this up is that it has implications for the validity of mixing results from different assays in LE-based analyses so see what you think.   

I’ll wrap up by directing you to a presentation that I've been doing lately and it includes material from the earlier correlation inflation study.  A point that I make when presenting this material is that, if we do bad science and bad data analysis, people can be forgiven for thinking that the difficulties in drug discovery may actually be of our own making.