Sunday, 12 June 2016

PAINS: a question of applicability domain?


As most readers of this blog will know, analysis of large (often proprietary) data sets is very much a part of modern drug discovery. Some will have discerned a tendency for the importance of these studies to get 'talked up' and the proprietary nature of many of the data sets makes it difficult to challenge published claims. There are two ways in which data analysis studies in the drug discovery literature get 'talked up'. Firstly, trends in data are made to look stronger than they actually are and this has been discussed. Secondly, it may be suggested that the applicability domain for an analysis is broader than it actually is.

So it's back to PAINS with the fifth installment in the series ( 1 | 2 | 3 | 4 ) and, if you've found reading these posts tedious, spare a thought for the unfortunate person who has to write them. In the two posts on PAINS that will follow this one, I'll explore how PAINS have become integrated into journal guidelines for authors before concluding the series with some suggestions about how we might move things forward. But before doing this, I do need to take another look at the Nature PAINS article (Chemical con artists foil drug discovery) that was discussed in the first post of the series. I will refer to this article as BW2014 in this post. I'll use 'pathological' as a catch-all term in this post to describe any behavior by compounds in assays that results in an inappropriate assessment of the activity of those compounds.

BW2014 received a somewhat genuflectory review in a Practical Fragments post. You can see from the comments on the post that I was becoming uneasy about the size and 'homogeneity' of the PAINS assay panel although it was a rather intemperate PAINS-shaming post a couple of months later that goaded me into taking a more forensic look at the field. I'd like to get a few things straight before I get going. It has been known since the mid-1990s that not all high-throughput screening (HTS) output smells of roses and the challenge has been establishing by experiment that suspect compounds are indeed behaving pathologically. When working up HTS output, we typically have to make decisions based on incomplete information. One question that I'd like you to think about: how would knowing that a catechol matched a PAINS substructure change your perception of that compound as a hit from HTS?

So before I go on it is perhaps a good idea to say what is meant by the term 'PAINS', which is an acronym for Pan Assay INterference compoundS. In the literature and blogs, the term 'PAINS' appears to mean one of the following:

1) Compounds matching substructural patterns disclosed in the original PAINS study
2) Compounds that have been demonstrated by experiment to behave pathologically in screening
3) Substructural definitions such as, but not necessarily, those described in the original PAINS article, claimed to be predictive of pathological behavior in screening
4) Compounds matching substructural definitions such as, but not necessarily, those described in the original PAINS article
5) Compounds (or classes of compounds) believed to have the potential to behave pathologically in screens.

There is still some ambiguity within the categories and, in the original PAINS study, PAINS are identified by frequent-hitter behavior in an assay panel. Do you think that it is justified to label compounds that fail to hit a single assay in the panel as PAINS simply because they share substructural elements with frequent-hitters? Category 5 is especially problematic because it can be difficult to know whether those denouncing a class of compounds as PAINS are doing so on the basis of relevant experimental observations, model-based prediction or 'expert' opinion. I'd guess that those doing the denouncing often don't know either. Drug discovery suffers from blurring of what has been measured with what has been opined and this post should give you a better idea of what I'm getting at here.

This is a good point to summarize the original PAINS study. Compounds were identified as PAINS on the basis of frequent-hitter behavior in a panel of six AlphaScreen assays for inhibition of protein-protein interactions. The results of the study were a set of substructural patterns and a summary of the frequent-hitter behavior associated with each pattern. The original PAINS study invokes literature studies and four instances of 'personal communication' in support of the claim that PAINS filters are predictive of pathological behavior in screening although, in the data analysis context, this 'evidence' should be regarded as anecdotal and circumstantial. Neither chemical structures nor assay data were disclosed in the original PAINS study and the data must be regarded as proprietary.

The PAINS substructural patterns would certainly be useful to anybody using AlphaScreen. My criticism of the 'PAINS field' is not of the substructural patterns themselves (or indeed of attempts to identify compounds likely to behave pathologically when screened) but of the manner in which they are extrapolated out of their applicability domain. I would regard interpreting frequent-hitter behavior in a panel of six AlphaScreen assays as pan-assay interference as a significant extrapolation.

But I have droned on enough so now let's take a look at some of what BW2014 has to say:

"Artefacts have subversive reactivity that masquerades as drug-like binding and yields false signals across a variety of assays [1,2]. These molecules — pan-assay interference compounds, or PAINS — have defined structures, covering several classes of compound (see ‘Worst offenders’)."

I don't think that it is correct to equate artefacts with reactivity since compounds that absorb or fluoresce strongly or that quench fluorescence can all interfere with assays without actually reacting with anything. My bigger issue with this statement is claiming "a variety of assays" when the PAINS assay panel consisted of six AlphaScreen assays. Strictly, we should be applying the term 'artefact' to assay results rather than compounds but that would be nitpicking. Let's continue from BW2014:

"In a typical academic screening library, some 5–12% of compounds are PAINS [1]."

Do these figures reflect actual analysis of real academic screening libraries? Have these PAINS actually been observed to behave pathologically in real assays or have they simply been predicted to behave badly? Does the analysis take account of the different PAIN levels associated with different PAINS substructures? Continuing from BW2014:

“Most PAINS function as reactive chemicals rather than discriminating drugs. They give false readouts in a variety of ways. Some are fluorescent or strongly coloured. In certain assays, they give a positive signal even when no protein is present. Other compounds can trap the toxic or reactive metals used to synthesize molecules in a screening library or used as reagents in assays.”

“PAINS often interfere with many other proteins as well as the one intended."

At the risk of appearing repetitive, it is not clear exactly what is meant by the term 'PAINS' here. How many compounds identified as PAINS in the original study were actually shown by experiment to function as "reactive chemicals" under assay conditions? How many compounds identified as PAINS in the original study were actually shown to "interfere with many other proteins"? How many compounds identified as PAINS in the original study were actually shown to interact with even one of the proteins used in the PAINS assay panel? This would have been a good point to have mentioned that singlet oxygen quenchers and scavengers can interfere with the AlphaScreen detection used in all six assays of the original PAINS assay panel.

BW2014 offers some advice on PAINS-proof drug discovery and I'll make the observation that there is an element of 'do as I say, not as I do' to some of this advice. BW2014 suggests: 

“Scan compounds for functional groups that could have reactions with, rather than affinity for, proteins.”

You should always be concerned about potential electrophilicity of screening hits (I had two 'levels' of electron-withdrawing group typed as SMARTS vector bindings in my Pharma days although I accept that may have been a bit obsessive) but you also need to be aware that covalent bond formation between protein and ligand is a perfectly acceptable way to engage targets. 

The following advice from BW2014 is certainly sound:

“Check the literature. Search by both chemical similarity and substructure to see if a hit interacts with unrelated proteins or has been implicated in non-drug-like mechanisms.”

This is a good point to mention that singlet oxygen quenchers and scavengers can interfere with the AlphaScreen detection used in the six assays of the original PAINS assay panel. I realize it is somewhat uncouth to say so but the original PAINS study didn't exactly scour the literature on quenchers and scavengers of singlet oxygen. For example, DABCO is described as a "strong singlet oxygen quencher" without any supporting references.

BW2014 makes this recommendation:

"Assess assays. For each hit, conduct at least one assay that detects activity with a different readout. Be wary of compounds that do not show activity in both assays. If possible, assess binding directly, with a technique such as surface plasmon resonance."

Again this makes a lot of sense and I would add that sometimes pathological behavior of compounds in assays can be discerned by looking at the concentration response of signal. Direct (i.e. label-free) quantification is particularly valuable and surface plasmon resonance can also characterize binding stoichiometry, which can be diagnostic of pathological behavior in screens. However, the above advice raises the question of why a panel of six assays with the same readout was chosen for a study of pan-assay interference.
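One cheap diagnostic of this kind is the steepness of the concentration response. For simple 1:1 binding the Hill slope should be close to 1, while aggregators and other badly behaved compounds often show abrupt, steep responses. Here's a minimal sketch (hypothetical numbers, a crude two-point estimate rather than a proper fit) of how one might flag a steep response:

```python
import math

def hill_slope(c1, y1, c2, y2):
    """Two-point estimate of the Hill coefficient from fractional
    responses y1, y2 (0 < y < 1) at concentrations c1 < c2."""
    logit = lambda y: math.log10(y / (1.0 - y))
    return (logit(y2) - logit(y1)) / (math.log10(c2) - math.log10(c1))

ic50 = 1.0  # hypothetical IC50 in micromolar
y = lambda c, n: 1.0 / (1.0 + (ic50 / c) ** n)  # Hill equation

# Simple 1:1 binding gives a slope near 1 ...
print(round(hill_slope(0.3, y(0.3, 1.0), 3.0, y(3.0, 1.0)), 2))
# ... while an abrupt, aggregator-like response gives a much steeper one
print(round(hill_slope(0.3, y(0.3, 4.0), 3.0, y(3.0, 4.0)), 2))
```

A real workup would fit the full concentration response, but even this back-of-the-envelope estimate separates well-behaved from suspicious behavior.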

I'll finish off with some questions that I'd like you to think about. Would you consider a compound hitting all assays in a panel composed of six AlphaScreen assays to constitute evidence for pan-assay interference by that compound? Given the results from 40 HTS campaigns, how would you design a study to characterize pan-assay interference? How would knowing that a catechol was an efficient quencher of singlet oxygen change your perception of that compound as a hit from HTS?

So now that I've distracted you with some questions, I'm going to try to slip away unnoticed. In the next PAINS post, I'll be taking a close look at how PAINS have found their way into the J Med Chem guidelines for authors. Before that, I'll try to entertain you with some lighter fare. Please stay tuned for Confessions of a Units Nazi... 

Friday, 3 June 2016

Yet more on ligand efficiency metrics

In this post, I'll be responding to a couple of articles in the literature that cited our gentle critique of ligand efficiency metrics (LEMs). The critique has also been distilled into a harangue and readers may find that a bit more digestible than the article. As we all know, Ligand Efficiency (LE), the original LEM, was introduced to normalize affinity with respect to molecular size. Before getting started, I'd like to ask you, the reader, to ask yourself exactly what you take the term 'normalize' to mean.

The first article which I'll call L2016 states:

Optimisation frequently increases molecular size, and on average there is a trade-off between potency and size gains, leading to little or no gain in LE [42,52] but an increase in SILE [52]. This, and the nonlinear dependence of LE on heavy atom count, together with thermodynamic considerations, has led some authors to question the validity of LE [76,77], while others support its use [52,78,79].

This statement is misleading because the "thermodynamic considerations" are that our perception of efficiency changes when we change the concentration units in which affinity and potency are expressed. As such, LE is a physicochemically meaningless quantity and, in any case, references 52 and 78 precede our challenge to the thermodynamic validity of LE (although not an equivalent challenge made in 2009). Reference 78 uses a mathematically invalid formula for LE when claiming to have shown that LE is mathematically valid and reference 79 creates much noise while evading the challenge. I have responded to reference 79 (aka the 'sound and fury article') in two blog posts ( 1 | 2 ).

This is a good place for a graphic to break up the text a bit and I'll use the table (pulled from an earlier post) that shows how our perception of ligand efficiency changes with the concentration units used to define affinity. I've used base 10 logarithms and dispensed with energy units (which are often discarded) to redefine LE as generalized LE (GLE) so that we can explore the effect of changing the concentration unit (which I've called a reference concentration). Please take special note of how a change in concentration unit can change your perception of efficiency for the three compounds. Do you think it makes sense to try to 'correct' LE for the effects of molecular size?
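For readers who prefer code to tables, the same point can be sketched in a few lines of Python. The Kd and heavy atom values below are hypothetical: a small weak binder and a large potent one swap rank order when the reference concentration moves from 1 M to 1 µM.

```python
import math

def gle(kd_molar, heavy_atoms, c_ref_molar=1.0):
    """Generalized ligand efficiency: -log10(Kd / Cref) per heavy atom.
    With Cref = 1 M this reduces to the familiar pKd / HA."""
    return -math.log10(kd_molar / c_ref_molar) / heavy_atoms

small = dict(kd=1e-4, ha=10)   # hypothetical: 100 uM binder, 10 heavy atoms
large = dict(kd=1e-9, ha=30)   # hypothetical: 1 nM binder, 30 heavy atoms

for c_ref in (1.0, 1e-6):      # 1 M versus 1 uM reference concentration
    e_small = gle(small["kd"], small["ha"], c_ref)
    e_large = gle(large["kd"], large["ha"], c_ref)
    winner = "small looks more efficient" if e_small > e_large else "large looks more efficient"
    print(c_ref, round(e_small, 2), round(e_large, 2), winner)
```

Nothing about the compounds has changed between the two loop iterations; only the (arbitrary) unit has.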

Another article also cites our LEM critique.  Let's take a look at how the study, which I'll call M2016, responds to our criticism of LE (reference 69 in this study):

The appeal of LE and GE is in the convenience and rapidity with which these factors can be assessed during lead optimization, but the simplistic nature of these metrics requires an understanding of, and appreciation for, their inherent limitations when interpreting data.[67,68,69,70] The relevance of LE as a metric has been challenged based on the lack of direct proportionality to molecular size and an inconsistency of the magnitude of effect between homologous series, both attributed to a fundamental invalidity underlying its mathematical derivation.[65,67] These criticisms have stimulated considerable discussion and provoked discourse that attempts to moderate the perspective and provide guidance on how to use LE and GE as rule-of-thumb metrics in lead optimization.[68,69,70]

To be blunt, I don't think that the M2016 study does actually respond to our criticism of LE as a metric which is that our perception of efficiency changes when we change the concentration unit with which we specify affinity or potency. This is an alarming characteristic for something that is presented as a tool for decision making and, if it were a navigational instrument, we'd be talking about fundamental design flaws rather than "limitations". The choice of 1 M is entirely arbitrary and selecting a particular concentration unit for calculation of LE places the burden of proof on those making the selection to demonstrate that this particular concentration unit is indeed the one that is most fit for purpose. 

The other class of LEM that is commonly encountered is exemplified by what is probably best termed lipophilic efficiency (LipE).  Although the term LLE is more often used, there appears to be some confusion as to whether this should be taken to mean ligand-lipophilicity efficiency or lipophilic ligand efficiency so it's probably safest to use LipE. Let's see what the M2016 study has to say about LipE:

LLE is an offsetting metric that reflects the difference in the affinity of a drug for its target versus water compared to the distribution of the drug between octanol and water, which is a measure of nonspecific lipophilic association.[69,12]

If I knew very little about LEMs, I would find this sentence a bit confusing although I think that it is essentially correct. We used (and possibly even introduced) the term 'offset' in the LEM context to describe metrics that are defined by subtracting a risk factor from affinity (or potency). This is in contrast to LE and its variations, which are defined by dividing affinity (or potency) by molecular size and can be described as scaled. There is still an arbitrary aspect to LipE in that we could ask whether (pIC50 − 0.5 × logP) might not be a better metric than (pIC50 − logP). Unlike LE, however, LipE is a quantity that actually has some physicochemical meaning, provided that the compound in question binds to its target in an uncharged form. Specifically, LipE can be considered to quantify the ease (or difficulty) of moving the compound from octanol to its binding site in the target as shown in the figure below:

Let's see what the M2016 study has to say:

However, care needs to be exercised in applying this metric since it is dependent on the ionization state of a molecule, and either Log P or Log D should be used when appropriate.

This statement fails to acknowledge a third option which is that there may be situations in which  neither logP nor logD is appropriate for defining LipE. One such situation is when the compound binds to its target in a charged form. When this is the case, neither logP nor logD quantifies the ease (or difficulty) of moving the bound form of compound from octanol to water. As an aside, using logD to quantify compound quality suggests that increasing the extent of ionization will lead to better compounds and I hope that readers will see that this is a strategy that is likely to end in tears.

Let's take a look at LEMs from the perspective of folk who are working in lead optimization projects or doing hit-to-lead work. Merely questioning the value of LEMs is likely to incur the wrath of Mothers Against Molecular Obesity (MAMO) so I'll stress that I'm not denying that excessive lipophilicity and molecular size are undesirable. We even called them "risk factors" in our LEM critique. That said, in the compound quality and drug-likeness literature, it is much more common to read that X and Y are correlated, associated or linked than to actually be shown how strong the correlation, association or linkage is. When you do get shown the relationship between X and Y, it's usually all smoke and mirrors (e.g. graphics colored in lurid, traffic light hues). When reading M2016 you might be asking why we can't see the relationship between PFI and aqueous solubility presented more directly (or even why iPFI is preferred over PFI for hERG and promiscuity). A plot of one against the other, perhaps even a correlation coefficient? Is it really too much to ask?

The reason for the smoke and mirrors is that the correlations are probably weak. Does this mean that we don't need to worry about risk factors like molecular size and lipophilicity? No, it most definitely does not! "You speak in more riddles than a Lean Six Sigma belt", I hear you say, "and you tell us that the correlations with the risk factors have been smoked and mirrored and yet we still need to worry about the risk factors". Patience, dear reader, because the apparent paradox can be resolved once you realize that some much stronger local correlations may be lurking beneath an anemic global correlation. What this means is that potencies of compounds in different projects (and different chemical series in the same project) may respond differently to risk factors like lipophilicity and molecular size. You need to start thinking of each LO project as special (although 'unique' might be a better term because 'special projects' were what used to happen to senior managers at ICI before they were put out to pasture).

Another view of LEMs is that they represent reference lines. For example, we can plot potency against molecular size and draw a line with positive slope from a point corresponding to a 1 M IC50 on the potency axis and say that all points on the line correspond to the same LE. Analogously, we can draw a line of unit slope on a plot of pIC50 against logP and say that all points on the line correspond to the same LipE.  You might be thinking that these reference lines are a bit arbitrary and you'd be thinking along the right lines. The intercept on the potency axis is entirely arbitrary and that was the basis of our criticism of LE. A stronger case can be made for considering  a line of unit slope on a plot of pIC50 against logP to represent constant LipE but only if the compounds bind in uncharged forms to their target.

Let's get back to that project you're working on and let's suppose that you want to manage risk factors like lipophilicity and molecular size. Before you calculate all those LEMs for your project compounds, I'd like you to plot pIC50 against molecular size (it actually doesn't matter too much which measure of molecular size you use). What you now have in front of you is the response of potency to molecular size. Do you see any correlation between pIC50 and molecular size? Why not try fitting a straight line to your data to get an idea of the strength of the correlation? The points that lie above the line of fit beat the trend in the data and the points that lie below the line are beaten by the trend. The residual for a point is simply the distance above the line for that point and its value tells you by how much the activity that it represents beats the trend in the data. Are there structural features that might explain why some points are relatively distant from the line that you've fit? In case you hadn't realized it, you've just normalized your data. Vorsprung durch Technik! Here's a graphic to give you an idea of how this might work.

The relationship between affinity and molecular size shown in the plot above is likely to be a lot tighter than what you'll see for a typical project. In the early stages of a project, the range in activity for the project compounds will often be too narrow for the response of activity to risk factor to be discerned. You can make assumptions about the response of affinity (or potency) to risk factor (e.g. that LipE will remain constant during optimization) in order to forecast outcome but it's really important to continually monitor the response of activity to risk factor to check that your assumptions still hold. If affinity (or potency) is strongly correlated with risk factor then you want the response to risk factor to be as steep as possible. Could this be something to think about when trying to prioritize between series?  
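The fit-and-residuals recipe can be sketched in a few lines of Python (the project data below are hypothetical). A positive residual means the compound beats the trend actually observed in your own data, which is the normalization being advocated here:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical project data: heavy atom count versus pIC50
ha = [12, 15, 18, 20, 24, 27, 30]
pic50 = [4.1, 4.9, 5.2, 6.2, 6.4, 7.0, 7.6]

a, b = fit_line(ha, pic50)
# The residual is the vertical distance from the fitted line: positive
# means the compound beats the trend seen in *this* project's data.
residuals = [y - (a + b * x) for x, y in zip(ha, pic50)]
for x, y, r in zip(ha, pic50, residuals):
    print(x, y, round(r, 2))
```

No reference concentration and no assumed slope: the trend comes entirely from the data in front of you.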

So it's been a long post and there are only so many metrics that one can take in a day. If you want to base your decisions on metrics that cause your perception to change with units then as consenting adults you are free to do so (just as you are free to use astrological charts or to seek the face of a deity in clouds). A former prime minister of India drank a glass of his own urine every day and lived to 98. Who would have predicted that? Our LEM critique was entitled 'Ligand efficiency metrics considered harmful' and now I need to say why. When doing property-based design, it is vital to get as full an understanding as possible of the response of affinity (or potency) to each of the properties in which you're interested. If exploring the relationship between X and Y, it is generally best to analyse the data as directly as possible and to keep X and Y separate (as opposed to looking at the response of a function of Y and X to X). When you use LEMs you're also making assumptions about the response of Y to X and you need to ask yourself whether that's a sensible way to explore the response of Y to X. If you want to normalize potency by risk factor, would you prefer to use the trend that you've actually observed in your data or an arbitrary trend that 'experts' recommend on the basis that it's "simple"?

Next week, PAINS...

Sunday, 22 May 2016

Sailor Malan's guide to fragment screening library design

Today I'll take a look at a JMC Perspective on design principles for fragment libraries that is intended to provide advice for academics. When selecting compounds to be assayed, the general process typically consists of two steps. First, you identify regions of chemical space that you hope will be relevant and then you sample these regions. This applies whether you're designing a fragment library, performing a virtual screen or selecting analogs of active compounds with which to develop structure-activity relationships (SAR). Design of compound libraries for fragment screening has actually been discussed extensively in the literature and the following selection of articles, some of which are devoted to the topic, may be useful: Fejzo (1999), Baurin (2004), Mercier (2005), Schuffenhauer (2005), Albert (2007), Blomberg (2009), Chen (2009), Law (2009), Lau (2011), Schulz (2011), Morley (2013). This series of blog posts ( 1 | 2 | 3 | 4 ) on fragment screening library design may also be helpful.

The Perspective opens with the following quote:

"Rules are for the obedience of fools and the guidance of wise men"

Harry Day, Royal Air Force (1898-1977)

It isn't exactly clear what the authors are getting at here since there appears to be no provision for wise women. Also it is not clear how the authors would view rules that required darker complexioned individuals to sit at the backs of buses (or that swarthy economists should not solve differential equations on planes). That said, the quote hands me a legitimate excuse to link Malan's Ten Rules for Air Fighting and I will demonstrate that the authors of this Perspective can learn much from the wise teachings of 'Sailor' Malan.

My first criticism of this Perspective is that the authors devote an inordinate amount of space to topics that are irrelevant from the viewpoint of selecting compounds for fragment screening. Whatever your views on the value of ligand efficiency metrics and thermodynamic signatures, these are things that you think about once you've got the screening results. The authors assert, "As a result, fragment hits form high-quality interactions with the target, usually a protein, despite being weak in potency" and some readers might consider the 'concept' of high-quality interactions to be pseudoscientific psychobabble on par with homeopathy, chemical-free food and the wrong type of snow. That said, discussion of some of these peripheral topics would have been more acceptable if the authors had articulated the library design problem clearly and discussed the most relevant literature early on. By straying from their stated objective, the authors have broken the second of Malan's rules ("Whilst shooting think of nothing else, brace the whole of your body: have both hands on the stick: concentrate on your ring sight").

The section on design principles for fragment libraries opens with a slightly gushing account of the Rule of 3 (Ro3). This is unfortunate because this would have been the best place for the authors to define the fragment library design problem and review the extensive literature on the subject. Ro3 was originally stated in a short communication and the analysis that forms its basis was not shared. As an aside, you need to be wary of rules like these because the cutoffs and thresholds may have been imposed arbitrarily by those analyzing the data. For example, the GSK 4/400 rule actually reflects the scheme used to categorize continuous data and it could just as easily have been the GSK 3.75/412 rule if the data had been pre-processed differently. I have written a couple ( 1 | 2 ) of blog posts on Ro3 but I'll comment here so as to keep this post as self-contained as possible. In my view, Ro3 is a crude attempt to appeal to the herding instinct of drug discovery scientists by milking a sacred cow (Ro5). The uncertainties in hydrogen bond acceptor definitions and logP prediction algorithms mean that nobody knows exactly how others have applied Ro3. It is also somewhat ironic that the first article referenced by this Perspective actually states Ro3 incorrectly. If we assume that Ro5 hydrogen bond acceptor definitions are being used then Ro3 would appear to be an excellent way to ensure that potentially interesting acidic species such as tetrazoles and acylsulfonamides are excluded from fragment screening libraries. While this might not be too much of an issue if identification of adenine mimics is your principal raison d'etre, some researchers may wish to take a broader view of the scope of FBDD. It is even possible that rigid adherence to Ro3 may have led to the fragment starting points for this project being discovered in Gothenburg rather than Cambridge.
Although it is difficult to make an objective assessment of the impact of Ro3 on industrial FBDD, its publication did prove to be manna from heaven for vendors of compounds who could now flog milligram quantities of samples that had previously been gathering dust in stock rooms.
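For what it's worth, a literal reading of Ro3 (as it is commonly, if ambiguously, stated) can be coded in a couple of lines, which also makes the acid-exclusion problem easy to see. The descriptor values below are hypothetical but representative:

```python
def passes_ro3(mw, hbd, hba, clogp):
    """Rule of 3 as commonly stated (MW < 300, HBD <= 3, HBA <= 3,
    cLogP <= 3). The original communication did not pin down the HBA
    definition or the logP prediction method, so implementations differ."""
    return mw < 300 and hbd <= 3 and hba <= 3 and clogp <= 3

# A benzyl-alcohol-sized fragment sails through ...
print(passes_ro3(mw=108.0, hbd=1, hba=1, clogp=1.1))
# ... but a small tetrazole-containing fragment (hypothetical descriptor
# values) fails on an Ro5-style HBA count from the ring nitrogens alone
print(passes_ro3(mw=146.0, hbd=1, hba=4, clogp=0.5))
```

Which is precisely how an interesting acidic fragment ends up in somebody else's screening library.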

This is a good point to see what 'Sailor' Malan might have made of this article. While dropping Ro3 propaganda leaflets, you broke rule 7 (Never fly straight and level for more than 30 seconds in the combat area) and provided an easy opportunity for an opponent to validate rule 10 (Go in quickly - Punch hard - Get out). Faster than you can say "thought leader" you've been bounced by an Me 109 flying out of the sun. A short, accurate (and ligand-efficient) burst leaves you pondering the lipophilicity of the mixture of glycol and oil that now obscures your windscreen. The good news is that you have been bettered by a top ace whose h index is quite a bit higher than yours. The bad news is that your cockpit canopy is stuck. "Spring chicken to shitehawk in one easy lesson."

Of course, there's a lot more to fragment screening library design than counting hydrogen bonding groups and setting cutoffs for molecular weight and predicted logP. Molecular complexity is one of the most important considerations when selecting compounds (fragments or otherwise) and anybody even contemplating compound library design needs to understand the model introduced by Hann and colleagues. This molecular complexity model is conceptually very important but it is not really a practical tool for selecting compounds. However, there are other ways to define molecular complexity in ways that allow the general concept to be distilled into usable compound selection criteria. For example, I've used restriction of extent of substitution (as detailed in this article) to control complexity and this can be achieved using SMARTS notation to impose substructural requirements. The thinking here is actually very close to the philosophy behind 'needle screening' which was first described in 2000 by researchers at Roche although they didn't actually use the term 'molecular complexity'.
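A toy calculation (not the published model, just an illustration of its logic, with both probability parameters assumed) shows why the Hann argument favors modest complexity: the chance that all n interaction features of a ligand are complementary to the site decays geometrically with n, while the chance that a formed complex is potent enough to detect grows with n, so the product peaks at small n.

```python
import math

p = 0.8   # assumed probability that any one feature is complementary
k = 0.3   # assumed rate at which detectability grows per matched feature

def p_useful(n):
    """Toy probability of a measurable, correct hit for a ligand with
    n interaction features: all n must match (p ** n) and the resulting
    complex must be potent enough to detect."""
    return (p ** n) * (1.0 - math.exp(-k * n))

best_n = max(range(1, 31), key=p_useful)
print(best_n)  # the optimum sits at modest complexity
```

Change p or k and the optimum moves, but it stays at low complexity over a wide range of assumptions, which is the conceptual point of the model.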

As one would expect, the purging of unwholesome compounds such as PAINS is discussed. The PAINS field suffers from ambiguity, extrapolation and convolution of fact with opinion. This series ( 1 | 2 | 3 | 4) of blog posts will give you a better idea of my concerns. I say "ambiguity" because it's really difficult to know whether the basis for labeling a compound as a PAIN (or should that be a PAINS) is experimental observation, model-based prediction or opinion. I say "extrapolation" because the original PAINS study equates PAIN with frequent-hitter behavior in a panel of six AlphaScreen assays and this is extrapolated to pan-assay (which many would take to mean different types of assays) interference. There also seems to be a tendency to extrapolate the frequent-hitter behavior in the AlphaScreen panel to reactivity with protein although I am not aware that any of the compounds identified as PAINS in the original study were shown to react with any of the proteins in the AlphaScreen panel used in that study. This is a good point to include a graphic to break the text up a bit and, given an underlying theme of this post, I'll use this picture of a diving Stuka.

One view of the fragment screening mission is that we are trying to present diverse molecular recognition elements to targets of interest. In the context of screening library design, we tend to think of molecular recognition in terms of pharmacophores, shapes and scaffolds. Although you do need to keep lipophilicity and molecular size under tight control, the case can be made for including compounds that would usually be considered to be beyond norms of molecular good taste. In a fragment screening situation I would typically want to be in a position to present molecular recognition elements like naphthalene, biphenyl, adamantane and (especially after my time at CSIRO) cubane to target proteins. Keeping an eye on both molecular complexity and aqueous solubility, I'd select compounds with a single (probably cationic) substituent and I'd not let rules get in the way of molecular recognition criteria. In some ways compound selections like those above can be seen as compliance with Rule 8 (When diving to attack always leave a proportion of your formation above to act as top guard). However, I need to say something about sampling chemical space in order to make that connection a bit clearer.

This is a good point for another graphic and it's fair to say that the Stuka and the B-52 differed somewhat in their approaches to target engagement. The B-52 below is not in the best state of repair and, given that I took the photo in Hanoi, this is perhaps not totally surprising. The key to library design is coverage and former bombardier Joseph Heller makes an insightful comment on this topic. One wonders what First Lieutenant Minderbinder would have made of the licensing deals and mergers that make the pharma/biotech industry such an exciting place to work.  

The following graphic, pulled from an old post, illustrates coverage (and diversity) from the perspective of somebody designing a screening library. Although I've shown the compounds in a two-dimensional space, sampling is often done using molecular similarity, which we can think of as inversely related to distance. A high degree of molecular similarity between two compounds indicates that their molecular structures are nearby in chemical space. This is a distance-geometric view of chemical space in which we know the relative positions of molecular structures but not where they are. When we describe a selection of molecular structures as diverse, we're saying that the two most similar ones are relatively distant from each other. The primary objective of screening library design is to cover relevant chemical space as effectively as possible and the devil is in details like 'relevant' and 'effectively'. The stars in the graphic below show molecular structures that have been selected to cover the chemical space shown. When representing a number of molecular structures by a single molecular structure it is important, as it is in politics, that what is representative not be too distant from what is being represented. You might ask, "how far is acceptable?" and my response would be, as it often is in Brazil, "boa pergunta". One problem is that scaffolds differ in their 'contributions' to molecular similarity, and activity cliffs usually provide a welcome antidote to the hubris of the library designer.
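To make the distance-geometric picture a bit more concrete, here is a minimal sketch of greedy max-min diversity selection. The fingerprints are invented bit sets used purely for illustration; real library design would use proper molecular fingerprints and a dedicated picker.

```python
# Minimal sketch of greedy max-min diversity selection over hypothetical
# fingerprint bit sets (compounds and bits are invented for illustration).

def tanimoto(a, b):
    """Tanimoto similarity between two bit sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def max_min_pick(fingerprints, n_pick):
    """Greedily pick n_pick compounds so that each new pick is as
    dissimilar as possible from its nearest already-picked neighbor."""
    remaining = dict(fingerprints)       # name -> bit set
    first = next(iter(remaining))        # seed with an arbitrary first pick
    picks = [first]
    del remaining[first]
    while remaining and len(picks) < n_pick:
        # similarity of a candidate to its nearest pick...
        def nearest_sim(name):
            return max(tanimoto(remaining[name], fingerprints[p]) for p in picks)
        # ...choose the candidate whose nearest pick is most distant
        best = min(remaining, key=nearest_sim)
        picks.append(best)
        del remaining[best]
    return picks

# Toy fingerprints: frag2 is a near-duplicate of frag1, so a diverse
# pick of three should skip it.
fps = {
    "frag1": {1, 2, 3, 4},
    "frag2": {1, 2, 3, 5},
    "frag3": {10, 11, 12},
    "frag4": {20, 21, 22, 23},
}
print(max_min_pick(fps, 3))   # ['frag1', 'frag3', 'frag4']
```

Note that the picks depend on the seed compound and on tie-breaking, which is one reason diverse subsets from different runs (or different similarity measures) can look quite different.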

I would argue that property distributions are more important than cutoff values for properties and it is during the sampling phase of library design that these distributions are shaped. One way of controlling distributions is first to define regions of chemical space using progressively less restrictive selection criteria and then to sample these in order, starting with the most restrictively defined region. However, this is not the only way to sample and one might also try to weight fragment selection using desirability functions. Obviously, I'm not going to provide a comprehensive review of chemical space sampling in a couple of paragraphs of a blog post, but I hope to have shown that the sampling of chemical space is an important aspect of fragment screening library design. I also hope to have shown that failing to address the issue of sampling relevant chemical space represents a serious deficiency of the featured Perspective.
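A tiered sampling scheme of this sort can be sketched in a few lines. The compound names, properties and tier criteria below are all hypothetical; the point is simply that the tightest tier is exhausted before the selection relaxes.

```python
# Sketch of tiered sampling: fill the library from the most restrictively
# defined region of chemical space first, then relax the criteria.
# All compounds, properties and thresholds are hypothetical.

def tiered_sample(compounds, tiers, n_total):
    """tiers: list of predicates ordered from most to least restrictive."""
    selected, seen = [], set()
    for predicate in tiers:
        for name, props in compounds.items():
            if len(selected) >= n_total:
                return selected
            if name not in seen and predicate(props):
                selected.append(name)
                seen.add(name)
    return selected

compounds = {
    "cmpd1": {"hac": 11, "clogp": 0.5},
    "cmpd2": {"hac": 14, "clogp": 2.8},
    "cmpd3": {"hac": 18, "clogp": 1.2},
}
tiers = [
    lambda p: p["hac"] <= 16 and p["clogp"] <= 2.0,   # tightest criteria
    lambda p: p["hac"] <= 16,                         # relaxed lipophilicity
    lambda p: True,                                   # anything remaining
]
print(tiered_sample(compounds, tiers, 3))   # ['cmpd1', 'cmpd2', 'cmpd3']
```

Weighting selection with desirability functions would replace the hard predicates with continuous scores, but the tiered version makes the shaping of the property distribution easiest to see.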

The Perspective concludes with a number of recommendations and I'll conclude the post with comments on some of these. I wouldn't have too much of a problem with the proposed 9 - 16 heavy atom range as a guideline although I would consider a requirement that predicted octanol/water logP be in the range 0.0 - 2.0 to be overly restrictive. It would have been useful for the authors to say how they arrived at these figures and I invite all of them to think very carefully about exactly what they mean by "cLogP" and "freely rotatable bonds" so we don't have a repeat of the Ro3 farce. There are many devils in the details of the statement: "avoid compounds/functional groups known to be associated with high reactivity, aggregation in solution, or false positives". My response to "known" is that it is not always easy to distinguish knowledge from opinion and "associated" (like correlated) is not a simple yes/no thing. It is not clear how "synthetically accessible vectors for fragment growth" should be defined since there is also a conformational stability issue if bonds to hydrogen are regarded as growth vectors.
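For what it's worth, the proposed ranges are trivial to encode as a filter. The sketch below uses the Perspective's figures (9 - 16 heavy atoms, predicted logP of 0.0 - 2.0) purely for illustration; as noted above, I'd regard the logP window as overly restrictive, so treat these defaults as discussion points rather than recommendations.

```python
# Illustrative filter encoding the Perspective's proposed ranges.
# The default windows are the ones under discussion, not an endorsement.

def within_guidelines(heavy_atoms, predicted_logp,
                      ha_range=(9, 16), logp_range=(0.0, 2.0)):
    """True if a compound falls inside both property windows."""
    lo_ha, hi_ha = ha_range
    lo_lp, hi_lp = logp_range
    return lo_ha <= heavy_atoms <= hi_ha and lo_lp <= predicted_logp <= hi_lp

print(within_guidelines(12, 1.5))   # True
print(within_guidelines(12, 2.5))   # False: outside the proposed logP window
```

Of course, exactly what "predicted logP" means (which model, which tautomer, which ionization state) is precisely the kind of detail that needs to be pinned down before a filter like this can be applied reproducibly.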

This is a good point at which to wrap things up and I'd like to share some more of Sailor Malan's wisdom before I go. The first rule (Wait until you see the whites of his eyes. Fire short bursts of 1 to 2 seconds and only when your sights are definitely 'ON') is my personal favorite and it provides excellent, practical advice for anybody reviewing the scientific literature. I'll leave you with a short video in which a pre-Jackal Edward Fox displays marksmanship and escaping skills that would have served him well in the later movie. At the start of the video, the chemists and biologists have been bickering (of course, this never really happens in real life) and the VP for biophysics is trying to get them to toe the line. Then one of the biologists asks the VP for biophysics if they can do some phenotypic screening and you'll need to watch the video to see what happens next...

Sunday, 8 May 2016

A real world perspective on molecular design

I'll be taking a look at a Real-World Perspective on Molecular Design which has already been reviewed by Ash. I don't agree that this study can accurately be described as 'prospective' although, in fairness, it is actually very difficult to publish molecular design work in a genuinely prospective manner. Another point to keep in mind is that molecular modelers (like everybody else in drug discovery) are under pressure to demonstrate that they are making vital contributions. Let's take a look at what the authors have to say:

"The term “molecular design” is intimately linked to the widely accepted concept of the design cycle, which implies that drug discovery is a process of directed evolution (Figure 1). The cycle may be subdivided into the two experimental sections of synthesis and testing, and one conceptual phase. This conceptual phase begins with data analysis and ends with decisions on the next round of compounds to be synthesized. What happens between analysis and decision making is rather ill-defined. We will call this the design phase. In any actual project, the design phase is a multifaceted process, combining information on status and goals of the project, prior knowledge, personal experience, elements of creativity and critical filtering, and practical planning. The task of molecular design, as we understand it, is to turn this complex process into an explicit, rational and traceable one, to the extent possible. The two key criteria of utility for any molecular design approach are that they should lead to experimentally testable predictions and that whether or not these predictions turn out to be correct in the end, the experimental result adds to the understanding of the optimization space available, thus improving chances of correct prediction in an iterative manner. The primary deliverable of molecular design is an idea [4] and success is a meaningful contribution to improved compounds that interrogate a biological system."

This is certainly a useful study although I will make some criticisms in the hope that doing so stimulates discussion. I found the quoted section to lack coherence and would argue that the design cycle is actually more of a logistic construct than a conceptual one. That said, I have to admit that it's not easy to clearly articulate what is meant by the term 'molecular design'. One definition of molecular design is control of the behavior of compounds and materials by manipulation of molecular properties. Using the term 'behavior' captures the idea that we design compounds to 'do' rather than merely to 'be'. I also find it useful to draw a distinction between hypothesis-driven molecular design (ask good questions) and prediction-driven molecular design (synthesize what the models, metrics or tea leaves tell you to). Asking good questions is not as easy as it sounds because it is not generally possible to perform controlled experiments in the context of molecular design, as discussed in another post from Ash. Hypothesis-driven molecular design can also be thought of as a framework in which to efficiently obtain the information required to make decisions and, in this sense, there are analogies with statistical molecular design. I believe that the molecular design that the authors describe in the quoted section is of the hypothesis-driven variety but hand-wringing about how "ill-defined" it is doesn't really help move things forward. The principal challenges for hypothesis-driven molecular design are to make it more objective, systematic and efficient. I'll refer you to a trio of blog posts ( 1 | 2 | 3) in which some of this is discussed in more detail.

I'll not say anything specific about the case studies presented in this study except to note that sharing specific examples of application of  molecular design as case studies does help to move the field forward even when the studies are incomplete. The examples do illustrate how the computational tools and structural databases can be used to provide a richer understanding of molecular properties such as conformational preferences and interaction potential. The CSD (Cambridge Structural Database) is a particularly powerful tool and, even in my Zeneca days, I used to push hard to get medicinal chemists using it. Something that we in the medicinal chemistry community might think about is how incomplete studies can be published so that specific learning points can be shared widely in a timely manner.  

But now I'd like to move on to the conclusions, starting with conclusion 1 (value of quantitative statements). The authors note:

"Frequently, a single new idea or a pointer in a new direction is sufficient guidance for a project team. Most project impact comes from qualitative work, from sharing an insight or a hypothesis rather than a calculated number or a priority order. The importance of this observation cannot be overrated in a field that has invested enormously in quantitative prediction methods. We believe that quantitative prediction alone is a misleading mission statement for molecular design. Computational tools, by their very nature, do of course produce numerical results, but these should never be used as such. Instead, any ranked list should be seen as raw input for further assessment within the context of the project. This principle can be applied very broadly and beyond the question of binding affinity prediction, for example, when choosing classification rather than regression models in property prediction."
This may be uncomfortable reading for QSAR advocates, metric touts and those who would have you believe that they are going to disrupt drug discovery by putting cheminformatics apps on your phone. It also is close to my view of the role of computational chemistry in molecular design (the observant reader will have noticed that I didn't equate the two activities) although, in the interests of balance, I'll refer you to a review article on predictive modelling. We also need to acknowledge that predictive capability will continue to improve (although pure prediction-driven pharmaceutical design is likely to be at least a couple of decades away) and readers might find this blog post to be relevant. 

Let's take a look at conclusion 5 (Staying close to experiment), where the authors note:

"One way of keeping things as simple as possible is to preferentially utilize experimental data that may support a project, wherever this is meaningful. This may be done in many different ways: by referring to measured parameters instead of calculated ones or by utilizing existing chemical building blocks instead of designing new ones or by making full use of known ligands and SAR or related protein structures. Rational drug design has a lot to do with clever recycling."

This makes a lot of sense although I don't recommend use of the tautological term 'rational drug design' (has anybody ever done irrational drug design?). What they're effectively saying here is that it is easier to predict the effect of structural changes on the properties of compounds than it is to predict those properties directly from molecular structure. The implication of this for cheminformaticians (and others seeking to predict the behaviour of compounds) is that they need to look at activity and chemical properties in terms of relationships between the molecular structures of compounds. I've explored this theme, both in an article and a blog post, although I should point out that there is a very long history of associating changes in the values of properties of compounds with modifications to molecular structures.
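A toy example of this relationship-based way of thinking (all numbers invented): estimate a property from a measured parent value plus the average shift observed for a structural transformation, rather than predicting the property from structure alone.

```python
# Matched-pair style estimate with invented numbers: the average property
# shift for a transformation is applied to a measured parent value.

# hypothetical average shifts observed for some structural transformations
avg_shift = {"H>F": 0.25, "H>Me": 0.5, "H>OMe": -0.2}

def estimate(parent_value, transformation):
    """Predict a property relative to a measured parent compound."""
    return parent_value + avg_shift[transformation]

# measured logD of 2.1 for the parent, H -> F analog estimated from it
print(estimate(2.1, "H>F"))
```

The estimate inherits the measurement error of the parent plus the spread of the transformation's shift, which is usually much smaller than the error of a global structure-to-property model.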

However, there is another side to "staying close to experiment" and that is recognizing what is and what isn't an experimental observable. The authors are clearly aware of this point when they state: 

"MD trajectories cannot be validated experimentally, so extra effort is required to link such simulation results back to truly testable hypotheses, for example, in the qualitative prediction of mechanisms or protein movements that may be exploited for the design of binders."

When interpreting structures of protein-ligand complexes, it is important to remember that the contribution of an intermolecular contact to affinity is not, in general, an experimental observable. As such, it would have been helpful if the authors had been a bit more explicit about exactly which experimental observable(s) form the basis of the "Scorpion network analysis of favorable interactions". The authors make a couple of references to ligand efficiency and I do need to point out that scaling free energy of binding has no thermodynamic basis because, in general, our perception of efficiency changes with the concentration used to define the standard state. On a lighter note there is a connection between ligand efficiency and homeopathy that anybody writing about molecular design might care to ponder and that's where I'll leave things.

Friday, 1 April 2016

LELP metric validated

So I was wrong all along about LELP.  

I now have to concede that the LELP metric actually represents a seminal contribution to the science of drug design. Readers of this blog will recall our uncouth criticism of LELP which, to my eternal shame, I must now admit is actually the elusive, universal metric that achieves simultaneous normalization (and renormalization) of generalized interaction potential with respect to the size-scaled octanol/water partition function.

What changed my view so radically? Previously we observed that LELP treats the ADME risk associated with logP of 1 and 75 heavy atoms as equivalent to that associated with logP of 3 and 25 heavy atoms. Well it turns out that I was using the rest mass of the hydrogen atom to make this comparison which unequivocally invalidates the criticism of what turns out to be the most fundamental (and beautiful) of all the ligand efficiency metrics.

It is not intuitively obvious why relativistic correction is necessary for the correct normalization of affinity with respect to both molecular size and lipophilicity. However, I was fortunate to receive a rare copy of the seminal article in the Carpathian Journal of Thermodynamics by T. T. Macoute, O. B. Ya and B. Samedi. The math is quite formidable and is based on convergence characteristics of the non-linear response to salt concentration of the Soucouyant Tensor, following application of the Douen p-Transform. B. Samedi is actually better known for his even more seminal study (with A. Bouchard) of the implications for cognitive function of the slow off-rate of tetrodotoxin in its dissociation from Duvalier's Tyrosine Kinase (DTK).

So there you have it. Ignore all false metrics and use LELP with confidence in your Sacred Quest for the Grail.  

Thursday, 10 March 2016

Ligand efficiency beyond the rule of 5

One recurring theme in this blog is that the link between physicochemical properties and undesirable behavior of compounds in vivo may not be as strong as property-based design 'experts' would have us believe. To be credible, guidelines for drug discovery need to reflect trends observed in relevant, measured data and the strengths of these trends tell you how much weight you should give to the guidelines. Drug discovery guidelines are often specified in terms of metrics, such as Ligand Efficiency (LE) or property forecast index (PFI), and it is important to be aware that every metric encodes assumptions (although these are rarely articulated).

The most famous set of guidelines for drug discovery is known as the rule of 5 (Ro5), which is essentially a statement of physicochemical property distributions for compounds that had progressed at least as far as Phase II at some point before the Ro5 article was published in 1997. It is important to remember (some 'experts' have short memories) that Ro5 was originally presented as a set of guidelines for oral absorption. Personally, I have never regarded Ro5 as particularly helpful in practical lead optimization since it provides no guidance as to how suboptimal ADMET characteristics of compliant compounds can be improved. Furthermore, Ro5 is not particularly enlightening with respect to the consequences of straying out of the allowed region and into 'die Verbotenezone'.

Nobody reading this blog needs to be reminded that drug discovery is an activity that has been under the cosh for some time and a number of publications ( 1 | 2 | 3 | 4 ) examine potential opportunities outside the chemical space 'enclosed' by Ro5. Given that drug-likeness is not the secure concept that those who claim to be leading our thoughts would have us believe, I do think that we really need to be a bit more open minded in our views as to the regions of chemical space in which we are prepared to work. That said, you cannot afford to perform shaky analysis when proposing that people might consider doing things differently because that will only hand a heavy cudgel to the roundheads for them to beat you with.

The article that I'll be discussing has already been Pipelined and this post has a much narrower focus than Derek's post. The featured study defines three regions of chemical space: Ro5 (rule of 5), eRo5 (extended rule of 5) and bRo5 (beyond rule of 5). The authors note that "eRo5 space may be thought of as a buffer zone between Ro5 and bRo5 space". I would challenge this point because there is a region (MW less than 500 Da and ClogP between 5 and 7.5) between Ro5 and bRo5 spaces that is not covered by the eRo5 specifications. As such, it is not meaningful to compare properties of eRo5 compounds with properties of Ro5 or bRo5 compounds. The authors of the featured article really do need to fix this problem if they're planning to carve out a niche in this area of study because failing to do so will make it easier for conservative drug-likeness 'experts' to challenge their findings. Problems like this are particularly insidious because the activation barriers for fixing them just keep getting higher the longer you ignore them.

But enough of Bermuda Triangles in the space between Ro5 and bRo5 because the focus of this post is ligand efficiency and specifically its relevance (or otherwise) to bRo5 compounds. I'll write a formula for generalized LE in a way that makes it clear that ΔG° is a function of temperature, pressure and the standard concentration:

LEgen = -ΔG°(T,p,C°)/HA

When LE is calculated it is usually assumed that C° is 1 M although there is nothing in the original definition of LE that says this has to be so and few, if any, users of the metric are even aware that they are making the assumption. When analyzing data it is important to be aware of all assumptions that you're making and the effects that making these assumptions may have on the inferences drawn from the analysis.
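To see why the choice of C° matters, here is a small sketch (the two compounds and their numbers are hypothetical). With the conventional 1 M standard state a weakly bound fragment can look more 'efficient' than a potent lead, and changing the standard concentration reverses the ordering.

```python
import math

R_KCAL = 0.0019872   # gas constant in kcal/(mol*K)
T = 298.0            # temperature in K

def ligand_efficiency(kd_molar, heavy_atoms, std_conc=1.0):
    """LE = -dG°/HA with dG° = RT*ln(Kd/C°); std_conc is C° in M."""
    dg = R_KCAL * T * math.log(kd_molar / std_conc)
    return -dg / heavy_atoms

# Two hypothetical hits: a small weak fragment and a larger potent lead
frag = ligand_efficiency(1e-4, 12)   # 100 uM binder, 12 heavy atoms
lead = ligand_efficiency(1e-8, 35)   # 10 nM binder, 35 heavy atoms

# With the conventional C° = 1 M the fragment looks more 'efficient'...
print(frag > lead)                   # True

# ...but with C° = 1 uM the ordering of the two compounds is reversed
frag_uM = ligand_efficiency(1e-4, 12, std_conc=1e-6)
lead_uM = ligand_efficiency(1e-8, 35, std_conc=1e-6)
print(frag_uM < lead_uM)             # True
```

The rank order of compounds by LE is not invariant to the choice of standard concentration, which is the sense in which scaling ΔG° by heavy atom count lacks a thermodynamic basis.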

Sometimes LE is used to specify design guidelines. For example, we might assert that acceptable fragment hits must have LE above a particular cutoff. It's important to remember that setting a cutoff for LE is equivalent to imposing an affinity cutoff that depends on molecular size. I don't see any problem with allowing the affinity cutoff to increase with molecular size (or indeed lipophilicity) although the response of the cutoff to molecular size should reflect analysis of measured data (rather than noisy sermons of self-appointed thought-leaders). When you set a cutoff for LE, you're assuming (whether or not you are aware of it) that the affinity cutoff is a line that intersects the affinity axis at a point corresponding to a KD of 1 M. Before heading back to bRo5, I'd like you to consider a question. If you're not comfortable setting an affinity cutoff as a function of molecular size, would you be comfortable setting a cutoff for LE?

So let's take a look at what the featured article has to say about affinity: 

"Affinity data were consistent with those previously reported [44] for a large dataset of drugs and drugs in Ro5, eRo5 and bRo5 space had similar means and distributions of affinities (Figure 6a)"

So the article is saying that, on average, bRo5 compounds don't need to be of higher affinity than Ro5 compounds and that's actually useful information. One might hypothesize that unbound concentrations of bRo5 compounds tend to be lower than for Ro5 compounds because the former are less drug-like and precisely the abominations that MAMO (Mothers Against Molecular Obesity) have been trying to warn honest, god-fearing folk about for years. If you look at Figure 6a in the featured article, you'll see that the mean affinity does not differ significantly between the three categories of compound. Regular readers of this blog will be well aware that categorizing continuous data in this manner tends to exaggerate trends in data. Given that the authors are saying that there isn't a trend, correlation inflation is not an issue here.

Now look at Figure 6b. The authors note:

"As the drugs in eRo5 and bRo5 space are significantly bigger than Ro5 drugs, i.e., they have higher molecular weights and more heavy atoms, their LE is significantly lower"

If you're thinking about using these results in your own work, you really need to be asking whether or not the results provide any real insight (i.e., something beyond the trivial result that 1/HA gets smaller when HA gets larger). This would also be a good time to think very carefully about all the assumptions you're going to make in your analysis. The featured article states:

"Ligand efficiency metrics have found widespread use; [45] however, they also have some limitations associated with their application, particularly outside traditional Ro5 drug space. [46] We nonetheless believe it is useful to characterize the ligand efficiency (LE) and lipophilic ligand efficiency (LLE) distributions observed in eRo5 and bRo5 space to provide guides for those who wish to use them in drug development"

Given that I have asserted that LE is not even wrong and have equated it with homeopathy, I'm not sure that I agree with sweeping LE's problems under the carpet by making a vague reference to "some limitations". Let's not worry too much about trivial details because declaring a ligand efficiency metric to be useful is a recognized validation tool (even for LELP, which appears to have jumped straight from the pages of a Mary Shelley novel). There is a rough analogy with New Math where "the important thing is to understand what you're doing rather than to get the right answer" although that analogy shouldn't be taken too far because it's far from clear whether or not LE advocates actually understand what they are doing. As an aside, New Math is what inspired "the rule of 3 is just like the rule of 5 if you're missing two fingers" that I have occasionally used when delivering harangues on fragment screening library design.

So let's see what happens when one tries to set an LE threshold for bRo5 compounds. The featured article states:

"Instead, the size and flexibility of the ligand and the shape of the target binding site should be taken into account, allowing progression of compounds that may give candidate drugs with ligand efficiencies of ≥0.12 kcal/(mol·HAC), a guideline that captures 90% of current oral drugs and clinical candidates in bRo5 space"

So let's see how this recommended LE threshold of 0.12 kcal/(mol·HA) translates to affinity thresholds for compounds with molecular weights of 700 Da and 3000 Da. I'll assume a temperature of 298 K and a C° of 1 M when calculating ΔG° and will use 14 Da/HA to convert molecular weight to heavy atoms. I'll conclude the post by asking you to consider the following two questions:

  • The recommended LE threshold transforms to a pKD threshold of 4.4 at 700 Da. When considering progression of compounds that may give candidate drugs, would you consider a recommendation that KD should be less than 40 μM to be useful?

  • The recommended LE threshold transforms to a pKD threshold of 19 at 3000 Da. How easy do you think it would be to measure a pKD value of 19? When considering progression of compounds that may give candidate drugs, would you consider a recommendation that  pKD be greater than 19 to be useful?
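For anybody wanting to check the arithmetic in the questions above, here is the conversion spelled out, using the assumptions stated in the post (C° = 1 M, T = 298 K, 14 Da per heavy atom):

```python
import math

R_KCAL = 0.0019872                    # gas constant, kcal/(mol*K)
T = 298.0                             # temperature, K
LN10_RT = math.log(10) * R_KCAL * T   # ~1.364 kcal/mol per log unit

def pkd_from_le_threshold(le_kcal_per_ha, mw_da, da_per_ha=14.0):
    """pKD implied by an LE threshold at a given molecular weight,
    assuming C° = 1 M and a fixed Da-per-heavy-atom ratio."""
    heavy_atoms = mw_da / da_per_ha
    return le_kcal_per_ha * heavy_atoms / LN10_RT

print(round(pkd_from_le_threshold(0.12, 700), 1))    # 4.4
print(round(pkd_from_le_threshold(0.12, 3000), 0))   # 19.0
```

The same function makes the general point of the post explicit: because heavy atom count multiplies the threshold, a fixed LE cutoff is an affinity cutoff that rises linearly with molecular size, from the trivially achievable at 700 Da to the unmeasurable at 3000 Da.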