
Sunday, 2 August 2020

Why fragments?


[Image: Paramin panorama]

Crystallographic fragment screens have recently been run against the main protease (at Diamond) and the Nsp3 macrodomain (at UCSF and Diamond) of SARS-CoV-2, and I thought that it might be of interest to take a closer look at why we screen fragments. Fragment-based lead discovery (FBLD) actually has origins in both crystallography [V1992 | A1996] and computational chemistry [M1991 | B1992 | E1994]. Measurement of affinity is important in fragment-to-lead work because it allows fragment-based structure-activity relationships to be established prior to structural elaboration. Affinity measurement is typically challenging when fragment binding has been detected using crystallography, although affinity can be estimated by observing the response of occupancy to concentration (the ∆G° value of −3.1 kcal/mol reported for binding of pyrazole to protein kinase B was derived in this manner).
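
For readers who want to see how that kind of estimate works, here is a minimal sketch of fitting a single-site binding isotherm to occupancies measured at several ligand concentrations and converting the fitted Kd to ∆G°. The numbers are invented for illustration and I'm assuming that the soaking concentration approximates the free ligand concentration and that the 1 M standard state applies:

import numpy as np
from scipy.optimize import curve_fit

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.0     # temperature, K

def occupancy(conc_M, Kd_M):
    # single-site binding isotherm: fractional occupancy vs free ligand concentration
    return conc_M / (Kd_M + conc_M)

# hypothetical crystallographic occupancies at several soaking concentrations (M)
conc = np.array([0.5e-3, 1e-3, 2e-3, 5e-3, 10e-3])
occ = np.array([0.25, 0.40, 0.55, 0.75, 0.85])

(Kd_fit,), _ = curve_fit(occupancy, conc, occ, p0=[1e-3])

# standard free energy of binding relative to a 1 M standard concentration
dG = R * T * np.log(Kd_fit / 1.0)
print(f"Kd = {Kd_fit * 1e3:.2f} mM, dG = {dG:.1f} kcal/mol")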

Although fragment-based approaches to lead discovery are widely used, it is less clear why fragment-based lead discovery works as well as it appears to. While it has been stated that “fragment hits form high-quality interactions with the target”, the concept of interaction quality is not sufficiently well-defined to be useful in design. I ran a poll which asked about the strongest rationale for screening fragments.  The 65 votes were distributed as follows: ‘high ligand efficiency’ (23.1%), ‘enthalpy-driven binding’ (16.9%), ‘low molecular complexity’ (26.2%) and ‘God loves fragments’ (33.8%). I did not vote.

The belief that fragments are especially ligand-efficient has many adherents in the drug discovery field and it has been asserted that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”. The fundamental problem with ligand efficiency (LE), as conventionally calculated, is that perception of efficiency varies with the arbitrary concentration unit in which affinity is expressed (have you ever wondered why Kd, Ki or IC50 has to be expressed in mole/litre for calculation of LE?). This would appear to be a rather undesirable characteristic for a design metric, and LE evangelists might consider trying to explain why it’s not a problem rather than dismissing it as a “limitation” of the metric or trying to shift the burden of proof onto the skeptics to show that the evangelists’ choice of concentration unit for calculation of LE is not useful.

The problems associated with the arbitrary nature of the concentration unit used to express affinity were first identified in 2009 and further discussed in 2014 and 2019. Specifically, it was noted that LE has a nontrivial dependency on the concentration, C°, used to define the standard state. If you want to do solution thermodynamics with defined concentrations then you do need to specify a standard concentration. However, it is important to remember that the choice of standard concentration is necessarily arbitrary if the thermodynamic analysis is to be valid. If your conclusions change when you use a different definition of the standard state then you’ll no longer be doing thermodynamics and, as Pauli might have observed, you’ll not even be wrong. You probably don't know it, but when you use the LE metric you’re making the sweeping assumption that all values of Kd, Ki and IC50 tend to a value of 1 M in the limit of zero molecular size. Recalling the conventional criticism of homeopathy, is there really a difference between a solute that is infinitely small and a solute that is infinitely dilute?
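
To make the arbitrariness concrete, here’s a short sketch (my own illustration with invented numbers, not taken from any of the articles cited above) in which the ranking of a hypothetical fragment and a hypothetical lead by LE flips when affinity is expressed in millimolar rather than molar units:

import numpy as np

RT = 0.593  # kcal/mol at 298 K

def ligand_efficiency(Kd, n_heavy, C_std=1.0):
    # LE = -deltaG/N = -RT*ln(Kd/C_std)/N, with Kd and C_std in the same concentration unit
    return -RT * np.log(Kd / C_std) / n_heavy

# hypothetical compounds: (Kd in M, heavy atom count)
compounds = {"fragment": (1e-4, 12), "lead": (1e-8, 36)}

for unit_name, unit_in_molar in [("M", 1.0), ("mM", 1e-3)]:
    print(f"affinity expressed in {unit_name}:")
    for name, (Kd_M, n_heavy) in compounds.items():
        le = ligand_efficiency(Kd_M / unit_in_molar, n_heavy)
        print(f"  {name:8s} LE = {le:.2f} kcal/mol per heavy atom")

With the conventional 1 M unit the fragment looks more efficient than the lead (0.46 versus 0.30 kcal/mol per heavy atom); express the same two Kd values in mM and the ordering reverses (0.11 versus 0.19).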

I think that’s enough flogging of inanimate equines for one blog post so let’s take a look at enthalpy-driven binding. My view of thermodynamic signature characterization in drug discovery is that it’s, in essence, a solution that’s desperately seeking a problem. In particular, there does not appear to be any physical basis for claims that the thermodynamic signature is a measure of interaction quality. In case you’re thinking that I’m an unrepentant Luddite, I will concede that thermodynamic signatures could prove useful for validating physics-based models of molecular recognition and, in specific cases, they may point to differences in binding mode within congeneric series. I should also stress that the modern isothermal calorimeter is an engineering marvel and I'd always want this option for label-free affinity measurement in any project.

It is common to see statements in the thermodynamic signature literature to the effect that binding is ‘enthalpy-driven’ or ‘entropy-driven’ although it was noted in 2009 (coincidentally, in the same article that highlighted the nontrivial dependence of LE on C°) that these terms are not particularly meaningful. The problems start when you make comparisons between the numerical values of ∆H (which is independent of C°) and T∆S° (which depends on C°). If I’d presented such a comparison in physics class at high school (I was taught by the Holy Ghost Fathers in Port of Spain), I would have been caned with a ferocity reserved for those who’d dozed off in catechism class.  I’ll point you toward an article which asserts that, “when compared with many traditional druglike compounds, fragments bind more enthalpically to their protein targets”. I have a number of issues with this article although this is not the place for a comprehensive review (although I’ll probably pick it up in ‘The Nature of Lipophilic Efficiency’ when that gets written).
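
Here’s a small illustration (invented numbers again) of why the ‘enthalpy-driven’ and ‘entropy-driven’ labels depend on the standard concentration: ∆H doesn’t change when C° changes, but ∆G° and therefore T∆S° do.

import numpy as np

RT = 0.593  # kcal/mol at 298 K

def thermodynamic_signature(Kd_M, dH, C_std_M):
    # dG depends on the standard concentration; dH does not, so TdS = dH - dG inherits the dependence
    dG = RT * np.log(Kd_M / C_std_M)
    TdS = dH - dG
    return dG, TdS

Kd = 1e-5   # 10 uM, hypothetical
dH = -3.0   # kcal/mol, hypothetical

for label, C_std in [("1 M", 1.0), ("1 mM", 1e-3)]:
    dG, TdS = thermodynamic_signature(Kd, dH, C_std)
    driver = "entropy-driven" if TdS > 0 else "enthalpy-driven"
    print(f"C_std = {label}: dG = {dG:.1f}, dH = {dH:.1f}, TdS = {TdS:.1f} kcal/mol -> {driver}")

The same 10 µM binding event comes out as ‘entropy-driven’ on the 1 M convention and ‘enthalpy-driven’ on the 1 mM convention, which is the point of the caning anecdote above.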

While I don’t believe that the authors have actually demonstrated that fragments bind more enthalpically than ligands of greater molecular size, I wouldn’t be surprised to discover that gains in affinity over the course of a fragment-to-lead (F2L) campaign had come more from entropy than enthalpy. First, the lost translational entropy (the component of ∆S° that endows it with its dependence on C°) is shared over a greater number of intermolecular contacts for structurally-elaborated compounds and this article is relevant to the discussion. Second, I’d expect the entropy of any water molecule to increase when it is moved to bulk solvent from contact with the molecular surface of ligand or target (regardless of the polarity of the molecular surface at the point of contact). Nevertheless, this is something that you can test easily by examining the response of (∆H + T∆S°) to ∆G° (it is best not to aggregate data for different targets and/or temperatures when analyzing isothermal titration calorimetry data in this manner). But even if F2L affinity gains were shown generally to come more from entropy than enthalpy, would that be a strong rationale for screening fragments?
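
If you want to run that test, a minimal sketch might look like the following (the ITC values are placeholders; you would substitute measured ∆G° and ∆H for a single target at a single temperature). Since ∆H + T∆S° = 2∆H − ∆G°, a slope of +1 against ∆G° corresponds to affinity gains coming entirely from enthalpy and a slope of −1 to gains coming entirely from entropy.

import numpy as np
from scipy.stats import linregress

# hypothetical ITC results for one target at 298 K (kcal/mol)
dG = np.array([-5.1, -6.0, -6.8, -7.5, -8.3, -9.0])
dH = np.array([-3.0, -3.4, -3.5, -3.9, -4.1, -4.6])

TdS = dH - dG      # T*deltaS from dG = dH - T*deltaS
y = dH + TdS       # the quantity whose response to dG we want to examine

fit = linregress(dG, y)
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f} kcal/mol, r^2 = {fit.rvalue ** 2:.2f}")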

This gets us onto molecular complexity and this article by Mike Hann and GSK colleagues should be considered essential reading for anybody thinking about selection of compounds for screening. The Hann model is a conceptual framework for molecular complexity but it doesn’t provide much practical guidance as to how to measure complexity (this is not a criticism since the thought process should be more about frameworks and less about metrics). I don’t believe that it will prove possible to quantify molecular complexity in an objective manner that is useful for designing compound libraries (I will be delighted to be proven wrong on this point). The approach to handling molecular complexity that I’ve used in screening library design is to restrict the extent of substitution (and other substructural features that can be considered to be associated with molecular complexity) and this is closer to ‘needle screening’ as described by Roche scientists in 2000 than to the Hann model.

Had I voted in the poll, ‘low molecular complexity’ would have got my vote.  Here’s what I said in NoLE (it’s got an entire section on fragment-based design and a practical suggestion for redefining ligand efficiency so that perception does not change with C°):

"I would argue that the rationale for screening fragments against targets of interest is actually based on two conjectures. First, chemical space can be covered most effectively by fragments because compounds of low molecular complexity [18, 21, 22] allow TIP [target interaction potential] to be explored [70,71,72,73,74] more efficiently and accurately. Second, a fragment that has been observed to bind to a target may be a better starting point for design than a higher affinity ligand whose greater molecular complexity prevents it from presenting molecular recognition elements to the target in an optimal manner."

To be fair, those who advocate the use of LE and thermodynamic signatures in fragment-based design do not deny the importance of molecular complexity. Let’s assume for the sake of argument that interaction quality can actually be defined and is quantified by the LE value and/or the thermodynamic signature for binding of compound to target. While these are massive assumptions, LE values and thermodynamic signatures are still effects rather than causes.

The last option in the poll was ‘God loves fragments’ and more respondents (33.8%) voted for this than for any of the first three options. I would interpret a vote for ‘God loves fragments’ in one of three ways. First, the respondent doesn’t consider any one of the first three options to be a stronger rationale for screening fragments than the other two. Second, the respondent doesn’t consider any of the first three options to be a valid rationale for screening fragments. Third, the respondent considers fragment-based approaches to have been over-sold.

This is a good place to wrap up. While I remain an enthusiast for fragment-based approaches to lead discovery, I do also believe that they have been somewhat oversold. The sensitivity of LE evangelists to criticism of their metric may stem from the use of LE to sell fragment-based methods to venture capitalists and, internally, to skeptical management. A shared (and serious) deficiency in the conventional ways in which LE and thermodynamic signature are quantified is that perception changes when the arbitrary concentration,  C°, that defines the standard state is changed. While there are ways in which this deficiency can be addressed for analysis, it is important that the deficiency be acknowledged if we are to move forward. Drug design is difficult and if we, as drug designers, embrace shaky science and flawed data analysis then those who fund our activities may conclude that the difficulties that we face are of our own making.     

Saturday, 18 July 2020

SARS-CoV-2 main protease. Crowdsourcing, peptidomimetics and fragments

<< previous || next >>

“Just take the ball and throw it where you want to. Throw strikes. Home plate don’t move.”

Satchel Paige (1906-1982) 

The COVID Moonshot and OSC19 are examples of what are sometimes called crowdsourced or open source approaches to drug discovery. While I’m not particularly keen on the use of the term ‘open source’ in this context, I have absolutely no quibble with the goal of seeking cures and treatments for diseases that are ignored by commercial drug discovery organizations. Open source drug discovery originated with OSDD in India and it should be noted that the approach has also been pioneered for malaria by OSM.  I see crowdsourcing primarily as a different way to organize and resource drug discovery rather than as a radically different way to do drug discovery.

One point that’s not always appreciated by cheminformaticians, computational chemists and drug discovery scientists in academia is that there’s a bit more to drug discovery than making predictions. In particular, I advise those seeking to transform drug discovery to ensure that they actually know what a drug needs to do and understand the constraints under which drug discovery scientists work. Currently, it does not appear to be possible to predict the effects of compounds in live humans from molecular structure with the accuracy needed for prediction-driven design and this is the primary reason that drug discovery is incremental in nature. A big part of drug discovery is generation of the information needed in order to maintain progress and there are gains to be had by doing this as efficiently as possible. Efficient generation of information, in turn, requires a degree of coordination that may prove difficult to achieve in a crowdsourced project.

The SARS-CoV-2 main protease (Mpro) is one of a number of potential targets of interest in the search for COVID-19 therapies. Like the cathepsins that are (or, at least, have been) of interest to the pharma/biotech industry as potential targets for therapeutic intervention, Mpro is a cysteine protease. If I’d been charged with quickly delivering an inhibitor of Mpro as a candidate drug then I’d be taking a very close look at how the pharma/biotech industry has pursued cysteine protease targets. Balicatib and odanacatib (cathepsin K inhibitors) and petesicatib (cathepsin S inhibitor) can each be described as a peptidomimetic with a warhead (nitrile) that forms a covalent bond reversibly with the catalytic cysteine.

A number of peptidomimetic Mpro inhibitors have been described in the literature and this blog post by Chris Southan may be of interest. I’ve been looking at the published inhibitors shown below in Chart 1 (which exhibit antiviral activity and have been subjected to pharmacokinetic and toxicological evaluation) and have written some notes on mapping the structure-activity relationship for compounds like these. I should stress that compounds discussed in these notes are not expected to be dramatically more potent than the two shown in Chart 1 (in fact, I expect at least one to be significantly less potent). Nevertheless, I would argue that assay results for these proposed synthetic targets would inform design.

My assessment of these compounds is that there is significant room for improvement and I think that it would be relatively easy to achieve a pIC50 of 8 (corresponding to an IC50 of 10 nM) using the aldehyde warhead. I’d consider taking an aldehyde forward (there are options for dosing as a prodrug) although it really would be much better if there was also the option to exchange this warhead for the nitrile (a warhead that is much-loved by industrial medicinal chemists since it’s rugged, polar and contributes minimally to molecular size). While I’d anticipate that replacement of aldehyde with nitrile will lead to a reduction in potency, it’s necessary to quantify the potency loss to enable the potential of nitriles to be properly assessed. The binding mode observed for 1 is shown below in Figure 1 and it’s likely that the groove region will need to be more fully exploited (this article will give you an idea of the sort of thing I have in mind) in order to achieve acceptable potency if the aldehyde warhead is replaced by nitrile.

The COVID Moonshot project currently appears to be in what many industrial drug discovery scientists would call the hit-to-lead phase. In my view the principal objective of hit-to-lead work is to create options since having options will give the lead optimization team room to manoeuvre (you can think of hit-to-lead work as being a bit like playing in midfield). The COVID Moonshot project is currently focused on exploitation of hits from a fragment screen against Mpro and, while I’d question whether this approach is likely to get to a candidate drug more quickly than the conventional structure-based design used in industry to pursue cathepsins, it’s certainly an interesting project that I’m happy to contribute to. It’s also worth mentioning that fragment screens have been run against the SARS-CoV-2 Nsp3 macrodomain at UCSF and Diamond since there are no known inhibitors for this target.

Here’s a blog post by Pat Walters in which he examines the structure-activity relationships emerging for the fragment-derived inhibitors. Specifically, he uses a metric known as the Structure-Activity Landscape Index (SALI) to quantify the sensitivity of activity to structural changes. Medicinal chemists apply the term ‘activity cliff’ to situations where a small change in structure results in a large change in activity and I’ve argued that the idea of quantifying the sensitivity of a physicochemical effect to structural modifications goes all the way back to Hammett. One point that comes out of Pat’s post is that it’s difficult to establish structure-activity relationships for low-affinity ligands with a conventional biochemical assay. When applying fragment-based approaches in lead discovery, there are distinct advantages to being able to measure low binding affinity (~1 mM) since this allows fragment-based structure-activity relationships to be explored prior to synthetic elaboration of fragment hits. As Pat notes, inadequate solubility in assay buffer clearly places limits on the affinity that can be reliably measured in any assay, although interference with the readout of a biochemical assay can also lead to misleading results. This is one reason that biophysical detection of binding, using methods such as surface plasmon resonance (SPR), is favored in fragment-based lead discovery. Here’s an article by some of my former colleagues which shows how you can assess the impact of interference with the readout of a biochemical assay (and even correct for it if the effect isn’t too great).
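
For anyone who wants to reproduce that kind of analysis, here is a minimal sketch of the SALI calculation using RDKit. The SMILES and pIC50 values are invented and I’m assuming the usual definition of SALI as the activity difference divided by one minus the fingerprint similarity:

from itertools import combinations
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

# hypothetical compounds: (SMILES, pIC50)
data = [
    ("Cc1ccncc1NC(=O)Cc1cccc(Cl)c1", 4.6),
    ("Cc1ccncc1NC(=O)Cc1cccc(C#N)c1", 5.9),
    ("Cc1ccncc1NC(=O)Cc1ccccc1", 4.1),
]

fps = [(AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi), 2, nBits=2048), pic50)
       for smi, pic50 in data]

for (fp_i, p_i), (fp_j, p_j) in combinations(fps, 2):
    sim = DataStructs.TanimotoSimilarity(fp_i, fp_j)
    if sim < 1.0:
        sali = abs(p_i - p_j) / (1.0 - sim)
        print(f"similarity = {sim:.2f}, delta pIC50 = {abs(p_i - p_j):.1f}, SALI = {sali:.1f}")

A high SALI value flags a pair of similar structures with very different activities, which is exactly the ‘activity cliff’ situation described above.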

My first contribution to the COVID Moonshot project is illustrated in Chart 2 and the fragment-derived inhibitor 3 from which I started is also featured in Pat’s post. From inspection of the crystal structure, I noticed that the catalytic cysteine might be targeted by linking a ‘reversible’ warhead from the amide nitrogen (4 and 5). Although this might look fine on paper, the experimental data in this article suggest that linking any saturated carbon to the amide nitrogen will bias the preferred amide geometry from trans toward cis. Provided that the intrinsic gain in affinity resulting from linking the warhead is greater than the cost of adopting the bound conformation, the structural modification will lead to a net increase in affinity, and the structures could be locked (here's an article that shows how this can work) into the bound conformation (e.g. by forming a ring).
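
As a rough way of putting numbers on that trade-off (my own back-of-envelope framing rather than anything from the articles linked above): if the binding-competent amide geometry lies ∆Gconf above the preferred solution conformation then, in the limit where the binding-competent conformer is a minor population,

Kd(observed) ≈ Kd(intrinsic) × exp(∆Gconf / RT)

so every 1.4 kcal/mol of conformational penalty at 298 K costs roughly one log unit of affinity, and that is what the intrinsic gain from the linked warhead has to beat.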


In addition to being accessible to a warhead linked from the amide nitrogen of 3, the catalytic cysteine is also within striking distance of the carbonyl carbon and it would be prudent to consider the possibility that 3 and its analogs can function as substrates for Mpro. There is precedent for this type of behavior and I’ll point you toward an article noting that a series of esters identified as cruzain inhibitors can function as substrates, and a more recent article that presents cruzain inhibitors that I’d consider to be potential substrates. A crystal structure of the protein-ligand complex is potentially misleading in this context since the enzyme might not be catalytically active. I believe that 6 could be used to explore this possibility since the carbonyl carbon would be expected to be more electrophilic and 3-hydroxy-4-methylpyridine would be expected to be a better leaving group than its 3-amino analog.

This is a good point to wrap things up. I think that Satchel Paige gave us some pretty good advice on how to approach drug discovery and that's yet another reason that Black Lives Matter.

Friday, 30 November 2018

Ligand efficiency and fragment-to-lead optimizations


The third annual survey (F2L2017) of fragment-to-lead (F2L) optimizations was published last week. Given that it was the second survey (F2L2016) in this series that prompted me to write 'The Nature of Ligand Efficiency' (NoLE), I thought that some comments would be in order. F2L2017 presents analysis of data that had been aggregated from all three surveys and I'll be focusing on the aspects of this analysis that relate to ligand efficiency (LE).

As noted in NoLE, perception of efficiency changes when affinity is expressed in different concentration units and I have argued that this is an undesirable feature for a quantity that is widely touted as useful for design. At the very least, it places a burden of proof on those who advocate the use of LE in design to either show that the change in perception of efficiency with concentration unit is not a problem or to justify their choice of the 1 M concentration unit. One difficulty that LE advocates face is that the nontrivial dependency of LE on the concentration unit only came to light a few years after LE was introduced as "a useful metric for lead selection" and, even now, some LE advocates appear to be in a state of denial. Put more bluntly: you weren't even aware that you were choosing the 1 M concentration unit when you started telling medicinal chemists that they should be using LE to do their jobs, but you still want us to believe that you made the correct choice?

I'm assuming that the authors of F2L2017 would all claim familiarity with the fundamentals of physical chemistry and biophysics while some of the authors may even consider themselves to be experts in these areas. I'll put the following question to each of the authors of F2L2017: what would your reaction be to analysis showing that the space group for a crystal structure changed if the unit cell parameters were expressed using different units? I can also put things a bit more coarsely by noting that to examine the effect on perception of changing a unit is, when applicable, a most efficacious bullshit detector.

The analysis in F2L2017 that I'll focus on is the comparison between fragment hits and leads. As I showed in NoLE, it is meaningless to compare LE values because LE has a nontrivial dependency on the concentration unit used to express affinity. LE advocates can of course declare themselves to be Experts (or even Thought Leaders) and invoke morality in support of their choice of the 1 M concentration unit. However, this is a risky tactic because physical science can't accommodate 'privileged' units and an insistence that quantities have to be expressed in specific units might be taken as evidence that one is not actually an Expert (at least not in physical science).

So let's take a look at what F2L2017 has to say about LE in the context of F2L optimizations.

"The distributions for fragment and lead LE have also remained reasonably constant. On average there is no significant change in LE between fragment and lead (ΔLE = 0.004, p ≈ 0.8). Figure 5A shows the distribution of ΔLE, which is approximately centered around zero, although interestingly there are more examples where LE increases from fragment to lead (40) than where a decrease is seen (25). Some caution is warranted when interpreting these data, as our minimum criterion for 100-fold potency improvement may have introduced some selection bias. Nevertheless, there is no clear evidence in this data set that LE changes systematically during fragment optimization. Although the average change in LE from fragment to lead is small, Figure 5B shows that the correlation between fragment and lead LE is modest (R2 = 0.22), with a mean absolute difference between fragment and lead LE of 0.08."

This might be a good point at which to remind the authors of F2L2017 about some of the more extravagant claims that have been made for LE. It has been asserted that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”.  It has also been claimed that "ligand efficiency validated fragment-based design".  However, the more important point is that it is completely meaningless to compare values of LE of hits and leads because you will come to different conclusions if you express affinity using a different concentration unit (see Table 2 in NoLE). It is also worth noting that expressing affinity in units of 1 M introduces selection bias just as does the requirement for 100-fold potency improvement. 

Had I been reviewing F2L2017, I'd have suggested that the authors might think a bit more carefully about exactly why they are analyzing differences between LE values for fragments and leads. A perspective on fragment library design (reviewed in this post) correctly stated that a general objective of optimization projects is “ensuring that any additional molecular weight and lipophilicity also produces an acceptable increase in affinity". If you're thinking along these lines then scaling the F2L potency increase by the corresponding increase in molecular size makes a lot more sense than comparing LE for the fragments and leads. This quantifies how efficiently (in terms of increased molecular size) the potency gains for the F2L project have been achieved. This is not a new idea and I'll direct readers toward a 2006 study in which it was noted that a tenfold increase in affinity corresponded to a mean increase in molecular weight of 64 Da (standard deviation = 18 Da) for 73 compound pairs from FBLD projects. This is how group efficiency (GE) works and I draw the attention of the two F2L2017 authors from Astex to a perceptive statement made by their colleagues that GE is “a more sensitive metric to define the quality of an added group than a comparison of the LE of the parent and newly formed compounds”.

The distinction between a difference in LE and a difference in affinity that has been scaled by a difference in molecular size becomes a whole lot clearer if you examine the relevant equations. Equation (1) defines the F2L LE difference and the first thing that you'll notice is that it is algebraically more complex than equation (2). This is relevant because LE advocates often tout the simplicity of the LE metric. However, the more significant difference between the two is that the concentration that defines the standard state is present in equation (1) but absent in equation (2). This means that you get the same answer when you scale the affinity difference by the corresponding molecular size difference regardless of the units in which you express affinity.
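
Writing LE = −∆G°/N(heavy atoms) with ∆G° = RT ln(Kd/C°), the two quantities being compared amount to the following (my own rendering of the equations referred to above rather than a verbatim copy):

\[ \Delta \mathrm{LE} = \mathrm{LE}_{\mathrm{lead}} - \mathrm{LE}_{\mathrm{fragment}} = -RT\left[\frac{\ln(K_{d,\mathrm{lead}}/C^{\circ})}{N_{\mathrm{lead}}} - \frac{\ln(K_{d,\mathrm{fragment}}/C^{\circ})}{N_{\mathrm{fragment}}}\right] \tag{1} \]

\[ \frac{\Delta G^{\circ}_{\mathrm{fragment}} - \Delta G^{\circ}_{\mathrm{lead}}}{N_{\mathrm{lead}} - N_{\mathrm{fragment}}} = \frac{-RT\,\ln\!\left(K_{d,\mathrm{lead}}/K_{d,\mathrm{fragment}}\right)}{N_{\mathrm{lead}} - N_{\mathrm{fragment}}} \tag{2} \]

Because C° cancels in the ratio of the two Kd values, equation (2) is unchanged by the choice of concentration unit, whereas equation (1) is not.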


So let's see how things look if you're prepared to think beyond LE when assessing F2L optimizations. Here's a figure from NoLE in which I've plotted the change in affinity against the change in number of non-hydrogen atoms for the F2L optimizations surveyed in F2L2016. The molecular size efficiency for each optimization can be calculated by dividing the change in affinity by the change in number of non-hydrogen atoms. I've drawn lines corresponding to minimum and maximum values of molecular size efficiency and have also shown the quartiles.
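
Here's a minimal sketch of that calculation (the fragment/lead pairs are invented placeholders; for the real analysis you would use the pKd or pIC50 values tabulated in the surveys):

import numpy as np

# hypothetical F2L pairs: (fragment pKd, lead pKd, fragment heavy atoms, lead heavy atoms)
pairs = [
    (4.0, 7.5, 13, 28),
    (3.5, 8.2, 11, 32),
    (5.1, 7.9, 16, 30),
    (4.4, 9.0, 14, 35),
]

delta_pKd = np.array([lead - frag for frag, lead, _, _ in pairs])
delta_N = np.array([n_lead - n_frag for _, _, n_frag, n_lead in pairs])

# log units of affinity gained per added non-hydrogen atom; no standard concentration anywhere
size_efficiency = delta_pKd / delta_N
print("molecular size efficiency:", np.round(size_efficiency, 2))
print("min / quartiles / max:", np.round(np.percentile(size_efficiency, [0, 25, 50, 75, 100]), 2))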

So now it's time to wrap things up. A physical quantity that is expressed in a different unit is still the same physical quantity and I presume that all the authors of F2L2017 would have been aware of this while they were still undergraduates. LE was described as thermodynamically indefensible in comments on Derek's post on NoLE and choosing to defend an indefensible position usually ends in tears (just as it did for the French at Dien Bien Phu in 1954). The dilemma facing those who seek to lead opinion in FBDD is that to embrace the view that the 1 M concentration unit is somehow privileged requires that they abandon fundamental physicochemical principles that they would have learned as undergraduates.