Sunday, 9 August 2020

How not to repurpose a 'drug'



I sometimes wonder what percentage of the pharmacopoeia will have been proposed for repurposing for the treatment of COVID-19 by the end of 2020. In particular, I worry about the long-term, psychological effects on bloggers such as Derek, who are forced to play whack-a-mole with hydroxychloroquine repurposing studies. Those attempting to use text mining and machine learning to prioritize drugs for repurposing should take note of the views expressed in this tweet.

The idea behind drug repurposing is very simple. If an existing drug looks like it might show therapeutic benefit in the disease that you’re trying to treat then you can go directly to assessing efficacy in humans without having to do any of those irksome Phase I studies. However, you need to be aware that the approval of a drug always places restrictions on the dose that you can use and the route of administration (for example, you can't administer a drug intravenously if it has only been approved for oral administration). One rationale for drug repurposing is that the target(s) for the drug may also have a role in the disease that you’re trying to treat. Even if the target is not directly relevant to the disease, the drug may engage a related target that is relevant with sufficient potency to have a therapeutically exploitable effect. While these rationales are clear, I do get the impression that some who use text-mining and machine learning to prioritize drugs for repurposing may simply be expecting the selected drugs to overwhelm targets with druglikeness.

There are three general approaches to directly tackling a virus such as SARS-CoV-2 with a small molecule drug (or chemical agent). First, destroy the virus before it even sees a host cell; this is the objective of hand-washing and disinfection of surfaces. Second, prevent the virus from infecting host cells, for example, by blocking the interaction between the spike protein and ACE2. Third, prevent the virus from functioning in infected cells, for example, by inhibiting the SARS-CoV-2 main protease. One can also try to mitigate the effects of viral infection, for example, by using anti-inflammatory drugs to counter cytokine storm although I’d not regard this as tackling the virus directly.

In this post, I’ll be reviewing an article which suggests that quaternary ammonium compounds could be repurposed for treatment of COVID-19. The study received NIH funding and this may be of interest to researchers who failed to secure NIH funding. The article was received on 06-May-2020, accepted on 18-May-2020 and published on 25-May-2020. One of the authors of the article is a member of the editorial advisory board of the journal. As of 08-Aug-2020, two of the authors are described as cheminformatics experts in their Wikipedia biographies and one is also described as an expert in computational toxicology. 

The authors state: “This analysis identified ammonium chloride, which is commonly used as a treatment option for severe cases of metabolic alkalosis, as a drug of interest. Ammonium chloride is a quaternary ammonium compound that is known to also have antiviral activity (13,14) against coronavirus (Supplementary Material) and has a mechanism of action such as raising the endocytic and lysosomal pH, which it shares with chloroquine (15). Review of the text-mined literature also indicated a high-frequency of quaternary ammonium disinfectants as treatments for many viruses (Supplementary Material) (16,17), including coronaviruses: these act by deactivating the protective lipid coating that enveloped viruses like SARS-CoV-2 rely on.” 

Had I described ammonium chloride as a “quaternary ammonium compound” at high school in Trinidad (I was taught by the Holy Ghost Fathers), I’d have received a correctional package of licks and penance. For cheminformatics ‘experts’ to make such an error should remind us that each and every expert has an applicability domain and a shelf life. However, the errors are not confined to nomenclature since the cationic nitrogen atoms of a quaternary ammonium compound and a protonated amine are very different beasts. While a protonated amine can deprotonate in order to cross a lipid bilayer, the positive charge of a quaternary ammonium compound can be described as ‘permanent’ and this has profound consequences for its physicochemical behavior. First, the protonation state of a quaternary ammonium nitrogen does not change in response to a change in pH. This means that, unlike amines, quaternary ammonium compounds are not drawn into lysosomes and other acidic compartments. Second, the positive charge needs to be balanced by an anion (in some cases, this may be in the same covalent framework as the quaternary ammonium nitrogen). Despite being positively charged, the quaternary ammonium group is not as polar as you might think because it can’t donate hydrogen bonds to water. However, to get out of water it needs to take its counterion (which is typically polar) with it. I like to think about quaternary ammonium compounds (and other permanent cations) as hydrophobic blobs that are held in solution by the solvation of their counterions. A typical quaternary ammonium compound can also be considered as a detergent in which the polar and non-polar parts are not covalently bonded to each other. 
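To make the contrast concrete, here's a minimal sketch (assumed, amine-like pKa; typical textbook values for compartment pH) of the ion-trapping calculation that underlies accumulation of basic amines in acidic compartments:

```python
# Minimal sketch (assumed pKa): why basic amines, but not quaternary
# ammonium cations, are drawn into acidic compartments. Only the neutral
# form crosses the membrane; Henderson-Hasselbalch gives its fraction.
def neutral_fraction(pka, ph):
    """Fraction of a monoprotic base in its neutral (membrane-permeant) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

pka = 9.7  # assumed, amine-like value chosen purely for illustration
f_cytosol = neutral_fraction(pka, 7.4)   # cytosolic pH
f_lysosome = neutral_fraction(pka, 4.8)  # lysosomal pH
# At equilibrium the neutral form equalizes across the membrane, so the
# ratio of total (neutral + protonated) concentrations is f_cyt / f_lys:
print(f"predicted lysosomal accumulation: {f_cytosol / f_lysosome:.0f}-fold")
# A quaternary ammonium cation has no neutral form, so this ion-trapping
# mechanism is simply unavailable to it.
```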

My view is that the antiviral ‘activity’ reported for ammonium chloride and chloroquine is a red herring when considering potential antiviral activity of quaternary ammonium compounds because neither has a quaternary ammonium center in its molecular structure. In any case, I consider “raising the endocytic and lysosomal pH” to be an unconvincing ‘explanation’ for the antiviral ‘activity’ of ammonium chloride and chloroquine since one would anticipate analogous effects for any base of comparable pKa. One should also anticipate considerable collateral damage to result from raising the endocytic and lysosomal pH (assuming that the ‘drug’ is able to overwhelm the buffering systems that have evolved to maintain physiological pH in live humans). The pH raising ‘explanation’ for antiviral ‘activity’ reminded me of suggestions that cancer can be cured by drinking aqueous sodium bicarbonate and I’ll direct readers to this relevant post by Derek.

This brings us to cetylpyridinium chloride and miramistin, shown below, and I’ve included the structure of paraquat in the graphic. While miramistin does indeed have a quaternary ammonium nitrogen in its molecular structure, cetylpyridinium chloride is not a quaternary ammonium compound (the cationic nitrogen is only connected to three atoms) and would be more correctly referred to as an N-alkylpyridinium compound (or salt). Nevertheless, this is a less serious error than describing ammonium chloride as a quaternary ammonium compound because cetylpyridinium is, at least, a permanent cation. Neither cetylpyridinium chloride nor miramistin is quite as clean as the authors might have you believe (take a look at L1991 | L1996 | D2017 | K2020 | P2020). I’d expect an N-alkylpyridinium cation to be more electrophilic than a tetraalkylammonium cation and paraquat, with two N-alkylpyridinium substructures, is highly toxic. Would Lady Bracknell's toxicity assessment have been that one N-alkylpyridinium may be regarded as a misfortune while two looks like carelessness?
I have no problem with hypothesizing that a chemical agent, such as cetylpyridinium chloride, which destroys SARS-CoV-2 on surfaces could do the same thing safely when sprayed up your nose, into your mouth or down your throat. If tackling the virus in this manner, you do need to be thinking about the effects of the chemical agent on the mucus (which is believed to protect against viral infection). The authors assert that cetylpyridinium chloride “has been used in multiple clinical trials” although they only cite this study in which it was used in conjunction with glycerin and xanthan gum (claimed by the authors of the clinical study to “form a barrier on the host mucosa, thus preventing viral contact and invasion”).

The main challenge to a proposal that cetylpyridinium chloride be repurposed for treatment of COVID-19 is that the compound does not appear to have actually been conventionally approved (i.e. shown to be efficacious and safe) as a drug for dosing as a nasal spray, mouth wash or gargle. Another difficulty is that cetylpyridinium chloride does not appear to have a specific molecular target. Something that should worry readers of the article is that the authors make no reference to literature in which potential toxicity of cetylpyridinium chloride and quaternary ammonium compounds is discussed.

This is a good place to wrap up and, here in Trinidad's Maraval Valley, I'm working on a cure for COVID-19. I anticipate a phone call from Stockholm later in the year.


Sunday, 2 August 2020

Why fragments?


Paramin panorama

Crystallographic fragment screens have been run recently against the main protease (at Diamond) and the Nsp3 macrodomain (at UCSF and Diamond) of SARS-CoV-2 and I thought that it might be of interest to take a closer look at why we screen fragments. Fragment-based lead discovery (FBLD) actually has origins in both crystallography [V1992 | A1996] and computational chemistry [M1991 | B1992 | E1994]. Measurement of affinity is important in fragment-to-lead work because it allows fragment-based structure-activity relationships to be established prior to structural elaboration. Affinity measurement is typically challenging when fragment binding has been detected using crystallography although affinity can be estimated by observing the response of occupancy to concentration (the ∆G° value of −3.1 kcal/mol reported for binding of pyrazole to protein kinase B was derived in this manner).
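As a sketch of how such an estimate works (the occupancies and concentrations below are illustrative, not the published pyrazole data), one can fit a 1:1 binding isotherm to refined occupancies and convert the fitted Kd to a standard free energy of binding:

```python
# Sketch (illustrative data): estimating Kd, and hence dG, by fitting a
# 1:1 binding isotherm to crystallographic occupancies measured at
# several ligand concentrations.
import numpy as np
from scipy.optimize import curve_fit

def isotherm(conc, kd):
    """Fractional occupancy for 1:1 binding; conc and kd in mol/L."""
    return conc / (kd + conc)

conc = np.array([0.5e-3, 2e-3, 8e-3, 32e-3, 128e-3])  # ligand conc (M)
occ = np.array([0.08, 0.25, 0.55, 0.83, 0.95])        # refined occupancies

(kd,), _ = curve_fit(isotherm, conc, occ, p0=[1e-2])
RT = 0.593  # kcal/mol at 298 K
dg = RT * np.log(kd / 1.0)  # standard concentration C = 1 M
print(f"Kd = {kd * 1e3:.1f} mM, dG = {dg:.1f} kcal/mol")
```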

Although fragment-based approaches to lead discovery are widely used, it is less clear why fragment-based lead discovery works as well as it appears to. While it has been stated that “fragment hits form high-quality interactions with the target”, the concept of interaction quality is not sufficiently well-defined to be useful in design. I ran a poll which asked about the strongest rationale for screening fragments.  The 65 votes were distributed as follows: ‘high ligand efficiency’ (23.1%), ‘enthalpy-driven binding’ (16.9%), ‘low molecular complexity’ (26.2%) and ‘God loves fragments’ (33.8%). I did not vote.

The belief that fragments are especially ligand-efficient has many adherents in the drug discovery field and it has been asserted that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”. The fundamental problem with ligand efficiency (LE), as conventionally calculated, is that perception of efficiency varies with the arbitrary concentration unit in which affinity is expressed (have you ever wondered why Kd, Ki or IC50 has to be expressed in mole/litre for calculation of LE?). This would appear to be a rather undesirable characteristic for a design metric and LE evangelists might consider trying to explain why it’s not a problem rather than dismissing it as a “limitation” of the metric or trying to shift the burden of proof onto the skeptics to show that the evangelists’ choice of concentration unit for calculation of LE is not useful.

The problems associated with the arbitrary nature of the concentration unit used to express affinity were first identified in 2009 and further discussed in 2014 and 2019. Specifically, it was noted that LE has a nontrivial dependency on the concentration, C°, used to define the standard state. If you want to do solution thermodynamics in terms of concentrations then you do need to specify a standard concentration. However, it is important to remember that the choice of standard concentration is necessarily arbitrary if the thermodynamic analysis is to be valid. If your conclusions change when you use a different definition of the standard state then you’ll no longer be doing thermodynamics and, as Pauli might have observed, you’ll not even be wrong. You probably don't know it, but when you use the LE metric, you’re making the sweeping assumption that all values of Kd, Ki and IC50 tend to a value of 1 M in the limit of zero molecular size. Recalling the conventional criticism of homeopathy, is there really a difference between a solute that is infinitely small and a solute that is infinitely dilute?
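A minimal numerical sketch (hypothetical fragment and lead) shows how perception of efficiency changes with the choice of standard concentration:

```python
# Minimal sketch (hypothetical compounds): the ranking by ligand
# efficiency inverts when the standard concentration is changed.
import math

RT = 0.593  # kcal/mol at 298 K

def ligand_efficiency(kd, heavy_atoms, c_std=1.0):
    """LE = -dG/HA with dG = RT*ln(Kd/C); kd and c_std in mol/L."""
    return -RT * math.log(kd / c_std) / heavy_atoms

# fragment: Kd = 1 mM, 10 heavy atoms; lead: Kd = 1 nM, 36 heavy atoms
for c_std in (1.0, 1e-3):
    le_fragment = ligand_efficiency(1e-3, 10, c_std)
    le_lead = ligand_efficiency(1e-9, 36, c_std)
    print(f"C = {c_std:g} M: LE(fragment) = {le_fragment:.2f}, "
          f"LE(lead) = {le_lead:.2f} (kcal/mol per heavy atom)")
# With C = 1 M the fragment looks more efficient; with C = 1 mM the lead does.
```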

I think that’s enough flogging of inanimate equines for one blog post so let’s take a look at enthalpy-driven binding. My view of thermodynamic signature characterization in drug discovery is that it’s, in essence, a solution that’s desperately seeking a problem. In particular, there does not appear to be any physical basis for claims that the thermodynamic signature is a measure of interaction quality. In case you’re thinking that I’m an unrepentant Luddite, I will concede that thermodynamic signatures could prove useful for validating physics-based models of molecular recognition and, in specific cases, they may point to differences in binding mode within congeneric series. I should also stress that the modern isothermal calorimeter is an engineering marvel and I'd always want this option for label-free affinity measurement in any project.

It is common to see statements in the thermodynamic signature literature to the effect that binding is ‘enthalpy-driven’ or ‘entropy-driven’ although it was noted in 2009 (coincidentally, in the same article that highlighted the nontrivial dependence of LE on C°) that these terms are not particularly meaningful. The problems start when you make comparisons between the numerical values of ∆H (which is independent of C°) and T∆S° (which depends on C°). If I’d presented such a comparison in physics class at high school (I was taught by the Holy Ghost Fathers in Port of Spain), I would have been caned with a ferocity reserved for those who’d dozed off in catechism class.  I’ll point you toward an article which asserts that, “when compared with many traditional druglike compounds, fragments bind more enthalpically to their protein targets”. I have a number of issues with this article although this is not the place for a comprehensive review (although I’ll probably pick it up in ‘The Nature of Lipophilic Efficiency’ when that gets written).
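To make the bookkeeping explicit, here are the standard relations (generic thermodynamics rather than anything specific to the article under discussion):

```latex
\Delta G^{\circ} = RT\,\ln\!\left(\frac{K_{d}}{C^{\circ}}\right),
\qquad T\Delta S^{\circ} = \Delta H - \Delta G^{\circ}
```

Replacing C° with C°′ shifts ∆G°, and with it T∆S°, by RT ln(C°/C°′) while leaving ∆H untouched, so the verdict of any ∆H versus T∆S° comparison rests on an arbitrary choice.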

While I don’t believe that the authors have actually demonstrated that fragments bind more enthalpically than ligands of greater molecular size, I wouldn’t be surprised to discover that gains in affinity over the course of a fragment-to-lead (F2L) campaign had come more from entropy than enthalpy. First, the lost translational entropy (the component of ∆S° that endows it with its dependence on C°) is shared over a greater number of intermolecular contacts for structurally-elaborated compounds and this article is relevant to the discussion. Second, I’d expect the entropy of any water molecule to increase when it is moved to bulk solvent from contact with the molecular surface of ligand or target (regardless of the polarity of the molecular surface at the point of contact). Nevertheless, this is something that you can test easily by examining the response of (∆H + T∆S°) to ∆G° (best not to aggregate data for different targets and/or temperatures when analyzing isothermal titration calorimetry data in this manner). But even if F2L affinity gains were shown generally to come more from entropy than enthalpy, would that be a strong rationale for screening fragments?
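Here's a minimal sketch of that check with made-up ITC numbers for a single congeneric series (purely illustrative): the slope of (∆H + T∆S°) against ∆G° tends toward +1 when affinity gains come from enthalpy and toward −1 when they come from entropy.

```python
# Sketch (made-up values for one congeneric series): regressing
# (dH + TdS) against dG. Slope +1 means enthalpy-driven affinity gains;
# slope -1 means entropy-driven gains (dG = dH - TdS throughout).
import numpy as np

dG = np.array([-5.1, -6.0, -7.2, -8.3, -9.1])  # kcal/mol
dH = np.array([-3.0, -3.2, -3.5, -3.9, -4.1])  # kcal/mol (from ITC)
TdS = dH - dG
slope = np.polyfit(dG, dH + TdS, 1)[0]
print(f"slope = {slope:.2f} (+1: enthalpy-driven; -1: entropy-driven)")
```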

This gets us onto molecular complexity and this article by Mike Hann and GSK colleagues should be considered essential reading for anybody thinking about selection of compounds for screening. The Hann model is a conceptual framework for molecular complexity but it doesn’t provide much practical guidance as to how to measure complexity (this is not a criticism since the thought process should be more about frameworks and less about metrics). I don’t believe that it will prove possible to quantify molecular complexity in an objective manner that is useful for designing compound libraries (I will be delighted to be proven wrong on this point). The approach to handling molecular complexity that I’ve used in screening library design is to restrict the extent of substitution (and other substructural features that can be considered to be associated with molecular complexity) and this is closer to ‘needle screening’ as described by Roche scientists in 2000 than to the Hann model.

Had I voted in the poll, ‘low molecular complexity’ would have got my vote.  Here’s what I said in NoLE (it’s got an entire section on fragment-based design and a practical suggestion for redefining ligand efficiency so that perception does not change with C°):

"I would argue that the rationale for screening fragments against targets of interest is actually based on two conjectures. First, chemical space can be covered most effectively by fragments because compounds of low molecular complexity [18, 21, 22] allow TIP [target interaction potential] to be explored [70,71,72,73,74] more efficiently and accurately. Second, a fragment that has been observed to bind to a target may be a better starting point for design than a higher affinity ligand whose greater molecular complexity prevents it from presenting molecular recognition elements to the target in an optimal manner."

To be fair, those who advocate the use of LE and thermodynamic signatures in fragment-based design do not deny the importance of molecular complexity. Let’s assume for the sake of argument that interaction quality can actually be defined and is quantified by the LE value and/or the thermodynamic signature for binding of compound to target. While these are massive assumptions, LE values and thermodynamic signatures are still effects rather than causes.

The last option in the poll was ‘God loves fragments’ and more respondents (33.8%) voted for it than for any of the first three options. I would interpret a vote for ‘God loves fragments’ in one of three ways. First, the respondent doesn’t consider any one of the first three options to be a stronger rationale for screening fragments than the other two. Second, the respondent doesn’t consider any of the first three options to be a valid rationale for screening fragments. Third, the respondent considers fragment-based approaches to have been over-sold.

This is a good place to wrap up. While I remain an enthusiast for fragment-based approaches to lead discovery, I do also believe that they have been somewhat oversold. The sensitivity of LE evangelists to criticism of their metric may stem from the use of LE to sell fragment-based methods to venture capitalists and, internally, to skeptical management. A shared (and serious) deficiency in the conventional ways in which LE and thermodynamic signature are quantified is that perception changes when the arbitrary concentration,  C°, that defines the standard state is changed. While there are ways in which this deficiency can be addressed for analysis, it is important that the deficiency be acknowledged if we are to move forward. Drug design is difficult and if we, as drug designers, embrace shaky science and flawed data analysis then those who fund our activities may conclude that the difficulties that we face are of our own making.     

Saturday, 18 July 2020

SARS-CoV-2 main protease. Crowdsourcing, peptidomimetics and fragments

<< previous || next >>

“Just take the ball and throw it where you want to. Throw strikes. Home plate don’t move.”

Satchel Paige (1906-1982) 

The COVID Moonshot and OSC19 are examples of what are sometimes called crowdsourced or open source approaches to drug discovery. While I’m not particularly keen on the use of the term ‘open source’ in this context, I have absolutely no quibble with the goal of seeking cures and treatments for diseases that are ignored by commercial drug discovery organizations. Open source drug discovery originated with OSDD in India and it should be noted that the approach has also been pioneered for malaria by OSM.  I see crowdsourcing primarily as a different way to organize and resource drug discovery rather than as a radically different way to do drug discovery.

One point that’s not always appreciated by cheminformaticians, computational chemists and drug discovery scientists in academia is that there’s a bit more to drug discovery than making predictions. In particular, I advise those seeking to transform drug discovery to ensure that they actually know what a drug needs to do and understand the constraints under which drug discovery scientists work. Currently, it does not appear to be possible to predict the effects of compounds in live humans from molecular structure with the accuracy needed for prediction-driven design and this is the primary reason that drug discovery is incremental in nature. A big part of drug discovery is generation of the information needed in order to maintain progress and there are gains to be had by doing this as efficiently as possible. Efficient generation of information, in turn, requires a degree of coordination that may prove difficult to achieve in a crowdsourced project.

The SARS-CoV-2 main protease (Mpro) is one of a number of potential targets of interest in the search for COVID-19 therapies. Like the cathepsins that are (or, at least, have been) of interest to the pharma/biotech industry as potential targets for therapeutic intervention, Mpro is a cysteine protease. If I’d been charged with quickly delivering an inhibitor of Mpro as a candidate drug then I’d be taking a very close look at how the pharma/biotech industry has pursued cysteine protease targets. Balicatib, odanacatib (cathepsin K inhibitors) and petesicatib (a cathepsin S inhibitor) can each be described as a peptidomimetic with a warhead (nitrile) that forms a covalent bond reversibly with the catalytic cysteine.

A number of peptidomimetic Mpro inhibitors have been described in the literature and this blog post by Chris Southan may be of interest. I’ve been looking at the published inhibitors shown below in Chart 1 (which exhibit antiviral activity and have been subjected to pharmacokinetic and toxicological evaluation) and have written some notes on mapping the structure-activity relationship for compounds like these. I should stress that compounds discussed in these notes are not expected to be dramatically more potent than the two shown in Chart 1 (in fact, I expect at least one to be significantly less potent). Nevertheless, I would argue that assay results for these proposed synthetic targets would inform design.

My assessment of these compounds is that there is significant room for improvement and I think that it would be relatively easy to achieve a pIC50 of 8 (corresponding to an IC50 of 10 nM) using the aldehyde warhead. I’d consider taking an aldehyde forward (there are options for dosing as a prodrug) although it really would be much better if there was also the option to exchange this warhead for the nitrile (a warhead that is much-loved by industrial medicinal chemists since it’s rugged, polar and contributes minimally to molecular size). While I’d anticipate that replacement of aldehyde with nitrile will lead to a reduction in potency, it’s necessary to quantify the potency loss to enable the potential of nitriles to be properly assessed. The binding mode observed for 1 is shown below in Figure 1 and it’s likely that the groove region will need to be more fully exploited (this article will give you an idea of the sort of thing I have in mind) in order to achieve acceptable potency if the aldehyde warhead is replaced by nitrile.

The COVID Moonshot project currently appears to be in what many industrial drug discovery scientists would call the hit-to-lead phase. In my view the principal objective of hit-to-lead work is to create options since having options will give the lead optimization team room to manoeuvre (you can think of hit-to-lead work as being a bit like playing in midfield). The COVID Moonshot project is currently focused on exploitation of hits from a fragment screen against Mpro and, while I’d question whether this approach is likely to get to a candidate drug more quickly than the conventional structure-based design used in industry to pursue cathepsins, it’s certainly an interesting project that I’m happy to contribute to. It’s also worth mentioning that fragment screens have been run against SARS-CoV-2 Nsp3 macrodomain at UCSF and Diamond since there are no known inhibitors for this target.

Here’s a blog post by Pat Walters in which he examines the structure-activity relationships emerging for the fragment-derived inhibitors. Specifically, he uses a metric known as the Structure-Activity Landscape Index (SALI) to quantify the sensitivity of activity to structural changes. Medicinal chemists apply the term ‘activity cliff’ to situations where a small change in structure results in a large change in activity and I’ve argued that the idea of quantifying the sensitivity of a physicochemical effect to structural modifications goes all the way back to Hammett. One point that comes out of Pat’s post is that it’s difficult to establish structure-activity relationships for low affinity ligands with a conventional biochemical assay. When applying fragment-based approaches in lead discovery, there are distinct advantages to being able to measure low binding affinity (~1 mM) since this allows fragment-based structure-activity relationships to be explored prior to synthetic elaboration of fragment hits. As Pat notes, inadequate solubility in assay buffer clearly places limits on the affinity that can be reliably measured in any assay although interference with the readout of a biochemical assay can also lead to misleading results. This is one reason that biophysical detection of binding using methods such as surface plasmon resonance (SPR) is favored in fragment-based lead discovery. Here’s an article by some of my former colleagues which shows how you can assess the impact of interference with the readout of a biochemical assay (and even correct for it if the effect isn’t too great).
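For anybody wanting to try this at home, here's a minimal sketch of the SALI calculation (hypothetical SMILES and pIC50 values; Pat's post uses his own data and code):

```python
# Sketch of the SALI calculation (hypothetical SMILES and pIC50 values):
# SALI = |pIC50_i - pIC50_j| / (1 - Tanimoto similarity). Large values
# flag activity cliffs: similar structures, very different activity.
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.DataStructs import TanimotoSimilarity

compounds = [("c1ccccc1CNC(=O)C1CC1", 5.2),       # assumed pIC50 values
             ("c1ccc(F)cc1CNC(=O)C1CC1", 7.1),
             ("c1ccc(Cl)cc1CNC(=O)C1CC1", 6.8)]

fps = [(AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi), 2, 2048), p)
       for smi, p in compounds]

for i in range(len(fps)):
    for j in range(i + 1, len(fps)):
        sim = TanimotoSimilarity(fps[i][0], fps[j][0])
        if sim < 1.0:  # identical fingerprints would divide by zero
            sali = abs(fps[i][1] - fps[j][1]) / (1.0 - sim)
            print(f"pair ({i},{j}): similarity = {sim:.2f}, SALI = {sali:.1f}")
```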

My first contribution to the COVID Moonshot project is illustrated in Chart 2 and the fragment-derived inhibitor 3 from which I started is also featured in Pat’s post. From inspection of the crystal structure, I noticed that the catalytic cysteine might be targeted by linking a ‘reversible’ warhead from the amide nitrogen (4 and 5). Although this might look fine on paper, the experimental data in this article suggest that linking any saturated carbon to the amide nitrogen will bias the preferred amide geometry from trans toward cis. Provided that the intrinsic gain in affinity resulting from linking the warhead is greater than the cost of adopting the bound conformation, the structural modification will lead to a net increase in affinity; the structures could also be locked into the bound conformation (e.g. by forming a ring) and here's an article that shows how this can work.


In addition to being accessible to a warhead linked from the amide nitrogen of 3, the catalytic cysteine is also within striking distance of the carbonyl carbon and it would be prudent to consider the possibility that 3 and its analogs can function as substrates for Mpro. There is precedent for this type of behavior and I’ll point you toward an article that notes that a series of esters identified as cruzain inhibitors can function as substrates and a more recent article that presents cruzain inhibitors that I’d consider to be potential substrates. A crystal structure of the protein-ligand complex is potentially misleading in this context since the enzyme might not be catalytically active. I believe that 6 could be used to explore this possibility since the carbonyl carbon would be expected to be more electrophilic and 3-hydroxy-4-methylpyridine would be expected to be a better leaving group than its 3-amino analog.

This is a good point to wrap things up. I think that Satchel Paige gave us some pretty good advice on how to approach drug discovery and that's yet another reason that Black Lives Matter.

Wednesday, 27 May 2020

COVID-19 stuff

|| next >>

It’s been ages since the last blog post. I’d been thinking of marking my return with an April Fools post but this didn’t seem right given the seriousness of the COVID-19 pandemic. However, I do realize that many people only follow the blog for the April Fools posts so I’ll link them here for easy reference [2013 | 2015 | 2016 | 2017 | 2018 | 2019]. I’m currently in Trinidad so I’ll share a photo from Berwick-on-Sea, on Trinidad's north coast (and the correspondence address for two [ K2017 | K2019 ] of my more controversial articles).


I should say at the outset that I’ve never previously worked in the antiviral area nor tried to help fight a global pandemic. X-ray crystal structures had been published for the main protease of SARS-CoV-2 back in March and these generated some discussion on twitter with Martin Stoermer and Ash Jogalekar (who actually triggered it). The upshot of the discussion was that a hydrogen bond between protein and ligand appeared to be of suboptimal geometry. Martin and I wrote a short article which we uploaded to figshare and Martin also did a blog post. I’ve decided to post my contributions to the COVID-19 response on figshare rather than cluttering ChemRxiv and bioRxiv with preprints that I have no intention of ever submitting to a journal. I should point out that the main protease is just one of a number of SARS-CoV-2 targets that one might exploit and I’ll direct you to this helpful review.

The two inhibitors that Martin and I wrote about are both peptidomimetics and each inhibitor structure incorporates a warhead which can form a covalent bond with the catalytic cysteine sulfur. I was particularly interested in the inhibitor with the α-ketoamide warhead because the inhibition would be expected to be reversible (always a good idea to check though) and I’ll get on to why that’s significant a bit later in the post. When I examine a crystal structure, I first look for what, out of laziness, I’ll call ‘weaknesses’ in the binding mode. These ‘weaknesses’ can be local as is the case for contact between polar and non-polar regions of molecular surface or a hydrogen bond with less than ideal geometry. However, ‘weaknesses’ can also be non-local when a ligand binds in a form (protonation state, tautomer, conformer) that is relatively high in energy. Generally, ‘weaknesses’ in binding modes should always be seen as design opportunities, especially when they are non-local, and here’s an example of how recognition of instability of the bound conformation was used in fragment-based design of PTP1B inhibitors.

It can be helpful to think in terms of design themes when optimizing both hits and leads. Typically, there is insufficient data for building useful predictive models at the start of a project and the optimization process involves efficient generation of the information required for making decisions. As such, optimization of both hits and leads should be seen in a Design of Experiments framework. After seeking insights from BB (my mother's dog), I wrote up some design themes.


A crystallographic fragment screen has been run against the SARS-CoV-2 main protease and a number of electrophilic fragments were screened using mass spectrometry. These two screens serve as a launch pad for the COVID Moonshot which looks interesting (although I’d suggest easing off a bit on the propaganda). One limitation of crystallographic fragment screening is that it is very difficult to measure the affinity of fragments, which means that it is not generally feasible to explore the structure-activity relationships of fragments prior to structural elaboration. That said, it’s not impossible and I’ll point you to this article which reports a value of -3.1 kcal/mol for the free energy of binding of pyrazole to protein kinase B that was derived from the concentration response of occupancy. The results of the crystallographic screen also have implications for the design of peptidomimetic inhibitors (in particular, the results point to pyridine as a bioisostere for the pyrrolidinone that is commonly used as a P1 substituent) and these notes may be helpful.

Reversibility is an issue that you definitely need to be aware of when designing compounds to inhibit cysteine proteases and these notes may be helpful. The issue arises because formation of a covalent bond between an electrophilic center (commonly referred to as a ‘warhead’) and the thiol of the catalytic cysteine is a commonly used tactic in inhibitor design. I'll direct you to a review of covalent drugs, an article that discusses some of the things that you need to consider when working with covalent inhibitors and a blog post on approved covalent drug mechanisms. There does appear to be a degree of prejudice [R1997 | BH2010 | BW2014] against covalent inhibition and some even appear to be unaware that covalent inhibition can be reversible.

If designing covalent cysteine protease inhibitors, I would generally favor reversible inhibition over irreversible inhibition. My primary reason for taking this view is that design of reversible inhibitors is less complex because IC50 can be interpreted in terms of affinity and you can use pretty much the same structure-based approaches as you would for non-covalent inhibitors. You can't really interpret IC50 for an irreversible inhibitor and the enzyme will be 100% inhibited if it's in contact with an irreversible inhibitor for long enough. The inhibitory activity of irreversible inhibitors is typically quantified by the ratio of the inactivation rate constant (kinact) to the inhibition constant (Ki), which makes the enzyme inhibition assay more complex for irreversible inhibitors. Furthermore, you'll need to build transition state models in order to do structure-based design.
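A sketch of how kinact/Ki is determined in practice (illustrative numbers): observed pseudo-first-order inactivation rate constants, measured at several inhibitor concentrations, are fit to a hyperbola in [I].

```python
# Sketch (illustrative data): characterizing an irreversible inhibitor
# by kinact/Ki. kobs values measured at several inhibitor concentrations
# are fit to kobs = kinact * [I] / (Ki + [I]).
import numpy as np
from scipy.optimize import curve_fit

def kobs_model(conc, kinact, ki):
    """conc and ki in mol/L, kinact in 1/s."""
    return kinact * conc / (ki + conc)

conc = np.array([0.25e-6, 1e-6, 4e-6, 16e-6, 64e-6])       # [I] in M
kobs = np.array([0.0009, 0.0030, 0.0072, 0.0115, 0.0136])  # 1/s

(kinact, ki), _ = curve_fit(kobs_model, conc, kobs, p0=[0.02, 5e-6])
print(f"kinact = {kinact:.3f} 1/s, Ki = {ki * 1e6:.1f} uM, "
      f"kinact/Ki = {kinact / ki:.0f} 1/(M*s)")
```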

It is possible that irreversible inhibition could lead to a longer duration of action although you also need to consider the consequences of slow inactivation of the enzyme. If thinking along these lines, you should look at this article by Rutger Folmer. Generally, the decision to go for reversible or irreversible inhibitors is one that drug discovery teams should think through carefully and the decision should determine screening tactics (rather than vice versa).

Wednesday, 29 May 2019

Transforming computational drug discovery (but maybe not)


"A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant."
Manfred Eigen (1927 - 2019)

I'll start this blog post with some unsolicited advice to those who seek to transform drug discovery. First, try to understand what a drug needs to do (as opposed to what compound quality 'experts' tell us a drug molecule should look like). Second, try to understand the problems that drug discovery scientists face and the constraints under which they have to solve them. Third, remember that many others have walked this path before and difficulties that you face in gaining acceptance for your ideas may be more a consequence of extravagant claims made previously by others than of a fundamentally Luddite nature of those whom you seek to influence. As has become a habit, I'll include some photos to break the text up a bit and the ones in this post are from Armenia.

Mount Ararat taken from the Cascade in Yerevan. I stayed at the excellent Cascade Hotel which is a two minute walk from the bottom of the Cascade.

Here are a couple of slides from my recent talk at Maynooth University that may be helpful to machine learning evangelists, AI visionaries and computational chemists who may lack familiarity with drug design. The introductions to articles on ligand efficiency and correlation inflation might also be relevant.

Defining controllability of exposure (drug concentration) as a design objective is extremely difficult while unbound intracellular drug concentration is not generally measurable in vivo.



Computational chemists and machine learning evangelists commonly make (at least) one of two mistakes when seeking to make an impact on drug design. First, they see design purely as an exercise in prediction. Second, they are unaware of the importance of exposure as the driver of drug action. I believe that we'll need to address (at least) one of these misconceptions if we are to achieve genuine transformation.

In this post, I'm going to take a look at an article in ACS Medchem Letters entitled 'Transforming Computational Drug Discovery with Machine Learning and AI'. The article opens with a Pablo Picasso quote although I'd argue that the observation made by Manfred Eigen at the beginning of this blog post would be way more appropriate. The World Economic Forum (WEF) is quoted as referring to "the combination of big data and AI as both the fourth paradigm of science and the fourth industrial revolution". The WEF reference reminded me of an article (published in the same journal and reviewed in this post) that invoked "views obtained from senior medicinal chemistry leaders". However, I shouldn't knock the WEF reference too much since we observed in the correlation inflation article that "lipophilicity is to medicinal chemists what interest rates are to central bankers".

The Temple of Garni is the only Pagan temple in Armenia and is sited next to a deep gorge (about 20 metres behind me). I took a keen interest in the potential photo opportunities presented by two Russian ladies who had climbed the safety barrier and were enthusiastically shooting selfies...

Much of the focus of the article is on the ANI-1x potential (and related potentials), developed by the authors for calculation of molecular energies. These potentials were derived by using a deep neural network to fit calculated (DFT) molecular energies to calculated molecular geometry descriptors. This certainly looks like an interesting and innovative approach to calculating energies of molecular structures. It's also worth mentioning the Open Force Field Initiative since they too are doing some cool stuff. I'll certainly be watching to see how it all turns out.

One key question concerns accuracy of DFT energies. The authors talk about a "zoo" of force fields but I'm guessing the diversity of DFT protocols used by computational chemists may be even greater than the diversity of force fields (here's a useful review). Viewing the DFT field as an outsider, I don't see a clear consensus as to the most appropriate DFT protocol for calculating molecular energy and the lack of consensus appears to be even more marked when considering interactions between molecules. It's also worth remembering that the DFT methods are themselves parameterized.  

Potentials such as those described by the authors are examples of what drug discovery scientists would call a quantitative structure-property relationship (QSPR). When assessing whether or not a model constitutes AI in the context of drug discovery, I would suggest consideration of the nature of the model rather than the nature of the algorithm used to build the model. The fitting of DFT energies to molecular descriptors that the authors describe is considerably more sophisticated than would be the case for a traditional QSPR. However, there are a number of things that you need to keep in mind when fitting measured or calculated properties to descriptors regardless of the sophistication of the fitting procedure. This post on QSAR as well as the recent exchange ( 1 | 2 | 3 ) between Pat Walters and me may be informative. First, over-fitting is always a concern and validation procedures may make an optimistic assessment of model quality when the space spanned by the descriptors is unevenly covered. Second, it is difficult to build stable and transferable models if there are relationships between descriptors (the traditional way to address this problem is to first perform principal component analysis, which assumes that the relationships between descriptors are linear). Third, it is necessary to account for the numbers of adjustable parameters in models in an appropriate manner if claiming that one model has outperformed another.
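As a minimal sketch of the second point (random placeholder data, scikit-learn assumed), here's the traditional PCA-then-regress recipe with cross-validation to keep over-fitting honest:

```python
# Sketch (random placeholder data): project correlated descriptors onto
# principal components before fitting, and cross-validate the pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                             # descriptor matrix
X[:, 25:] = X[:, :25] + 0.05 * rng.normal(size=(200, 25))  # correlated block
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = make_pipeline(StandardScaler(), PCA(n_components=10), Ridge())
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2 = {scores.mean():.2f} +/- {scores.std():.2f}")
```

Note that PCA only removes linear relationships between descriptors, which is exactly the assumption flagged above.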



Armenia appeared to be awash with cherry blossoms when I visited in April. This photo was taken at Tatev Monastery which can be accessed by cable car.

The authors have described what looks to be a promising approach to calculation of molecular energies. Is it AI in the context of drug discovery? I would say, "no, or at least no more so than the QSPR and QSAR models that have been around for decades". Will it transform computational drug discovery? I would say, "probably not". Now I realize that you're thinking that I'm a complete Luddite (especially given my blinkered skepticism of the drug design metrics introduced by Pharma's Finest Minds) but I can legitimately claim to have exploited knowledge of ligand conformational energy in a real discovery project. I say "probably not" simply because drug designers have been able to calculate molecular energy for many years although I concede that the SOSOF (same old shit only faster) label would be unfair. That said, I would expect faster, more accurate and more widely applicable methods to calculate molecular energy to prove very useful in computational drug discovery. However, utility is a necessary, but not sufficient, condition for transformation.


Geghard Monastery was carved from the rock

So I'll finish with some advice for those who manage (or, if you prefer, lead) drug discovery.  Suppose that you've got some folk trying to sell you an AI-based system for drug design. Start by getting them to articulate their understanding of the problems that you face. If they don't understand your problems then why should you believe their solutions? Look them in the eye when you say "unbound intracellular concentration" to see if you can detect signs of glazing over. In particular, be wary of crude scare tactics such as the suggestion that those medicinal chemists that don't use AI will lose their jobs to medicinal chemists who do use AI. If the terrors of being left behind by the Fourth Industrial Revolution are invoked then consider deploying the conference room furniture that you bought on eBay from Ernst Stavro Blofeld Associates.

Selfie with MiG-21 (apparently Artem's favorite) at the Mikoyan Brothers Museum in Sanahin where the brothers grew up. Anastas was even more famous than his brother and played a key role in defusing the Cuban Missile Crisis.

Saturday, 11 May 2019

Efficient trajectories


I'll examine an article entitled ‘Mapping the Efficiency and Physicochemical Trajectories of Successful Optimizations’ (YL2018) in this post and I should note that the article title reminded me that abseiling has been described as the second fastest way down the mountain. The orchids in Blanchisseuse have been particularly good this year and I’ll include some photos of them to break the text up a bit.


It’s been almost 22 years since the rule of 5 (Ro5) was published. While the Ro5 article highlighted molecular size and lipophilicity as pharmaceutical risk factors, the rule itself is actually of limited utility as a drug design tool. Some of the problems associated with excessive lipophilicity had actually been recognized (see Yalkowsky | Hansch) over a decade before the publication of Ro5 in 1997 and there’s also this article that had been published in the previous year. However, it was the emergence of high-throughput screening that can be regarded as the trigger for Ro5 which, in turn, dramatically raised awareness of the importance of physicochemical properties in drug design. The heavy citation and wide acceptance of Ro5 provided incentives for researchers to publish their own respective analyses of large (usually proprietary) data sets and this has been expressed more succinctly as “Ro5 envy”.



So let's take a look at YL2018 and the trajectories. I have to concede that ‘trajectory’ makes it all seem so physical and scientifically rigorous even though ‘path’ would be more appropriate (and easier to say after a few beers). As noted in ‘The nature of ligand efficiency’ (NoLE), I certainly believe that it is a good idea for medicinal chemistry teams to both plot potency (e.g. pIC50) against risk factors such as molecular size or lipophilicity for their project compounds and to analyze the relationships between potency and these quantities. However, it is far from clear that a medicinal chemistry team optimizing a specific structural series against a particular target would necessarily find the plots corresponding to optimization of other structural series against other targets to be especially relevant to their own project.

YL2018 claims that “the wider employment of efficiency metrics and lipophilicity control is evident in contemporary practice and the impact on quality demonstrable”. While I would agree that efficiency metrics are integral to the philatelic aspects of modern drug discovery, I don’t believe that YL2018 actually presents a single convincing example of efficiency metrics being used for decision making in a specific drug design project. I should also point out that each of the authors of YL2018 provided cannon fodder (LS2007 | HY2010 ) for the correlation inflation article and you might want to keep that in mind when you read the words “evident” and “demonstrable”. They also published 'Molecular Property Design: Does Everyone Get It?' back in 2015 and you may find this review of that seminal contribution to the drug design literature to be informative.

I reckon that it would actually be a lot more difficult to demonstrate that efficiency metrics were used meaningfully (i.e. for decision making rather than presentation at dog and pony shows) in projects than it would be to demonstrate that they were predictive of pharmaceutically relevant behavior of compounds. In NoLE, I stated:

"However, a depiction [6] of an optimization path for a project that has achieved a satisfactory endpoint is not direct evidence that consideration of molecular size or lipophilicity made a significant contribution toward achieving that endpoint. Furthermore, explicit consideration of lipophilicity and molecular size in design does not mean that efficiency metrics were actually used for this purpose. Design decisions in lead optimization are typically supported by assays for a range of properties such as solubility, permeability, metabolic stability and off-target activity as well as pharmacokinetic studies. This makes it difficult to assess the extent to which efficiency metrics have actually been used to make decisions in specific projects, especially given the proprietary nature of much project-related data."



YL2018 states, “Trajectory mapping, based on principles rather than rules, is useful in assessing quality and progress in optimizations while benchmarking against competitors and assessing property-dependent risks.” and, as a general point, you need to show you're on top of the physical chemistry if you're going to write articles like this.

Ligand efficiency represents something of a liability for anybody claiming expertise in physical chemistry. The reason for this is that perception of efficiency depends on the unit that you use to express affinity and this is a serious issue (in the "not even wrong" category) that was highlighted in 2009 and 2014 before NoLE was published. While YL2018 acknowledges that criticisms of ligand efficiency have been made, you really need to say exactly why this dependence of perception on units is not a problem if you're going to lecture about principles to readers of the Journal of Medicinal Chemistry.

Ligand lipophilic efficiency (LLE), which is also known as ligand lipophilicity efficiency (LLE) and lipophilic efficiency (LipE), can be described as an offset efficiency metric (lipophilicity is subtracted from potency). As such, perception of efficiency does not change when you use a different unit to express potency and, provided that ionization of the ligand is insignificant, efficiency can be seen as a measure of the ease of transfer of ligand from octanol to its binding site. Here's a graphic that illustrates this:

LLE (LipE) measures ease of transfer of ligand from octanol to binding site
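In symbols, the standard definition is:

```latex
\mathrm{LLE} = \mathrm{pIC}_{50} - \log P
```

Expressing IC50 in a different concentration unit adds the same constant to every pIC50 value, and hence to every LLE value, so comparisons between compounds are unaffected. This is precisely the invariance that LE lacks.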

I'm not entirely convinced that the authors of YL2018 properly understood the difference between logP and logD. Even if they did, they needed to articulate the implications for drug design a lot more clearly than they have done. Here's an equation that expresses logD as a function of logP and the fraction, fN, of ligand in the neutral form at the experimental pH (assuming that only neutral forms of ligands partition into the octanol):

logD = logP + log10(fN)

The equation highlights the problems that result from using logD (rather than logP) to define "compound quality". In essence, the difficulty stems from the composite nature of logD, which means that logD can also be reduced by increasing the extent of ionization. While this is likely to result in increased aqueous solubility, it is much less likely that problems associated with binding to anti-targets will be addressed. Increasing the extent of ionization may also compromise permeability.
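A minimal sketch of the equation in action (illustrative logP and pKa values) for a monoprotic base at pH 7.4:

```python
# Sketch: logD for a monoprotic base, assuming only the neutral form
# partitions into octanol (the same assumption as the equation above).
import math

def log_d(log_p, pka, ph):
    """logD = logP + log10(f_neutral) for a monoprotic base."""
    f_neutral = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return log_p + math.log10(f_neutral)

# Same logP, increasing basic pKa: logD falls while logP is unchanged,
# which is why low logD need not mean low intrinsic lipophilicity.
for pka in (6.0, 8.0, 10.0):
    print(f"pKa = {pka}: logD(7.4) = {log_d(3.0, pka, 7.4):+.1f} (logP = 3.0)")
```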


YL2018 is clearly a long article and I'm going to focus on two of the ways in which the authors present values of efficiency metrics. The first of these is the "% better" statistic which is used to reference specific compounds (e.g. optimization endpoints) to sets of compounds (e.g. everything synthesized by project chemists). The statistic is calculated as the fraction of compounds in the set for which both LE and LLE values are greater than the corresponding values for the compound of interest. The smallest values of the "% better" statistic are considered to correspond to the most highly optimized compounds. The use of the "% better" statistic could be taken as indicating that absolute thresholds for LE and LLE are not useful for analyzing optimization trajectories.

The fundamental problem with analyzing data in this manner is that LE has a nontrivial dependence on the concentration unit in which affinity is expressed (this is shown in Table 1 and Fig. 1 in NoLE). One consequence of this nontrivial dependence is that both perception of efficiency and the "% better" statistic vary with the concentration unit used to express affinity.

The second way that the authors of YL2018 present values of efficiency metrics is to plot LE against LLE and, as has already been noted, this is a particularly bone-headed way to analyze data. One problem is that the plot changes in a nontrivial manner if you express affinity in a different unit. This makes it difficult to explain to medicinal chemists why they need to convert the micromolar potencies from their project database to molar units in order for The Truth to be revealed. Another problem is that LE and LLE are both linear functions of pIC50 (or pKi) and that means that the appearance of the plot is heavily influenced by the (trivial) correlation of potency with itself.

A much better way to present the data is to plot LLE against the number of non-hydrogen atoms (or any other measure of molecular size that you might prefer). In such a plot, expressing potency (or affinity) in a different unit simply shifts all points 'up' or 'down' to the same extent which means that you no longer have the problem that the appearance of the plot changes when you change units. The other advantage of plotting the data in this manner is that there is no explicit correlation between the quantities being plotted. I have used a variant of this plot in NoLE (see Fig. 2b) to compare some fragment to lead optimizations that had been analyzed previously.
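A minimal sketch of the suggested plot (made-up optimization series): changing the potency unit would simply shift every point vertically by the same amount.

```python
# Sketch (made-up series): plot LLE against heavy atom count. Unlike an
# LE vs LLE plot, its appearance does not depend on the potency unit.
import matplotlib.pyplot as plt

heavy_atoms = [12, 17, 21, 26, 30]           # fragment-to-lead series
pic50 = [4.5, 5.8, 6.9, 7.8, 8.4]            # illustrative potencies
clogp = [1.2, 2.0, 2.4, 3.1, 3.3]            # illustrative lipophilicities
lle = [p - l for p, l in zip(pic50, clogp)]

plt.plot(heavy_atoms, lle, "o-")
plt.xlabel("number of non-hydrogen atoms")
plt.ylabel("LLE (pIC50 - logP)")
plt.title("optimization path (illustrative)")
plt.show()
```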

I think this is a good point to wrap things up. Even if you have found the post to be tedious, I hope that you have at least enjoyed the orchids. As we would say in Brazil, até mais!

    

Tuesday, 9 April 2019

A couple of talks on Figshare


I've started using Figshare and have uploaded my first two harangues. Late last month, I did a talk on ligand efficiency at Evotec UK that was based on my recent article in the Journal of Cheminformatics (which had caused two of the reviewers to spit feathers when the manuscript was initially submitted to the Journal of Medicinal Chemistry). Last week, I travelled to Ireland to seek an audience with Maynooth University's renowned Library Cat (see photo below). Seeing as I was over there anyway, I also gave a talk (Hydrogen bonding in the context of drug design) that drew on material from two articles (1 | 2) on hydrogen bonding from our work in Nequimed.