Sunday 19 November 2023

On the misuse of chemical probes

It’s now time to get back to chemical probes and I’ll be taking a look at S2023 (Systematic literature review reveals suboptimal use of chemical probes in cell-based biomedical research) which has already been reviewed in blog posts from Practical Fragments, In The Pipeline and the Institute of Cancer Research. Readers of this blog are aware that PAINS filters usually crop up in posts on chemical probes but there are other things that I want to discuss and, in any case, references to PAINS in S2023 are minimal. Nevertheless, I’ll still stress that a substructural match of a chemical probe with a PAINS filter does not constitute a valid criticism of a chemical probe (it simply means that the chemical structure of the chemical probe shares structural features with compounds that have been claimed to exhibit frequent-hitter behaviour in a panel of six AlphaScreen assays) and one is more likely to encounter a bunyip than a compound that has actually been shown to exhibit pan-assay interference.

The authors of S2023 claim to have revealed “suboptimal use of chemical probes in cell-based biomedical research” and I’ll start by taking a look at the abstract (my annotations are italicised in red):

Chemical probes have reached a prominent role in biomedical research, but their impact is governed by experimental design. To gain insight into the use of chemical probes, we conducted a systematic review of 662 publications, understood here as primary research articles, employing eight different chemical probes in cell-based research. [A study such as S2023 that has been claimed by its authors to be systematic does need to say something about how the eight chemical probes were selected and why the literature for this particular selection of chemical probes should be regarded as representative of chemical probes literature in general.] We summarised (i) concentration(s) at which chemical probes were used in cell-based assays, (ii) inclusion of structurally matched target-inactive control compounds and (iii) orthogonal chemical probes. Here, we show that only 4% of analysed eligible publications used chemical probes within the recommended concentration range and included inactive compounds as well as orthogonal chemical probes. [I would argue that failure to use a chemical probe within a recommended concentration range is only a valid criticism if the basis for the recommendation is clearly articulated.] These findings indicate that the best practice with chemical probes is yet to be implemented in biomedical research. [My view is that the best practice with chemical probes is yet to be defined.] To achieve this, we propose ‘the rule of two’: At least two chemical probes (either orthogonal target-engaging probes, and/or a pair of a chemical probe and matched target-inactive compound) to be employed at recommended concentrations in every study. [The authors of S2023 do seem to be moving the goalposts since they’ve criticised studies for not using structurally matched target-inactive control compounds but are saying that using an additional orthogonal target-engaging probe makes it acceptable not to use a structurally matched target-inactive control compound. This suggestion does appear to contradict the Chemical Probes Portal criteria for 'classical' modulators which do require the use of a control compound defined as having a "similar structure with similar physicochemistry, non-binding against target".]

The following sentence does set off a few warning bells for me:

The term ‘chemical probe’ distinguishes compounds used in basic and preclinical research from ‘drugs’ used in the clinic, from the terms ‘inhibitor’, ‘ligand’, ‘agonist’ or ‘antagonist’ which are molecules targeting a given protein but are insufficiently characterised, and also from the term ‘probes’ which is often referring to laboratory reagents for biophysical and imaging studies.

First, the terms 'compound' and 'molecule' are not interchangeable and I would generally recommend using 'compound' when talking about biological activity or affinity. A more serious problem is that the authors of S2023 seem to be getting into homeopathic territory by suggesting that chemical probes are not ligands and this might have caused Paul Ehrlich (who died 26 years before Kaiser Wilhelm II) to spit a few feathers. Drugs and chemical probes are ligands for their targets by virtue of binding to their targets (the term 'ligand' is derived from the Latin 'ligare' which means 'to bind' and a compound can be a ligand for one target without necessarily being a ligand for another target) while the terms 'inhibitor', 'agonist' and 'antagonist' specify the consequences of ligand binding. I was also concerned by the use of the term 'in cell concentration' in S2023 given that uncertainty in intracellular concentration is an issue when working with chemical probes (as well as in PK-PD modelling). Although my comments above could be seen as nit-picking, these are not the kind of errors that authors can afford to make if they’re going to claim that their “findings indicate that the best practice with chemical probes is yet to be implemented in biomedical research”.

Let’s take a look at the criteria by which the authors of S2023 have assessed the use of chemical probes. They assert that “Even the most selective chemical probe will become non-selective if used at a high concentration” although I think it’d be more correct to state that the functional selectivity of a probe depends on binding affinity of the probe for target and anti-targets as well as the concentration of the probe (at its site of action). Selectivity also depends on the concentration of anything that binds competitively with the probe and, when assessing kinase selectivity, it can be argued that assays for ATP-competitive kinase inhibitors should be run at a typical intracellular ATP concentration (here’s a recent open access review on intracellular ATP concentration). The presence of serum in cell-based assays should also be considered when setting upper concentration limits since chemical probes may bind to serum proteins such as albumin which means that the concentration of a compound that is ‘seen’ by the cells is lower than the total concentration of the compound in the assay. In my experience binding to albumin tends to increase with lipophilicity and is also favoured by the presence of an acidic group such as carboxylate in a molecular structure.
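The dependence of functional selectivity on both affinity and concentration can be made concrete with a simple occupancy calculation. Here's a minimal sketch using the standard single-site binding isotherm and the competitive (Cheng-Prusoff-style) shift of an ATP-competitive Ki; all of the Kd, Ki, Km and concentration values are illustrative, not data from S2023.

```python
def occupancy(conc_nM, kd_nM):
    """Fractional occupancy from the single-site binding isotherm."""
    return conc_nM / (conc_nM + kd_nM)

def apparent_ki_atp_competitive(ki_nM, atp_uM, km_atp_uM):
    """Apparent Ki for an ATP-competitive inhibitor at a given ATP
    concentration (competitive shift: Ki * (1 + [ATP]/Km))."""
    return ki_nM * (1.0 + atp_uM / km_atp_uM)

# Illustrative probe: Kd = 10 nM for target, 1000 nM for an anti-target.
for conc in (10, 100, 1000, 10000):  # nM
    f_target = occupancy(conc, 10.0)
    f_anti = occupancy(conc, 1000.0)
    print(f"{conc:>6} nM: target {f_target:.2f}, anti-target {f_anti:.2f}")
# At 10 uM the probe is near-saturating on both, i.e. selectivity is lost.

# A biochemical Ki of 10 nM measured at low ATP shifts substantially at a
# typical intracellular ATP concentration (millimolar) if Km(ATP) is ~50 uM:
print(apparent_ki_atp_competitive(10.0, atp_uM=2000.0, km_atp_uM=50.0))  # 410.0
```

The point of the sketch is that "selectivity" is not a property of the compound alone: it falls out of affinities, the concentration at the site of action and the concentration of competing species.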

I’m certainly not suggesting that chemical probes be used at excessive concentrations but if you’re going to criticise other scientists for exceeding concentration thresholds then, at very least, you do need to show that the threshold values have been derived in an objective and transparent manner. My view is that it would not be valid to criticise studies publicly (or in peer review of submitted manuscripts) simply because the studies do not comply with recommendations made by the Chemical Probes Portal. It is significant that the recommendations from different groups of chemical probe experts with respect to the maximum concentration at which UNC1999 should be used differ by almost an order of magnitude:

As the recommended maximal in-cell concentration for UNC1999 varies between the Chemical Probes Portal and the Structural Genomics Consortium sites (400 nM and 3 μM, respectively), we analysed compliance with both concentrations.

One of the eight chemical probes featured in S2023 is THZ1 which is reported to bind covalently to CDK7 and the electrophilic warhead is acrylamide-based, suggesting that binding is irreversible. Chemical probes that form covalent bonds with their targets irreversibly need to be considered differently to chemical probes that engage their targets reversibly (see this article). Specifically, the degree of target engagement by a chemical probe that binds irreversibly depends on time as well as concentration (if you wait long enough then you’ll achieve 100% inhibition). This means that it’s not generally possible to quantify selectivity or to set concentration thresholds objectively for chemical probes that bind to their targets irreversibly. It’s not clear (at least to me) why an irreversible covalent inhibitor such as THZ1 was included as one of the eight chemical probes covered by the S2023 study so I checked to see what the Chemical Probes Portal had to say about THZ1 and something doesn’t look quite right. The on-target potency is given as a Kd (dissociation constant) value of 3.2 nM and the potency assay is described as “time-dependent binding established supporting covalent mechanism”. However, Kd is a measure of affinity (and therefore not time-dependent) and my understanding is that it is generally difficult to measure Kd for irreversible covalent inhibitors which are typically characterized by kinact (inactivation rate constant) and Ki (inhibition constant) values obtained from analysis of enzyme inhibition data. The off-target potency of THZ1 is summarized as “KiNativ profiling against 246 kinases in Loucy cells was performed showing >75% inhibition at 1 uM of: MLK3, PIP4K2C, JNK1, JNK2, JNK3, MER, TBK1, IGF1R, NEK9, PCTAIRE2, and TBK1, but in vitro binding to off-target kinases was not time dependent indicating that inhibition was not via a covalent mechanism”.
The results from the assays used to measure on-target and off-target potency for THZ1 do not appear to be directly comparable.
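The time dependence of irreversible engagement can be sketched with the standard kinact/Ki model, under which the observed inactivation rate is kobs = kinact·[I]/(Ki + [I]) and the fraction of target inactivated after time t is 1 − exp(−kobs·t), assuming constant inhibitor concentration. The parameter values below are illustrative, not THZ1 data.

```python
import math

def fraction_inactivated(conc_nM, t_h, kinact_per_h, ki_nM):
    """Fraction of target irreversibly inactivated after time t_h (hours),
    assuming constant inhibitor concentration (kinact/Ki model)."""
    kobs = kinact_per_h * conc_nM / (ki_nM + conc_nM)
    return 1.0 - math.exp(-kobs * t_h)

# Illustrative parameters: kinact = 2 per hour, Ki = 100 nM, probe at 50 nM
# (i.e. below Ki).
for t in (0.5, 2, 8, 24):  # hours
    print(f"t = {t:>4} h: {fraction_inactivated(50.0, t, 2.0, 100.0):.3f}")
# Even a sub-Ki concentration approaches complete inhibition given enough
# time, which is why a single concentration threshold is difficult to
# justify objectively for irreversible probes.
```

This is why quantifying selectivity for an irreversible inhibitor requires comparing kinact/Ki ratios (and incubation times) rather than comparing affinities at a single concentration.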

It’s now time to wrap up and I suggest that it would not be valid to criticise (either publicly or in peer review) a study simply on the grounds that it reported results of experiments in which a chemical probe was used at a concentration exceeding a recommended maximum value. The S2023 authors assert that an additional orthogonal target-engaging probe can be substituted for a matched target-inactive control compound but this appears to contradict criteria for classical modulators given by the Chemical Probes Portal.

Wednesday 27 September 2023

Five days in Vermont

A couple of months ago I enjoyed a visit to the US (my first for eight years) on which I caught up with old friends before and after a few days in Vermont (where a trip to the golf course can rapidly become a National Geographic Moment). One highlight of the trip was randomly meeting my friend and fellow blogger Ash Jogalekar for the first time in real life (we’ve actually known each other for about fifteen years) on the Boston T Red Line.  Following a couple of nights in green and leafy Belmont, I headed for the Flatlands with an old friend from my days in Minnesota for a Larry Miller group reunion outside Chicago before delivering a short harangue on polarity at Ripon College in Wisconsin. After the harangues, we enjoyed a number of most excellent Spotted Cattle (Only in Wisconsin) in Ripon. I discovered later that one of my Instagram friends is originally from nearby Green Lake and had taken classes at Ripon College while in high school. It is indeed a small world.

The five days spent discussing computer-aided drug design (CADD) in Vermont are what I’ll be covering in this post and I think it’s worth saying something about what drugs need to do in order to function safely. First, drugs need to have significant effects on therapeutic targets without having significant effects on anti-targets such as hERG or CYPs and, given the interest in new modalities, I’ll say “effects” rather than “affinity”, although Paul Ehrlich would have reminded us that drugs need to bind in order to exert effects. Second, drugs need to get to their targets at sufficiently high concentrations for their effects to be therapeutically significant (drug discovery scientists use the term ‘exposure’ when discussing drug concentration). Although it is sometimes believed that successful drugs simply reduce the numbers of patients suffering from symptoms it has been known from the days of Paracelsus that it is actually the dose that differentiates a drug from a poison.

Drug design is often said to be multi-objective in nature although the objectives are perhaps not as numerous as many believe (this point is discussed in the introduction section of NoLE, an article that I'd recommend to insomniacs everywhere). The first objective of drug design can be stated in terms of minimization of the concentration at which a therapeutically useful effect on the target is observed (this is typically the easiest objective to define since drug design is typically directed at specific targets). The second objective of drug design can be stated in analogous terms as maximization of the concentration at which toxic effects on the anti-targets are observed (this is a more difficult objective to define because we generally know less about the anti-targets than about the targets). The third objective of drug design is to achieve controllability of exposure (this is typically the most difficult objective to define because drug concentration is a dose-dependent, spatiotemporal quantity and intracellular concentration cannot generally be measured for drugs in vivo). Drug discovery scientists, especially those with backgrounds in computational chemistry and cheminformatics, don’t always appreciate the importance of controlling exposure and the uncertainty in intracellular concentration always makes for a good stock question for speakers and panels of experts.

I posted previously on artificial intelligence (AI) in drug design and I think it’s worth highlighting a couple of common misconceptions. The first misconception is that we just need to collect enough data and the drugs will magically condense out of the data cloud that has been generated (this belief appears to have a number of adherents in Silicon Valley). The second misconception is that drug design is merely an exercise in prediction when it should really be seen in a Design of Experiments framework. It’s also worth noting that genuinely categorical data are rare in drug design and my view is that many (most?) "global" machine learning (ML) models are actually ensembles of local models (this heretical view was expressed in a 2009 article and we were making the point that what appears to be an interpolation may actually be an extrapolation). Increasingly, ML is coming to be seen as a panacea and it’s worth asking why quantitative structure activity relationship (QSAR) approaches never really made much of a splash in drug discovery.

I enjoyed catching up with old friends [ D | K | S | R/J | P/M ] as well as making some new ones [ G | B/R | L ]. However, I was disappointed that my beloved Onkel Hugo was not in attendance (I continue to be inspired by Onkel’s laser-like focus on the hydrogen bonding of the ester) and I hope that Onkel has finally forgiven me for asking (in 2008) if Austria was in Bavaria. There were many young people at the gathering in Vermont and their enthusiasm made me greatly optimistic for the future of CADD (I’m getting to the age at which it’s a relief not to be greeted with: "How nice to see you, I thought you were dead!"). Lots of energy at the posters (I learned from one that Voronoi was Ukrainian) although, if we’d been in Moscow, I’d have declined the refreshments and asked for a room on the ground floor (left photo below).  Nevertheless, the bed that folded into the wall (centre and right photos below) provided plenty of potential for hotel room misadventure without the ‘helping hands’ of NKVD personnel.

It'd been four years since CADD had been discussed at this level in Vermont so it was no surprise to see COVID-19 on the agenda. The COVID-19 pandemic led to some very interesting developments including the Covid Moonshot (a very different way of doing drug discovery and one I was happy to contribute to during my 19 month sojourn in Trinidad) and, more tangibly, Nirmatrelvir (an antiviral medicine that has been used to treat COVID-19 infections since early 2022). Looking at the molecular structure of Nirmatrelvir you might have mistaken trifluoroacetyl for a protecting group but it’s actually an important feature (it appears to be beneficial from the permeability perspective). My view is that the alkane/water logP (alkane is a better model than octanol for the hydrocarbon core of a lipid bilayer) for a trifluoroacetamide is likely to be a couple of log units greater than for the corresponding acetamide.

I’ll take you through how the alkane/water logP difference between a trifluoroacetamide and corresponding acetamide can be estimated in some detail because I think this has some relevance to using AI in drug discovery (I tend to approach pKa prediction in an analogous manner). Rather than trying to build an ML model for making the prediction, I’ve simply made connections between measurements for three different physicochemical properties (alkane/water logP, hydrogen bond basicity and hydrogen bond acidity) which is something that could easily be accommodated within an AI framework. I should stress that this approach can only be used because it is a difference in alkane/water logP (as opposed to absolute values) that is being predicted and these physicochemical properties can plausibly be linked to substructures.

Let’s take a look at the triptych below which I admit is not quite up to the standards of Hieronymus Bosch (although I hope that you find it to be a little less disturbing). The first panel shows values of polarity (q) for some hydrogen bond acceptors and donors (you can find these in Tables 2 and 3 in K2022) that have been derived from alkane/water logP measurements. You could, for example, use these polarity values to predict that reducing the polarity of an amide carbonyl oxygen to the extent that it looks like a ketone will lead to a 2.2 log unit increase in alkane/water logP. The second panel shows measured hydrogen bond basicity values for three hydrogen bond acceptors (you can find these in this freely available dataset) and the values indicate that a trifluoroacetamide is an even weaker hydrogen bond acceptor than a ketone. Assuming a linear relationship between polarity and hydrogen bond basicity, we can estimate that the trifluoroacetamide carbonyl oxygen is 2.4 log units less polar than the corresponding acetamide. The final panel shows measured hydrogen bond acidity values (you can find these in Table 1 of K2022) that suggest that an imide NH (q = 1.3; 0.5 log units more polar than typical amide NH) will be slightly more polar than the trifluoroacetamide NH of Nirmatrelvir. So to estimate the difference in alkane/water logP values you just need to subtract the additional polarity of trifluoroacetamide NH (0.5 log units) from the lower polarity of the trifluoroacetamide carbonyl oxygen (2.4) to get 1.9 log units.
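The bookkeeping above can be written out explicitly. The polarity (q) increments are the ones quoted in the preceding paragraph (attributed there to K2022 and the hydrogen bond basicity dataset); the arithmetic simply combines the carbonyl oxygen and NH contributions.

```python
# Estimated change in alkane/water logP on going from an acetamide to the
# corresponding trifluoroacetamide, using the polarity (q) increments
# quoted in the text.
delta_q_carbonyl_O = 2.4  # trifluoroacetamide C=O oxygen is less polar
delta_q_NH = 0.5          # trifluoroacetamide NH is more polar (imide-like)

delta_logP = delta_q_carbonyl_O - delta_q_NH
print(delta_logP)  # ~1.9 log units higher alkane/water logP
```

Trivial as the calculation is, it illustrates the point made above: connecting measurements for different physicochemical properties through substructural increments can substitute for an ML model when it is a difference, rather than an absolute value, that needs to be predicted.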

Chemical space is a recurring theme in drug design and its vastness, which defies human comprehension, has inspired much navel-gazing over the years (it’s actually tangible chemical space that’s relevant to drug design). In drug discovery we need to be able to navigate chemical space (ideally without having to ingest huge quantities of Spice) and, given that Ukrainian chemists have revolutionized the world's idea of tangible chemical space (and have also made it a whole lot larger), it is most appropriate to have a Ukrainian guide who is most ably assisted by a trusty Transylvanian sidekick. I see benefits from considering molecular complexity more explicitly when mapping chemical space. 
AI (as its evangelists keep telling us) is quite simply awesome at generating novel molecular structures although, as noted in a previous post, there’s a little bit more to drug design than simply generating novel molecular structures. Once you’ve generated a novel molecular structure you need to decide whether or not to synthesize the compound and, in AI-based drug design, molecular structures are often assessed using ML models for biological activity as well as absorption, distribution, metabolism and excretion (ADME) behaviour. It’s well-known that you need a lot of data for training these ML models but you also need to check that the compounds for which you’re making predictions lie within the chemical space occupied by the training set (one way to do this is to ensure that close structural analogs of these compounds exist in the training set) because you can’t be sure that the big data necessarily cover the regions of chemical space of interest to drug designers using the models. A panel discussed the pressing requirement for more data although ML modellers do need to be aware that there’s a huge difference between assembling data sets for benchmarking and covering chemical space at sufficiently high resolution to enable accurate prediction for arbitrary compounds.
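One simple way to make the coverage check concrete is a nearest-neighbour similarity screen: flag a query compound if no training-set compound exceeds a similarity threshold. The sketch below uses the Tanimoto coefficient on toy fingerprints represented as sets of feature identifiers; in practice you would use real fingerprints from a cheminformatics toolkit such as RDKit, and the 0.7 threshold is an assumption to be tuned per descriptor set and endpoint.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient for fingerprints represented as sets of features."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def in_domain(query_fp: set, training_fps: list, threshold: float = 0.7) -> bool:
    """True if at least one training compound is a close structural analog."""
    return any(tanimoto(query_fp, fp) >= threshold for fp in training_fps)

# Toy fingerprints: integers stand in for hashed substructure features.
training = [{1, 2, 3, 4, 5}, {2, 3, 4, 6}, {7, 8, 9}]
query_close = {1, 2, 3, 4}  # shares most features with the first compound
query_far = {10, 11, 12}    # outside the training chemical space

print(in_domain(query_close, training))  # True
print(in_domain(query_far, training))    # False
```

A prediction for the second query might still come back with a confident-looking number from a "global" model, which is exactly why the applicability check needs to be explicit rather than implicit.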

There are other ways to think about chemical space. For example, differences in biological activity and ADME-related properties can also be seen in terms of structural relationships between compounds. These structural relationships can be defined in terms of molecular similarity (Tanimoto coefficient for the molecular fingerprints of X and Y is 0.9) or substructure (X is the 3-chloro analog of Y). Many medicinal chemists think about structure-activity relationships (SARs) and structure-property relationships (SPRs) in terms of matched molecular pairs (MMPs: pairs of molecular structures that are linked by specific substructural relationships) and free energy perturbation (FEP) can also be seen in this framework. Strong nonadditivity and activity cliffs (large differences in activity observed for close structural analogs) are of considerable interest as SAR features in their own right and because prediction is so challenging (and therefore very useful for testing ML and physics-based models for biological activity). One reason that drug designers need to be aware of activity cliffs and nonadditivity in their project data is that these SAR features can potentially be exploited for selectivity.
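Matched molecular pairs can be found without any similarity calculation at all: index each compound by its core scaffold and collect compounds that share a core. The sketch below does this on toy (compound, core, substituent) records; real MMP tools derive the core/substituent split by systematic single-bond cutting of the actual structures, which this deliberately does not attempt.

```python
from collections import defaultdict
from itertools import combinations

# Toy records: (compound_id, core_scaffold, substituent). In a real MMP
# analysis the core/substituent split comes from systematic bond cutting.
compounds = [
    ("cpd1", "phenyl-amide", "H"),
    ("cpd2", "phenyl-amide", "3-Cl"),
    ("cpd3", "phenyl-amide", "3-OMe"),
    ("cpd4", "pyridyl-amide", "H"),
]

# Group compounds by shared core.
by_core = defaultdict(list)
for cid, core, sub in compounds:
    by_core[core].append((cid, sub))

# Every pair of compounds sharing a core is a matched molecular pair,
# labelled with the substructural transformation that links them.
matched_pairs = []
for core, members in by_core.items():
    for (id_a, sub_a), (id_b, sub_b) in combinations(members, 2):
        matched_pairs.append((id_a, id_b, f"{sub_a}>>{sub_b}"))

print(matched_pairs)
# cpd1/cpd2/cpd3 share a core, giving three pairs; cpd4 pairs with nothing.
```

Tagging each pair with its transformation (e.g. "H>>3-Cl") is what lets you aggregate activity differences over many pairs and ask whether a given structural change shifts potency or an ADME property consistently.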
Cheminformatic approaches can also help you to decide how to synthesize the compounds that you (or your AI Overlords) have designed and automated synthetic route planning is a prerequisite for doing drug discovery in ‘self-driving’ laboratories. The key to success in cheminformatics is getting your data properly organized before starting analysis and the Open Reaction Database (ORD), an open-access schema and infrastructure for structuring and sharing organic reaction data, facilitates training of models. One area that I find very exciting is the use of high-throughput experimentation in the search for new synthetic reactions which can lead to better coverage of unexplored chemical space. It’s well known in industry that the process chemists typically synthesize compounds by routes that differ from those used by the medicinal chemists and data-driven multi-objective optimization of catalysts can lead to more efficient manufacturing processes (a higher conversion to the desired product also makes for a cleaner crude product).

It’s now time to wrap up what’s been a long post. Some of what is referred to as AI appears to already be useful in drug discovery (especially in the early stages) although non-AI computational inputs will continue to be significant for the foreseeable future. I see a need for cheminformatic thinking in drug discovery to shift from big data (global ML models) to focused data (generate project specific data efficiently for building local ML models) and also see advantages in using atom-based descriptors that are clearly linked to molecular interactions. One issue for data-driven approaches to prediction of biological activity such as ML and QSAR modelling is that the need for predictive capability is greatest when there's not much relevant data and this is a scenario under which physics-based approaches have an advantage. In my view, validation of ML models is not a solved problem since clustering in chemical space can cause validation procedures to make optimistic assessments of model quality. I continue to have significant concerns about how relationships (which are not necessarily linear) between descriptors are handled in ML modelling and remain generally skeptical of claims for interpretability of ML models (as noted in NoLE, the contribution of a protein–ligand contact to affinity is not, in general, an experimental observable).

Many thanks for staying with me to the end and I hope to see many of you at EuroQSAR in Barcelona next year. I'll leave you with a memory from the early days of chemical space navigation.

Wednesday 26 July 2023

Blogger Meets Blogger

Over the years I’ve had some cool random encounters (some years ago I bumped into a fellow member of the Macclesfield diving club in the village of Pai in the north of Thailand) but the latest is perhaps the most remarkable (even if it's not quite in the league of Safecracker Meets Safecracker in Surely You’re Joking). I was riding the Red Line on Boston’s T en route to Belmont from a conference in Vermont when my friend Ash Jogalekar, well known for The Curious Wavefunction blog, came over and introduced himself. Ash and I have actually known each other for about 15 years but we’d never before met in real life.

The odds against such an encounter would appear to be overwhelming since Ash lives in California while this was my first visit to the USA since 2015. I had also explored the possibility of getting a ride to Boston (some of those attending had driven to the conference from there) because the bus drops people off at the airport. Furthermore, I was masked on the T which made it more difficult for Ash to recognize me. However, I was carrying my poster tube (now re-purposed for the transport of unclean underwear) and, fortuitously, the label with my name was easy for Ash to spot. Naturally, we discussed the physics of ligand efficiency.

Tuesday 18 July 2023

AI-based drug design?

I’ll start this post by stressing that I’m certainly not anti-AI. I actually believe that drug design tools that are being described as AI-based are potentially very useful in drug discovery. For example, I’d expect natural language processing capability to enable drug discovery scientists to access relevant information without even having to ask questions. I actually have a long-standing interest in automated molecular structure editing (see KS2005) and see the ability to build chemical structures in an automated manner using Generative AI as a potentially useful addition to the drug designer’s arsenal. Physical chemistry is very important in drug design and there are likely benefits to be had from building physicochemical awareness into the AI tools (one approach would be to use atom-based measures of interaction potential and I’ll direct you to some relevant articles: A1989 | K1994 | LB2000 | H2004 | L2009 | K2009 | L2011 | K2016 | K2022).

All that said, the AI field does appear to be associated with a degree of hype and a number of senior people in the drug discovery field seem to have voluntarily switched off their critical thinking skills (it might be a trifle harsh to invoke terms like “herding instinct” although doing so will give you a better idea of what I’m getting at). Trying to deal with the diverse hype of AI-based drug design in a single blog post is likely to send any blogger on a one-way trip to the funny farm so I’ll narrow the focus a bit. Specifically, I’ll be trying to understand the meaning of the term “AI-designed drug”.

The prompt for this post came from the publication of “Inside the nascent industry of AI-designed drugs” DOI in Nature Medicine and I don’t get the impression that the author of the article is too clued up on drug design: 

Despite this challenge, the use of artificial intelligence (AI) and machine learning to understand drug targets better and synthesize chemical compounds to interact with them has not been easy to sell.

Apparently, AI is going to produce the drugs as well as design them:

“We expect this year to see some major advances in the number of molecules and approved drugs produced by generative AI methods that are moving forward”, Hopkins says.

I’d have enjoyed being a fly on the wall at this meeting although perhaps they should have been asking “why” rather than “how”:

“They said to me: Alex, these molecules look weird. Tell us how you did it”, Zhavaoronkov [sic] says. "We did something in chemistry that humans could not do.”

So what I think it means to claim that a drug has been “AI-designed” is that the chemical structure of the drug was initially generated by a computer rather than a human (I’ll be very happy to be corrected on this point). Using computers to generate chemical structures is not exactly new and people were enumerating combinatorial libraries from synthetic building blocks over two decades ago (that’s not to deny that there has been considerable progress in the field of generating chemical structures). Merely conceiving a structure does not, however, constitute design and I’d question how accurate it would be to use the term “AI-designed” if structures generated by AI had subsequently been evaluated using non-AI methods such as free energy perturbation.

One piece of advice that I routinely offer to anybody seeking to transform or revolutionize drug discovery is to make sure that you understand what a drug needs to do. First, the drug needs to interact to a significant extent with one or more therapeutic targets (while not interacting with anti-targets such as hERG and CYPs) and this is why molecular interactions (see B2010 | P2015) are of great interest in medicinal chemistry. Second, the drug needs to get to its target(s) at a sufficiently high concentration (the term exposure is commonly used in drug discovery) in order to have therapeutically useful effects on the target(s). This means that achieving controllability of exposure should be seen as a key objective of drug design. One of the challenges facing drug designers is that it’s not generally possible to measure intracellular concentration for drugs in vivo and I recommend that AI/ML leaders and visionaries take a look at the SR2019 study.

Given that this post is focused on how AI generates chemical structures, I thought it might be an idea to look at how human chemists currently decide which compounds are to be synthesized. Drug design is incremental which reflects the (current) impossibility of accurately predicting the effects that a drug will have on a human body directly from its molecular structure.  Once a target has been selected, compounds are screened for having a desired effect on the target and the compounds identified in the screening phase are usually referred to as hits. 

The screening phase is followed by the hit-to-lead phase and it can be helpful to draw an analogy between drug discovery and what is called football outside the USA. It’s not generally possible to design a drug from screening output alone and to attempt to do so would be the equivalent of taking a shot at goal from the centre spot. Just as the midfielders try to move the ball closer to the opposition goal, the hit-to-lead team use the screening hits as starting points for design of higher affinity compounds. The main objective in the hit-to-lead phase is to generate information that can be used for design and mapping structure-activity relationships for the more interesting hits is a common activity in hit-to-lead work.

The most attractive lead series are optimized in the lead optimization phase. In addition to designing compounds with increased affinity, the lead optimization team will generally need to address specific issues such as inadequate oral absorption, metabolic liability and off-target activity. Each compound synthesized during the course of a lead optimization campaign is almost invariably a structural analog of a compound that had already been synthesized. Lead optimization tends to be less ‘generic’ than lead identification because the optimization path is shaped by these specific issues which implies that ML modelling is likely to be less applicable to lead optimization than to lead identification.

This post is all about how medicinal chemists decide which compounds get synthesized and these decisions are not made in a vacuum. The decisions made by lead optimization chemists are constrained by the leads identified by the hit-to-lead team just as the decisions made by lead identification chemists are constrained by the screening output. While AI methods can easily generate chemical structures, it's currently far from clear that AI methods can eliminate the need for humans to make decisions as to which compounds actually get synthesized.

This is a good point at which to wrap up. One error commonly made by people with an AI/ML focus is to consider drug design purely as an exercise in prediction while, in reality, drug design should be seen more in a Design of Experiments framework.  

Thursday 8 June 2023

Archbishop Ussher's guide to efficient selection of development candidates

One piece of advice I gave in NoLE is that “drug designers should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working” and the L2021 study that I’m reviewing in this post will give you a good idea of what I was getting at when I wrote that. I see a fair amount of relatively harmless “stamp collecting” in L2021 but there are also some rather less harmless errors of the type that you really shouldn’t be making if cheminformatics is your day job.  

I’ll start the review of L2021 with annotation of the abstract:

"Physicochemical descriptors commonly used to define ‘drug-likeness’ and ligand efficiency measures are assessed for their ability to differentiate marketed drugs from compounds reported to bind to their efficacious target or targets. [I would argue that differentiating an existing drug from existing compounds that bind to the same target is not something that medicinal chemists need to be able to do. It is also incorrect to describe efficiency metrics such as LE and LLE as physicochemical descriptors because they are derived from biological activity measurements such as binding affinity or potency.] Using ChEMBL version 26, a data set of 643 drugs acting on 271 targets was assembled, comprising 1104 drug−target pairs having ≥100 published compounds per target. Taking into account changes in their physicochemical properties over time, drugs are analyzed according to their target class, therapy area, and route of administration. Recent drugs, approved in 2010−2020, display no overall differences in molecular weight, lipophilicity, hydrogen bonding, or polar surface area from their target comparator compounds. Drugs are differentiated from target comparators by higher potency, ligand efficiency (LE), lipophilic ligand efficiency (LLE), and lower carboaromaticity. [I may be missing something but stating that drugs tend to differ in potency from non-drugs that hit the same targets does rather seem to be stating the obvious. The same point can also be made about efficiency metrics such as LE and LLE since these are derived, respectively, by scaling potency with respect to molecular size (LE) and offsetting potency with respect to lipophilicity (LLE).] Overall, 96% of drugs have LE or LLE values, or both, greater than the median values of their target comparator compounds.” [What is the corresponding figure for potency?]
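For readers unfamiliar with the metrics, the conventional definitions can be sketched as follows (a minimal illustration, assuming T = 298 K so that 2.303RT ≈ 1.37 kcal/mol; the example values are hypothetical):

```python
# Conventional definitions of LE and LLE (assuming T = 298 K so that
# 2.303*R*T ~ 1.37 kcal/mol; potency expressed as pIC50 or pKd).

def ligand_efficiency(pIC50: float, heavy_atoms: int) -> float:
    """LE scales potency by molecular size (kcal/mol per heavy atom)."""
    return 1.37 * pIC50 / heavy_atoms

def lipophilic_ligand_efficiency(pIC50: float, logP: float) -> float:
    """LLE offsets potency by lipophilicity (unitless)."""
    return pIC50 - logP

# Hypothetical compound: pIC50 = 8 (10 nM), 25 heavy atoms, logP = 3
print(round(ligand_efficiency(8.0, 25), 2))    # 0.44
print(lipophilic_ligand_efficiency(8.0, 3.0))  # 5.0
```

Both metrics are transformations of a potency measurement, which is why describing them as physicochemical descriptors is incorrect.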

I must admit to never having been a fan of drug-likeness studies such as L2021 (when I first encountered analyses of time dependency of drug properties about 20 years ago I was left with an impression that some senior medicinal chemists had a bit too much time on their hands) and it is now ten years since the term "Ro5 envy" was introduced in a notorious JCAMD article. My view is that the data analysis presented in L2021 has minimal relevance to drug discovery so I’ll be saying rather less about the data analysis than I’d have done had J Med Chem asked me to review the study.

The L2021 study examines property differences between marketed drugs and compounds reported to bind to the efficacious target(s) of each drug. Specifically, the property differences are quantified as the difference between the value of the property for the drug and the median of the property values for the target comparator compounds. If you’re doing this then you really do need to account for the spread of the distribution when interpreting property differences (a large difference between the value of a property for the drug and the median property value for the target may simply reflect a wide spread in the property distribution for that target). However, I would argue that a more sensible starting point for an analysis like this would be to locate (e.g., as a percentile) the value of each drug property within the corresponding property distribution for the target comparator compounds.
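The point about spread can be made concrete with a minimal sketch (the property values below are made up for illustration): the same drug-minus-median difference corresponds to very different percentile locations depending on how widely the comparator values are spread.

```python
import numpy as np

def property_percentile(drug_value, comparator_values):
    """Locate a drug's property value as a percentile of the comparator
    distribution (fraction of comparators at or below that value)."""
    comparator_values = np.asarray(comparator_values)
    return 100.0 * np.mean(comparator_values <= drug_value)

# Two hypothetical targets with the same median logP (3.0) but
# different spreads; the drug's logP is 4.0 in both cases, so the
# drug-minus-median difference is 1.0 for both targets.
narrow = [2.8, 2.9, 3.0, 3.1, 3.2]
wide   = [1.0, 2.0, 3.0, 4.0, 5.0]
print(property_percentile(4.0, narrow))  # an extreme outlier here
print(property_percentile(4.0, wide))    # unremarkable here
```

Expressed as a percentile, the drug sits at the very top of the narrow distribution but well within the wide one, even though the raw difference from the median is identical.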

Let’s take a look now at how the authors of L2021 suggest their study be used.  

“This study, like all those looking at marketed drug properties, is necessarily retrospective. Nevertheless, those small molecule drug properties that show consistent differentiation from their target compounds over time, namely, potency, ligand efficiencies (LE and LLE), and the aromatic ring count and lipophilicity of carboaromatic drugs, are those that are most likely to remain future-proof. Candidate drugs emerging from target-based discovery programs should ideally have one, or preferably both, of their LE and LLE values greater than the median value for all other compounds known to be acting at the target.”

I would argue that the L2021 study has absolutely no relevance whatsoever to the selection of compounds for development since the team will have data available that enable them to rule out the vast majority of the project compounds for nomination. A discovery team nominating a compound for development will have achieved a number of challenging objectives (including potency against the target and activity in one or more cell-based assays) and the likely response of team members to a suggestion that they calculate medians for LE and LLE for comparison with nomination candidate(s) is likely to be bemused eye-rolling. In general, a discovery team nominating a development candidate has access to a lot of unpublished potency measurements (which won’t be in ChEMBL) and it’s usually a safe assumption that the development candidate will be selected from the most potent compounds (LE and LLE values for these compounds are also likely to be above average). In the extremely unlikely event that the discovery team nominates a compound with LE or LLE values below the magic median values then you can be confident that the decision has been based on examination of measured data (consider the likelihood of the discovery team members acting on a suggestion that they should pick another compound with LE or LLE value above the magic median values because doing so will increase the probability of success in clinical development).

At the start of the post, I did mention some errors that you don’t want to be making if cheminformatics is your day job and regular readers of this blog will have already guessed that I’m talking about ligand efficiency (LE). I should point out that the problem is with the ligand efficiency metric and not the ligand efficiency concept, which is both scientifically sound and useful, especially in fragment-based design where molecular size often increases significantly in the hit-to-lead phase.

The problem with the LE metric is that perception of efficiency changes when you express affinity (or potency) using a different unit and this is shown clearly in Table 1 in NoLE. Expressing a quantity using a different unit doesn’t change the quantity so any change in perception is clearly physical nonsense. That’s why I appropriate a criticism (‘not even wrong’) usually attributed to Pauli when taking gratuitous pot shots at the LE metric. The change in perception is also cheminformatic nonsense and that’s why it’s rather unwise to use the LE metric if cheminformatics is your day job. L2021 does cite NoLE but simply notes that the LE metric’s “scientific basis and application have provoked a literature debate”.

The L2021 study asserts that “the absolute LE value of a drug candidate is less important” but even differences in LE change when you express affinity (or potency) using a different concentration unit. This is shown in Table 2 in NoLE and there is no objective way to select a particular concentration unit as ‘better’ than all the other concentration units. To conclude, can we say that a medicinal chemistry leader’s choice of concentration unit (1 M) is any better (or any worse) than that of Archbishop Ussher (4.004 μM)?
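The unit dependence is easy to demonstrate (a minimal sketch with two hypothetical compounds; RT ≈ 0.593 kcal/mol assumes 298 K, and the Kd values, heavy atom counts and choice of standard concentrations are all arbitrary):

```python
import math

RT = 0.593  # kcal/mol at 298 K

def LE(Kd_molar, heavy_atoms, C_std=1.0):
    """LE computed with an explicit standard concentration C_std (in M).
    The conventional choice is C_std = 1 M, but nothing forces it."""
    return -RT * math.log(Kd_molar / C_std) / heavy_atoms

# Two hypothetical compounds:
#   A: Kd = 10 nM, 30 heavy atoms
#   B: Kd = 1 uM,  15 heavy atoms
for C in (1.0, 1e-6):  # 1 M vs 1 uM as the standard concentration
    le_a = LE(1e-8, 30, C)
    le_b = LE(1e-6, 15, C)
    # note that the ranking of A and B flips with the choice of unit
    print(f"C_std = {C:g} M: LE(A) = {le_a:.3f}, LE(B) = {le_b:.3f}")
```

With a 1 M standard concentration B looks more efficient than A; with a 1 μM standard concentration the ordering reverses, which is the cheminformatic nonsense referred to above: a change of unit has changed the conclusion.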

Saturday 1 April 2023

A clear demonstration of the benefits of long residence time

Residence time is a well-established concept in drug discovery and the belief that off-rate is more important than affinity has many adherents in both academia and industry. The concept has been articulated as follows in a Nature Reviews Drug Discovery article:

“Biochemical and cellular assays of drug interactions with their target macromolecules have traditionally been based on measures of drug–target binding affinity under thermodynamic equilibrium conditions. Equilibrium binding metrics such as the half-maximal inhibitory concentration (IC50), the effector concentration for half-maximal response (EC50), the equilibrium dissociation constant (Kd) and the inhibition constant (Ki), all pertain to in vitro assays run under closed system conditions, in which the drug molecule and target are present at invariant concentrations throughout the time course of the experiment [1 | 2 | 3 | 4 | 5]. However, in living organisms, the concentration of drug available for interaction with a localized target macromolecule is in constant flux because of various physiological processes.”

I used to be highly skeptical about the argument that equilibrium binding metrics are not relevant in open systems in which the drug concentration varies with time. The key question for me was always how the rate of change in the drug concentration compares with the rate of binding/unbinding (if the former is slower than the latter then the openness of the in vivo system would seem to be irrelevant). I also used to wonder why an equilibrium binding measurement made in an open system (e.g., Kd from isothermal titration calorimetry) should necessarily be more relevant to the in vivo system than an equilibrium binding measurement made in a series of closed systems (e.g., Ki from an enzyme inhibition assay). Nevertheless, I always needed to balance my concerns against the stark reality that the journal impact factor of Nature Reviews Drug Discovery is a multiple of my underwhelming h-index.
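The timescale comparison can be made concrete with a toy simulation (all rate constants below are arbitrary assumptions chosen for illustration): when unbinding is fast relative to the decline in free drug concentration, occupancy tracks the instantaneous equilibrium value and the openness of the system is indeed irrelevant; when unbinding is slow, occupancy lags far behind.

```python
import math

def max_equilibrium_deviation(kon, koff, L0, kel, t_end=500.0, dt=1e-3):
    """Euler-integrate receptor occupancy theta while the free drug
    concentration L(t) = L0*exp(-kel*t) declines (depletion of drug by
    binding is neglected). Returns the largest deviation of theta from
    the instantaneous equilibrium occupancy L/(L + Kd)."""
    Kd = koff / kon
    theta = L0 / (L0 + Kd)  # start at equilibrium occupancy
    max_dev, t = 0.0, 0.0
    while t < t_end:
        L = L0 * math.exp(-kel * t)
        theta += dt * (kon * L * (1.0 - theta) - koff * theta)
        max_dev = max(max_dev, abs(theta - L / (L + Kd)))
        t += dt
    return max_dev

# Same Kd (10 nM) and same elimination rate; only the kinetics differ.
fast = max_equilibrium_deviation(kon=1e7, koff=0.1,  L0=1e-7, kel=0.01)
slow = max_equilibrium_deviation(kon=1e4, koff=1e-4, L0=1e-7, kel=0.01)
print(f"max |theta - theta_eq|: fast off-rate {fast:.3f}, slow off-rate {slow:.3f}")
```

In the fast case (koff much greater than kel) the deviation stays small, so the equilibrium metrics describe the open system perfectly well; in the slow case (koff much less than kel) occupancy barely responds to the falling drug concentration, which is precisely the regime in which residence time, rather than Kd alone, would matter.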

Any residual doubts about the relevance of residence time completely vanished recently after I examined a manuscript by Prof Maxime de Monne of the Port-au-Prince Institute of Biogerontology who is currently on secondment to the Budapest Enthalpomics Group (BEG). The manuscript has not yet been made publicly available although, with the help of my associate ‘Anastasia Nikolaeva’ in Tel Aviv, I was able to access it and there is no doubt that this genuinely disruptive study will forever change how we use AI to discover new medicines.

Prof de Monne’s study clearly demonstrates that it is possible to manipulate off-rate independently of on-rate and dissociation constant, provided that binding is enthalpically-driven to a sufficient degree. The underlying mechanism is back-propagation of the binding entropy deficit along the reaction coordinate to the transition state region where the resulting unidirectional conformational changes serve to suppress dissociation of the ligand. The math is truly formidable (my rudimentary understanding of Haitian patois didn’t help either) and involves first projecting the atomic isothermal compressibility matrix into the polarizability tensor before applying the Barone-Samedi transformation for hepatic eigenvalue extraction. ‘Anastasia Nikolaeva’ was also able to ‘liberate’ a prepared press release in which a beaming BEG director Prof Kígyó Olaj explains, “Possibilities are limitless now that we have consigned the tedious and needlessly restrictive Principle of Microscopic Reversibility to the dustbin of history".

Wednesday 22 February 2023

Structural alerts and assessment of chemical probes


I’ll wrap up (at least for now) the series of posts on chemical probes by returning to the use of cheminformatic models for assessment of the suitability of compounds for use as chemical probes. My view is that there is currently no cheminformatic model, at least in the public domain, that is usefully predictive of the suitability (or unsuitability) of compounds for use as chemical probes and that assessments should therefore be based exclusively on experimental measurements of affinity, selectivity etc. Put another way, acceptable chemical probes will need to satisfy the same criteria regardless of the extent to which they offend the tastes of PAINS filter evangelists (and if PAINS really are as bad as the evangelists would have us believe then they’re hardly going to satisfy these acceptability criteria). My main criticism of PAINS filters (summarized in this comment on the ACS assay interference editorial) is that there is a significant disconnect between dogma and data. 

I’ll start by saying something about cheminformatics since, taken together, the PAINS substructures can be considered as a cheminformatic predictive model. If you’re using a cheminformatic predictive model then you also need to be aware that it will have an applicability domain which is limited by the data used to train and validate the model. Consider, for example, that you have access to a QSAR model for hERG blockade that has been trained and validated using only data for compounds that are protonated at the assay pH. If you base decisions on predictions for compounds that are neutral under assay conditions then you’d be using the model outside its applicability domain (and therefore in a very weak position to blame the modelers if the shit hits the fan). While cheminformatic predictive models might (or might not) help you get to a desired endpoint more quickly, you’ll still need experimental measurements in order to know that you have indeed reached the desired endpoint.

But let’s get back to PAINS filters which were introduced in this 2010 study. PAINS is an acronym for pan-assay interference compounds and you could be forgiven for thinking that PAINS filters were derived by examining chemical structures of compounds that had been shown to exhibit pan-assay interference. However, the original PAINS study doesn’t appear to present even a single example of a compound that is shown experimentally to exhibit pan-assay interference and the medicinal chemistry literature isn’t exactly bursting at the seams with examples of such compounds.

The data set on which the PAINS filters were trained consisted of the hits (assay results in which the response was greater than a threshold when the compound was tested at a single concentration) from six high-throughput screens, each of which used AlphaScreen read-out. Although PAINS filters are touted as predictors of pan-assay interference it would be more accurate to describe them as predictors of frequent-hitter behavior in this particular assay panel (as noted in a previous post, promiscuity generally increases as the activity threshold is made more permissive). From a cheminformatic perspective the choice of this assay panel appears to represent a suboptimal design of an experiment to detect and characterize pan-assay interference (especially given that data from “more than 40 primary screening campaigns against enzymes, ion channels, protein-protein interactions, and whole cells” were available for analysis). Those who advocate the use of PAINS filters for the assessment of the suitability of compounds for use as chemical probes (and the Editors-in-Chief of more than one ACS journal) may wish to think carefully about why they are ignoring a similar study based on a larger, more diverse (in terms of targets and read-outs) data set that had been published four years before the PAINS study.

Although a number of ways in which potential nuisance compounds can reveal their dark sides are discussed in the original PAINS study, the nuisance behavior is not actually linked to the frequent-hitter behavior reported for compounds in the assay panel. Also, it can be safely assumed that none of the six protein-protein interaction targets of the PAINS assay panel feature a catalytic cysteine and my view is that any frequent-hitter behavior that is observed in the assay panel for ‘cysteine killers’ is more likely to be due to reaction with (or quenching of) singlet oxygen. It’s also worth pointing out that when compounds are described as exhibiting pan-assay interference (or as frequent hitters) the relevant nuisance behavior has often been predicted (or assumed) as opposed to being demonstrated with measured data. I would argue that even a ‘maximal PAINS response’ (the compound is actually observed as a hit in each of the six assays of the PAINS assay panel) would not rule out the use of a compound as a chemical probe.

I have argued on cheminformatic grounds that it’s not appropriate to use PAINS filters for assessment of potential probes but there’s another reason that those seeking to set standards for chemical probes shouldn’t really be endorsing the use of PAINS filters for this purpose. “A conversation on using chemical probes to study protein function in cells and organisms” that was recently published in Nature Communications stresses the importance of Open Science. However, the PAINS structural alerts were trained on proprietary data and using PAINS filters to assess potential chemical probes will ultimately raise questions about the level of commitment to Open Science. I made a very similar point in my comment on the ACS assay interference editorial (Journal of Medicinal Chemistry considers the publication of analyses of proprietary data to be generally unacceptable).

Let’s take a look at “The promise and peril of chemical probes” that was published in Nature Chemical Biology in 2015. The authors state:

“We learned that many of the chemical probes in use today had initially been characterized inadequately and have since been proven to be nonselective or associated with poor characteristics such as the presence of reactive functionality that can interfere with common assay features [3] (Table 2). The continued use of these probes poses a major problem: tens of thousands of publications each year use them to generate research of suspect conclusions, at great cost to the taxpayer and other funders, to scientific careers and to the reliability of the scientific literature.”

Let’s take a look at Table 2 (Examples of widely used low-quality probes) from "The promise and peril of chemical probes". You’ll see “PAINS” in the problems column of Table 2 for two of the six low-quality probes and this rings a number of alarm bells for me. Specifically, it is asserted that flavones are “often promiscuous and can be pan-assay interfering (PAINS) compounds” and that epigallocatechin-3-gallate is a “promiscuous PAINS compound”, which raises a number of questions. Were the (unspecified) flavones and epigallocatechin-3-gallate actually observed to be promiscuous and, if so, what activity threshold was used for quantifying promiscuity? Were any of the (unspecified) flavones or epigallocatechin-3-gallate actually observed to exhibit pan-assay interference? Were affinity and selectivity measurements actually available for the (unspecified) flavones or epigallocatechin-3-gallate?

I’ll conclude the post by saying something about cheminformatic predictive models. First, to use a cheminformatic predictive model outside its applicability domain is a serious error (and will cast doubts on the expertise of anybody doing so). Second, predictions might (or might not) help you get to a desired end point but you’ll still need measured data to establish that you’ve got to the desired end point or that a compound is unfit for a particular purpose.