Tuesday, 5 August 2025

Return to Flatland

Whoever first referred to Economics as ‘The Dismal Science’ had clearly never read an article on ‘3Dness’ in drug discovery. My own experience of reading articles on this topic is a sensation of having my life force slowly sucked out (I even suggested that reviewing the '3Dness' literature might be considered an appropriate penance when I recently confessed my sins at St Gallen Cathedral) and the subject of Confession reminds me of a song that the late great Tom Lehrer sang about the Second Vatican Council.

In this post I review the CNM2025 study (Return to Flatland) which examines the heavily-cited LBH2009 study (Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success). The CNM2025 study, which has already been reviewed by Dan and Ash, opens with: 

The year is 2009, Barack Obama has just been inaugurated and both Lady Gaga and The Black Eyed Peas are at the height of their popularity

This couldn’t help but remind me of the “WORLD WAR 2 BOMBER FOUND ON MOON” headline that appeared on the front page of the Sunday Sport twenty-one years before the publication of LBH2009 (it was accompanied by a photo of a B-17 in a lunar crater). A few weeks later the headline was “WORLD WAR 2 BOMBER FOUND ON MOON VANISHES” (this time accompanied by a photo of the now empty lunar crater).

I’ll start my review of CNM2025 by quoting from it and, as is usual for posts here at Molecular Design, quoted text is indented with any comments by me italicized in red and enclosed in square brackets. 

The hypothesis was attractive, and the data clearly showed the relationship between Fsp3 and clinical progression with pairwise significance P < 0.001. [This statement is inaccurate and Figure 3 of the LBH2009 study shows statistically significant differences at this level between (a) discovery and phase 2 compounds (b) phase 1 and phase 3 compounds (c) phase 2 compounds & drugs.  The authors of LBH2009 state: “The change in average Fsp3 was statistically significant between adjacent stages in only one case (phase 1 to phase 2)” but they neither show this in Figure 3 of their article nor do they report a P-value for the statistical significance of the mean difference in Fsp3 between phase 1 and phase 2 compounds.] The statistics seemed compelling, though the effect size was modest — an increase in average Fsp3 of 0.09 between sets of phase I and approved drugs equates to a difference of around two additional sp3 carbons per drug molecule only. [The authors of LBH2009 did not actually report this difference to be statistically significant so it is unclear why the authors of CNM2025 have stated that the “statistics seemed compelling”.]
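For readers who don't have the definition committed to memory, Fsp3 is simply the number of sp3-hybridized carbons divided by the total number of carbons, and the ‘two additional sp3 carbons’ arithmetic is easy to check (a minimal RDKit sketch; the example structures and the assumed carbon count are my own illustrative choices and are not taken from either study):

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Fsp3 = (number of sp3 carbons) / (total number of carbons)
for name, smiles in [("biphenyl", "c1ccccc1-c1ccccc1"),
                     ("bicyclohexyl", "C1CCCCC1C1CCCCC1")]:
    mol = Chem.MolFromSmiles(smiles)
    print(name, rdMolDescriptors.CalcFractionCSP3(mol))  # 0.0 and 1.0

# For a drug-sized molecule with ~22 carbons an increase in Fsp3 of
# 0.09 corresponds to roughly 0.09 * 22 = 2 additional sp3 carbons
```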

The LBH2009 study is effectively a call to think beyond aromatic rings in drug design and my view is that there are considerable benefits in doing so even though I consider the data analysis in the study to be shaky. Almost three decades ago I included a quinuclidine in the Zeneca fragment library for NMR screening and later at AstraZeneca I would actively search (with minimal success) for amides and N-substituted heterocycles derived from bicyclic amines. I see the advantages in looking beyond aromatic rings as stemming primarily from increased molecular diversity and higher resolution coverage of chemical space, and in KM2013 we wrote:

Molecular recognition considerations suggest a focus on achieving axial substitution in saturated rings with minimal steric footprint, for example by exploiting the anomeric effect or by substituting N-acylated cyclic amines at C2.

Although data analyses (for example, see HY2010) presented in support of the belief that aromatic rings adversely affect aqueous solubility are typically underwhelming, I consider the suggestion to be plausible and suggested in K2022 that deleterious effects of aromatic rings are more likely to be due to their potential for making molecular interactions than to their planarity. That said, I should also point out that the analysis of the relationship between aqueous solubility and Fsp3 presented in Figure 5 of LBH2009 is a textbook example of correlation inflation (see Fig. 5 in KM2013) and I suspect that if a team had submitted this analysis at Statistiques Sans Frontières the judges would have either awarded “nul points” or come to the conclusion that the team had played its joker. Given the Lady Gaga reference in CNM2025 I couldn't resist linking this Peter Gabriel song which includes the lyrics "Adolf builds a bonfire, Enrico plays with it" even though I have absolutely no idea what the lyrics actually mean.

While the analysis of the relationship between aqueous solubility and Fsp3 presented in Figure 5 of LBH2009 does endow the study with what I’ll politely call a whiff of the pasture, it’s not directly related to the analysis of clinical progression presented in the study. Let’s take a look at Figure 3 in LBH2009 which shows mean Fsp3 values for compounds in discovery, at the three phases of clinical development, and approved drugs. As an aside this analysis would fall foul of current Journal of Medicinal Chemistry author guidelines (see link; accessed 05-Aug-2025) which clearly mandate that “If average values are reported from computational analysis, their variance must be documented”. As mentioned earlier in this post Figure 3 in LBH2009 shows statistically significant (P value < 0.001) differences between (a) discovery and phase 2 compounds, (b) phase 1 and phase 3 compounds and (c) phase 2 compounds & drugs. It’s also worth stressing that Figure 3 in LBH2009 does not show statistically significant differences in Fsp3 for any of the clinical development transitions (phase 1 to phase 2; phase 2 to phase 3; phase 3 to approved drug).

I think that there are some problems with how the authors of the LBH2009 study have analysed the relationship between Fsp3 and progression through the stages of clinical development.  If charged with analysing this data I would focus on the three clinical development transitions (phase 1 to phase 2; phase 2 to phase 3; phase 3 to approved drug) and wouldn’t waste time on comparisons between discovery compounds and clinical compounds. If analysing the relationship between Fsp3 and the progression from phase 1 to phase 2, I would partition the set of phase 1 compounds into a ‘YES’ subset of compounds that had progressed to phase 2 and a ‘NO’ subset of compounds that had not progressed to phase 2. I would certainly be taking a close  look at distributions of Fsp3 values (some approaches to assessing statistical significance are based on the assumption of Normally-distributed data values) and I’d also be thinking about assessing effect size in addition to statistical significance. However, the problems with the LBH2009 analysis are more fundamental than non-Normal distributions of Fsp3 values.
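To make the suggested analysis a bit more concrete, here's a minimal sketch of how the ‘YES’ and ‘NO’ subsets might be compared (the table layout and column names are hypothetical, and the Mann-Whitney U test is my choice of a test that doesn't assume Normally-distributed values; I'd report an effect size alongside the P-value):

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# One row per phase 1 compound with hypothetical columns:
#   'fsp3'       - Fsp3 calculated from the chemical structure
#   'progressed' - True if the compound progressed to phase 2
df = pd.read_csv("phase1_compounds.csv")

yes = df.loc[df["progressed"], "fsp3"]
no = df.loc[~df["progressed"], "fsp3"]

# Mann-Whitney U doesn't assume Normally-distributed Fsp3 values
u, p = mannwhitneyu(yes, no, alternative="two-sided")

# Common-language effect size: probability that a randomly selected
# YES compound has a higher Fsp3 value than a randomly selected NO one
cles = u / (len(yes) * len(no))
print(f"P = {p:.3g}, effect size (CLES) = {cles:.2f}")
```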

The authors of LBH2009 assess the progression from phase 1 to phase 2 by comparing the mean Fsp3 value for the phase 1 compounds with the mean Fsp3 value for phase 2 compounds. The problem is that the Fsp3 values for the YES compounds (that have progressed from phase 1 to phase 2) are present in both the data sets for which comparisons are being made. This means that the observed differences in mean Fsp3 values will reflect both the difference between YES and NO compounds (relevant to relationship between Fsp3 and progression from phase 1 to phase 2) and the relative numbers of YES and NO compounds in the phase 1 data (not relevant to relationship between Fsp3 and progression from phase 1 to phase 2). Analysing the data in the way that the authors of LBH2009 have done effectively adds noise to the signal and it’s possible that they would have observed more statistically significant differences in mean Fsp3 values had they analysed the data in a more appropriate manner.
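The dilution effect is easy to see with a little arithmetic: if, for simplicity, we take the phase 2 set to consist of the compounds that progressed from phase 1 then the difference between the phase 1 and phase 2 means is simply the difference between the YES and NO means scaled by the fraction of phase 1 compounds that failed to progress (the numbers below are made up for illustration and are not taken from LBH2009):

```python
# Made-up illustrative values (not taken from LBH2009)
mean_yes, mean_no = 0.40, 0.30  # mean Fsp3 for YES and NO subsets
f_yes = 0.5                     # fraction of phase 1 compounds progressing

mean_phase1 = f_yes * mean_yes + (1 - f_yes) * mean_no
mean_phase2 = mean_yes          # YES compounds appear in both sets

# mean_phase2 - mean_phase1 = (1 - f_yes) * (mean_yes - mean_no)
print(mean_phase2 - mean_phase1)           # 0.05
print((1 - f_yes) * (mean_yes - mean_no))  # 0.05
```

The observed difference between stages therefore shrinks as the proportion of progressing compounds grows, even when the underlying difference between the YES and NO compounds stays the same.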

This is an appropriate point at which to discuss correlation in the context of studies such as LBH2009 and CNM2025. It’s actually well known (see L2013) that Fsp3 values for chemical structures tend to be greater when amine nitrogen atoms are present (this does not invalidate the observed trends in the data but has big implications for how you interpret these trends). There is, however, a much bigger issue which is that correlation does not imply causation. Let’s suppose that you’ve just joined a drug discovery team as they are preparing to select a clinical candidate (I concede that this is a most improbable scenario but it does illustrate a point). The team have an excellent understanding of the structure-activity relationship (SAR) and have successfully addressed a number of issues during the lead optimization process (the chemical structures of the compounds have been quite literally shaped by the problems that the team members have solved). Now consider the likely reaction of the team members to a suggestion that probability of success in the clinic would increase if the chemical structure of the best compound were modified so as to increase its Fsp3 value. My view is that the team might think that the person making such a suggestion had just stepped off the shuttle from Planet Tharg (an alien from this planet used to make occasional Sunday Sport appearances). I see the trends in data observed by the authors of LBH2009 as effects rather than causes (the vanishing B-17 was never there in the first place).

Let’s return to the CNM2025 study, whose authors state:

Using data from the Cortellis Drug Discovery Intelligence database, we repeated an analysis similar to that of Lovering et al. to assess Fsp3 in drugs approved post-2009 and those in active clinical development as of mid-2024 (Fig. 1). [I would challenge the claim that the analysis presented in CNM2025 is similar to that presented in LBH2009.] Although our methods used contemporary data sources different to Lovering et al., we obtained comparable Fsp3 data for approved drugs prior to 2009. More recently however, the picture appears to have changed with approvals shifting to lower Fsp3 drugs (Fig. 1a). Similarly, when looking at drugs currently in clinical development (Fig. 1b), there appeared to be no clear relationship between highest phase reached and Fsp3, suggesting the key conclusion noted by Lovering et al. has not persisted. In all data sets, exemplars with Fsp3 = 0 as well as Fsp3 = 1 are extensively seen.

Fig. 1a in CNM2025 shows the time-dependence of Fsp3 distributions for approved drugs according to approval date and I remain unconvinced of the value of analysis like this (on first encountering analysis of time-dependence of drug properties a quarter of a century ago I recall being left with the distinct impression that some senior medicinal chemists where I worked had a bit too much time on their hands). However, it is immaterial whether or not you are as underwhelmed as I am by time-dependence of drug properties because no such analysis is actually reported in LBH2009 and this is one reason that I challenge the claim made by the authors of CNM2025 that they “repeated an analysis similar to that of Lovering et al. to assess Fsp3 in drugs approved post-2009 and those in active clinical development as of mid-2024”.

Now let’s take a look at Fig. 1b in CNM2025 and this should be compared with Figure 3 in LBH2009. In some ways the former is an improvement on the latter since the violin plots show the distributions of Fsp3 values for each group of compounds and discovery compounds have been excluded (as mentioned earlier in the post, I don’t think that it makes any sense to include discovery compounds in analysis like this, as the authors of LBH2009 did). Although these two figures look superficially similar they are actually very different and, given that the authors of CNM2025 only include compounds in development during 2024, I would argue that their study does not examine the link between Fsp3 and clinical progression. I agree that the difference between mean Fsp3 values for drugs approved up to 2009 and for drugs approved after 2009 is statistically significant. What is not clear from the analysis summarized in Fig. 1b in CNM2025 is whether the lower Fsp3 values of drugs that were approved after 2009 reflect smaller increases in Fsp3 over the course of clinical development (the B-17 has disappeared from the lunar crater) or lower Fsp3 values for compounds entering clinical development (the B-17 is still in the lunar crater). I think it's possible to address this question but you would need to analyse the data a lot more carefully than the authors of CNM2025 appear to have done. For example, you might examine the time-dependencies of mean Fsp3 values for compounds evaluated in phase 1 and the corresponding mean Fsp3 values for compounds that progressed or failed to progress to phase 2. While I consider more careful analysis of progression to be feasible I see little or no value from the perspective of real world drug discovery in actually performing the analysis more carefully.
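For what it's worth, here's a sketch of the sort of analysis I have in mind (the table layout and column names are hypothetical):

```python
import pandas as pd

# One row per phase 1 compound with hypothetical columns:
#   'year'       - year in which the compound entered phase 1
#   'fsp3'       - Fsp3 calculated from the chemical structure
#   'progressed' - True if the compound reached phase 2
df = pd.read_csv("phase1_history.csv")

# Mean Fsp3 by entry year for compounds that did and didn't progress
summary = (df.groupby(["year", "progressed"])["fsp3"]
             .agg(["mean", "count"])
             .unstack("progressed"))
print(summary)
```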

This is a good point at which to wrap up. I see variation in drug properties with time as an effect rather than a cause and Forrest Gump would have been well aware of this fifteen years before the publication of LBH2009 when he famously observed that "shit happens". One point on which the CNM2025 authors and I do appear to agree is that there is not currently a B-17 in a lunar crater. Where we appear to differ is that they seem to be suggesting that this is because it has vanished while I never believed that it was ever there in the first place. I’ll let the late great Dave Allen have the last word.

Sunday, 6 July 2025

Assembling data sets for training ML bioactivity models

Here’s a photo from one of my exercise walks in Paramin and you can see the Caribbean Sea in the distance. This is perhaps my favourite view on the walk because it means that I’ve just got to the top of a particularly brutal hill (cars sometimes struggle to get to the top and on one occasion I watched a car fail miserably in four attempts) although you can’t always see the sea as clearly as in this photo.  


The current post follows up on my post on the LR2024 study (Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise). Here I’ll be discussing in general terms how I might use ChEMBL to assemble data sets for training what I refer to in another post as regression-based machine learning (ML) models. These models can reasonably be described as quantitative structure-activity relationships (QSARs) because 'activity' is a continuous (as opposed to categorical) variable. However, the term 'QSAR' appears to be used less these days, possibly reflecting the limited impact that QSAR approaches have made on real world drug discovery, and it's also much easier to persuade people that you're doing artificial intelligence (AI) if you describe your QSAR models as ML models. In this post I shall refer to regression-based ML models for biological activity simply as 'QSAR-like ML models'.

Much of the focus of AI-based drug design appears to be generation of novel chemical structures and devising synthetic routes for the associated compounds. Many who tout AI as a panacea for the ills of drug discovery appear to be assuming that predictively useful QSAR-like ML models will be available or can readily be built even in the early stages of drug discovery projects. I remain skeptical and my view is that if sufficient data are available in ChEMBL for building useful QSAR-like ML models then it is likely that somebody else has already got to where you would like to be. Nevertheless, I do see value in automating the assembly of bioactivity data sets from ChEMBL even if it does not prove feasible to build useful QSAR-like ML models and I'll also be discussing some of the ways that you might use such data sets in the early stages of a drug discovery project.

My first step when assembling a data set (which I'll refer to as a 'bioactivity data set') for training QSAR-like ML models would be to extract from ChEMBL all (in-range) measured values for potency and affinity in assays that have been run against the target of interest. Potency and affinity should be expressed logarithmically for modelling as shown in the figure below and the relevant values are often referred to collectively as ‘pChEMBL’ values (as I noted in posts here from September and December of 2024, the term is used in the literature without being defined properly). I would generally anticipate that there will be only a single pChEMBL value for most compounds; for compounds with multiple pChEMBL values I would use the mean to quantify bioactivity and would also calculate the standard deviation, which can be seen as another way to assess what is referred to as assay compatibility in the LR2024 study.
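Here's a sketch of what this first step might look like (I'm assuming the chembl_webresource_client package; the filters and field names should be checked against the current ChEMBL schema and CHEMBL203 is used purely as an example target identifier):

```python
import pandas as pd
from chembl_webresource_client.new_client import new_client

# Pull in-range activities with pChEMBL values for the target of interest
activities = new_client.activity.filter(
    target_chembl_id="CHEMBL203",      # example target identifier
    pchembl_value__isnull=False,
    standard_relation="=",             # in-range measurements only
).only(["molecule_chembl_id", "pchembl_value"])

df = pd.DataFrame(activities)
df["pchembl_value"] = df["pchembl_value"].astype(float)

# One bioactivity value per compound: the mean of the available pChEMBL
# values, with the standard deviation as a crude assay compatibility check
per_compound = (df.groupby("molecule_chembl_id")["pchembl_value"]
                  .agg(["mean", "std", "count"]))
print(per_compound.sort_values("std", ascending=False).head())
```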
A bioactivity data set assembled in this manner would have a single bioactivity data value for each compound and I would take a look at the number of compounds for which data are available because it might be possible to use this information for deciding whether or not to build a QSAR-like ML model. However, you need to be careful about using the size of the data set for making decisions like this because you can get away with fewer data values if these are better distributed from the perspective of model-building (a view from Orwell's Animal Farm might have been: uniform good, polymodal bad) and the comment that Stalin is alleged to have made about the T-34 tank (quantity has a special quality all of its own) is perhaps not quite the ground truth that many ML modellers believe it to be. JFK's advice to ML modellers might have been: ask not whether you have enough data but whether the available data satisfy the requirements for modelling.

My next step would be to examine the distribution of data values in the bioactivity data set. I would take a look at the spread in bioactivity values (for modelling the spread in values should be large). If the distribution of the bioactivity data set is Gaussian then a standard deviation of 0.8 log units will place 80% of the data values in a range of 2.05 log units (I used this handy Normal percentile calculator) and I wouldn't attempt to build a QSAR-like ML model if the standard deviation was less than this (unless the person 'asking' me to build the model was also going to perform my annual performance review 😁). I would also visualise the distribution of bioactivity values because a noticeably polymodal distribution should ring a few alarm bells for me (clustering in training data may cause validation procedures to arrive at optimistic assessments of model quality).
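The 2.05 figure is easy to reproduce without the calculator (the central 80% of a Gaussian spans the 10th to 90th percentiles):

```python
from scipy.stats import norm

sd = 0.8
# Width of the central 80% interval for a Gaussian with this sd
print(round(sd * (norm.ppf(0.9) - norm.ppf(0.1)), 2))  # 2.05
```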

Having established an acceptable spread in the bioactivity data I would take a look at where the distribution of bioactivity values is centred. Specifically, I would not attempt to build a QSAR-like ML model unless at least 50% of the compounds in the bioactivity data set exhibited sub-micromolar activity and for a Gaussian distribution this would correspond to a mean bioactivity value of 6. If this seems a bit extreme it’s worth pointing out that to accurately measure an IC50 value of 10 μM requires that the compound be soluble, while neither aggregating nor interfering with assay read-out, at a concentration of 100 μM. Problems with biochemical assays typically increase when you test compounds at higher concentrations and this is one reason that biophysical assays are generally preferred for screening fragments. With sufficient care you can run biochemical assays at high concentrations and the S2009 article by former colleagues shows how you can assess (and potentially correct for) assay interference. Inadequate aqueous solubility, however, is not something that you can generally deal with. One general difficulty when assembling bioactivity data sets from ChEMBL is that it can be very difficult to assess how carefully low affinity compounds have been assayed.
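Pulling together the spread and centring criteria from the last two paragraphs (a sketch of how I might screen a candidate bioactivity data set; the thresholds are the ones argued for above rather than universal constants):

```python
import numpy as np

def worth_modelling(pchembl, min_sd=0.8, min_frac_submicromolar=0.5):
    """Apply the spread and centring criteria discussed above.

    pchembl: per-compound pChEMBL values (one value per compound).
    Sub-micromolar activity corresponds to pChEMBL > 6.
    """
    pchembl = np.asarray(pchembl, dtype=float)
    spread_ok = pchembl.std(ddof=1) >= min_sd
    centre_ok = (pchembl > 6).mean() >= min_frac_submicromolar
    return spread_ok and centre_ok
```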


Before starting to assemble a data set for training QSAR-like ML models I would also assess the target from an assay perspective (in a real world drug discovery scenario this assessment would be done in collaboration with bioscientists). In particular, I would be looking for indications, such as kinact values being reported, of activity being due to irreversible mechanisms of action. The bioactivity of an irreversible covalent inhibitor can be considered to be 'two-dimensional' (affinity for formation of the non-covalently bound target-ligand complex and rate constant for covalent bond formation) and I'll point you to S2016 and McW2021 for more information. It is important to have sufficient spread both in kinact and in Ki values when building QSAR-like ML models for irreversible inhibitors and you also need to be aware of any limits that the assays place on values that can be reliably quantified. It is common for IC50 values to be reported in the literature for irreversible inhibitors although you can use such data in drug discovery if you run the assays carefully (see T2021). However, it's important to bear in mind that using a single data value to quantify the bioactivity of an irreversible inhibitor necessarily results in information loss and that the ChEMBL curation procedures do not generally capture assay protocols at the level of detail that would be required for combining IC50 values from different studies even when inhibition is reversible. This should not be taken as a criticism of ChEMBL and I consider recording assay protocols in this level of detail to be well beyond the call of duty for those curating the bioactivity data.
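One crude but automatable indication is the presence of kinact-related activity types among the data recorded for the target (a sketch assuming an activity table with a standard_type column as in data extracted from ChEMBL; the exact type strings should be checked against the database):

```python
import pandas as pd

def flag_irreversible(activities: pd.DataFrame) -> bool:
    """Flag a target as potentially covalent/irreversible if activity
    types associated with irreversible inhibition have been recorded."""
    irreversible_types = {"kinact", "kinact/Ki"}  # check exact strings
    observed = set(activities["standard_type"].dropna().unique())
    return bool(observed & irreversible_types)
```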

Now let’s take a look at a scenario in which the objective is to initiate a drug discovery project (as opposed to merely building QSAR-like ML models for the purpose of publication). One point that I really do need to stress is that you’re far from helpless if the data available in ChEMBL do not satisfy the requirements for building QSAR-like ML models. First, you can try to source structural analogs of bioactive compounds (there are many more options these days for doing this than when I worked in industry and you can also look beyond ChEMBL, in patents for example, when identifying bioactive compounds) and, in any case, you’re going to need to source pure samples for compounds to check that they are indeed bioactive. Second, you can use the structures of the active compounds to set up queries for pharmacophore matching and molecular shape matching (see GGP1996 | N2010). Third, if structural information is available for the target you can investigate how the active compounds might be interacting with the target and use this information to source potentially active compounds (these days it is feasible to use free energy calculations to predict affinity in addition to the scoring functions that have long been used for virtual screening and I’ll point you to C2021 | MH2023 | C2023). Fourth, you can look for structure-activity relationships (see SHC2005 for an early example of this and the more recent S2025 study which provides software) in the bioactivity data and one way of achieving this is to search for ‘activity cliffs’ (significant differences in bioactivity for pairs of structurally similar compounds; see M2006 | GvD2008 | SB2012 | SHB2019 | vT2022 and the sketch after this paragraph) or, more generally, to analyse the bioactivity of neighbourhoods around bioactive compounds. Fifth, you can look for instances of increased polarity (such as replacement of aromatic CH with aromatic N) being well-tolerated from the perspective of bioactivity (this can be thought of both in terms of lipophilic efficiency and as a variation on the activity cliff theme). I should point out that the approaches that I've mentioned in this paragraph can be accommodated within an AI framework if you're prepared to think beyond ML in your definition of AI.
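As an illustration of the fourth point, here's a minimal activity-cliff search using Morgan fingerprints (the similarity and ΔpIC50 thresholds are arbitrary choices on my part rather than values taken from any of the studies cited above):

```python
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def find_activity_cliffs(compounds, sim_cutoff=0.7, delta_cutoff=2.0):
    """compounds: list of (smiles, pIC50) tuples. Returns pairs that
    are structurally similar but differ substantially in bioactivity."""
    fps = [(smi, p, AllChem.GetMorganFingerprintAsBitVect(
                Chem.MolFromSmiles(smi), 2, nBits=2048))
           for smi, p in compounds]
    cliffs = []
    for (s1, p1, f1), (s2, p2, f2) in combinations(fps, 2):
        sim = DataStructs.TanimotoSimilarity(f1, f2)
        if sim >= sim_cutoff and abs(p1 - p2) >= delta_cutoff:
            cliffs.append((s1, s2, sim, abs(p1 - p2)))
    return cliffs
```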

Let’s now suppose that you can satisfy the data requirements for building QSAR-like ML models for the target of interest with data in ChEMBL. Does this mean that you can whip up some QSAR-like ML models, fire up your generative AI and have clinical candidates condensing out of the ether? I think not and one implication of being able to satisfy the data requirements for building QSAR-like ML models is that others will have worked hard in the past trying to get to where you’d like to be in the future. Before you even start to build QSAR-like ML models you’ll need to assess the earlier work from the perspectives of both intellectual property and understanding why it didn't lead to clinical candidates. There are many rabbit holes that you can disappear down in drug discovery and here’s some advice from Otto von Bismarck (ironically it was a young, emotionally unstable, half-English Kaiser with a withered arm who brought down the Iron Chancellor):

Only a fool learns from his own mistakes. The wise man learns from the mistakes of others.

If the available data do indeed satisfy the requirements for building QSAR-like ML models then it’s a pretty safe assumption that many of the data values will correspond to compounds from one or more structural series (see Figure 1 below which was taken from a previous post). Under this scenario the distribution of data points in the descriptor space is likely to be very uneven and you should anticipate that ‘global’ QSAR-like ML models built using such data will actually be ensembles of local models. One consequence of what I sometimes refer to as ‘clustering’ in the descriptor space is that what you might think is an interpolation is actually an extrapolation (take a look at the point highlighted by the arrow in Figure 1). Clustering in the descriptor space can also cause validation procedures to arrive at optimistic assessments of model quality because most data points have close neighbours and this can lead to overfitting (I discovered at EuroQSAR back in 2016 that some consider it rather uncouth to mention the H2003 study). Correlations between descriptors and related metrics such as Mahalanobis distance become less meaningful when there is a lot of clustering in the descriptor space. This in turn has implications for principal component analysis (commonly used to assess dimensionality of data sets and eliminate correlations between descriptors) and for methods such as PLS (see K1999) that aim to account for correlations between descriptors in regression analysis.
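One simple way to diagnose this sort of clustering is to look at the distribution of nearest-neighbour similarities in the training set (a sketch using Morgan fingerprints; many values close to 1 suggest that a ‘global’ model will really be an ensemble of local models and that validation statistics based on random splits are likely to be optimistic):

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def nearest_neighbour_similarities(smiles_list):
    """Tanimoto similarity of each compound to its nearest neighbour."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(
               Chem.MolFromSmiles(s), 2, nBits=2048)
           for s in smiles_list]
    nn = []
    for i, fp in enumerate(fps):
        others = fps[:i] + fps[i + 1:]
        nn.append(max(DataStructs.BulkTanimotoSimilarity(fp, others)))
    return np.array(nn)
```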
 
For reasons outlined in the previous paragraph I wouldn’t generally combine data from different structural series when building QSAR-like ML models. I would, however, look for relationships between different structural series by, for example, aligning their defining scaffolds (or structural prototypes if you prefer) because this may allow the SAR observed for one scaffold to be overlaid onto another scaffold. Before attempting to build a QSAR-like ML model I would plot pIC50 against calculated logP for each structural series of interest with a view to assessing the response of bioactivity to increased lipophilicity (a weak correlation between bioactivity and lipophilicity is desirable but if this is not the case then the response should at least be relatively steep). I would also fit a straight line to the plot of pIC50 versus calculated logP because this allows the steepness of the response to be quantified and the residuals can be used (as discussed in the ‘Alternatives to ligand efficiency for normalization of affinity’ section of K2019) to quantify the extent to which individual pIC50 values beat the trend in the data (this information can be useful to medicinal chemists who wish to think about SAR although I have to admit that "the most interesting SAR is likely to be associated with the most deviant values" actually refers to the youthful antics of the Honourable former Member for Witney). Having performed these simple analyses of the bioactivity data I would attempt to build QSAR-like ML models for each structural series of interest.
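A sketch of this simple pre-modelling analysis (scipy's linregress provides the slope and the residuals quantify the extent to which each compound beats the lipophilicity trend for its series; RDKit's Crippen logP stands in here for whichever calculated logP you prefer):

```python
from rdkit import Chem
from rdkit.Chem import Crippen
from scipy.stats import linregress

def lipophilicity_trend(smiles_list, pic50_values):
    """Fit pIC50 against calculated logP for one structural series."""
    clogp = [Crippen.MolLogP(Chem.MolFromSmiles(s)) for s in smiles_list]
    fit = linregress(clogp, pic50_values)
    # Positive residuals identify compounds that beat the trend
    residuals = [p - (fit.slope * x + fit.intercept)
                 for x, p in zip(clogp, pic50_values)]
    return fit.slope, residuals
```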

This is a good point at which to wrap up and I'll share some thoughts on the use of QSAR-like ML models in drug design. Back in 2009 I discussed (see K2009) the difference between hypothesis-driven molecular design and prediction-driven molecular design and I suggest that the former can be accommodated within an AI design framework. Some who assert the value of QSAR-like ML models for drug design appear to treat drug design as an exercise in prediction and my view, which I've been crapping on about for quite a few years (see this post from January 2015), is that it is more appropriately seen in a Design of Experiments framework (generate the necessary data as efficiently as possible). For many drug discovery projects the available data will not satisfy the requirements for building QSAR-like ML models until relatively late in the project and in some cases clinical candidates will be discovered without the data requirements for building QSAR-like ML models ever being satisfied (this is more likely to be the case when bioactivity cannot be represented by a single data value as is the case for modalities such as irreversible inhibition and targeted protein degradation). I consider it essential to account for numbers of adjustable parameters and for correlations between descriptors (or features if you prefer) when building QSAR-like ML models, and I’m also concerned that the challenges presented by clustering in descriptor spaces are not properly acknowledged. It also needs to be said that it is consideration of exposure that differentiates drug design from ligand design and I recommend that everybody working in drug discovery and chemical biology read the SR2019 article.
 

Tuesday, 1 April 2025

Property Forecast Index Validated


I arrived in Korea on Friday night and am greatly enjoying it here. Photos below show the Jungbu Dried Seafoods Market near where I'm staying and dinner on Sunday (spicy beef noodles).



I visited the War Memorial on Sunday and took selfies with the Shenyang J-6 (Chinese version of MiG-19) 'liberated' by Capt. Lee Woong-pyeong when he defected to South Korea on 25th February 1983, a 'liberated' T-34 (as Uncle Joe is said to have observed, quantity has a quality all of its own) and Great Leader's car (also 'liberated' although it was not clear exactly when). 

So enough of the travel photos for now and let's get back to the science. Regular readers (both of them) of this blog will be well aware of my visceral dislike for drug design metrics. One reason for this visceral dislike is that I consider these metrics to trivialise the problems faced by medicinal chemists and I remain sceptical that one can make meaningful predictions of developability or likelihood of clinical success for compounds based only on their chemical structures without knowing anything about their biological activities. One metric that I have criticised harshly in the past is property forecast index (PFI) which was originally introduced as solubility forecast index (SFI). Specifically, I denounced SFI as a ‘draw pictures and wave arms’ data analysis strategy and privately I even considered the possibility that it had been created by a toddler armed with a box of colored crayons.

Let’s take a look at the HY2010 article in which SFI was introduced. Proprietary aqueous solubility measurements (continuous variable) were first processed to assign compounds to one of three aqueous solubility categories. Histograms showing the proportions of measurements in each aqueous solubility category were created by binning values of SFI and of c log DpH7.4 and the histograms were compared visually:    

This graded bar graph (Figure 9) can be compared with that shown in Figure 6b to show an increase in resolution when considering binned SFI versus binned c log DpH7.4 alone.

Recently, I have been forced to revise my negative view of PFI and I have to admit that it pains me deeply to realise that I could have been so utterly wrong for so long in my assessment of what is actually an elegant and highly-predictive drug design metric. Indeed I have now come to the conclusion that the only reason that the Journal of Medicinal Chemistry did not include PFI in its nomination for the Nobel Prize in Physiology or Medicine was that the introduction of the Ro5, LipE and Fsp3 principles led directly to so many marketed drugs being approved.

What has caused such a fundamental shift in my views? First, PFI is highlighted in the European Federation of Medicinal Chemistry (EFMC) ‘Best Practices from Hits to Lead Generation’ webinar.  Now it goes without saying that EFMC includes some of the sharpest minds in medicinal chemistry and, given that they consider PFI to be sufficiently important for inclusion in a best practices webinar, it became abundantly clear that I needed to revise my hopelessly naïve thinking. Let’s join the webinar at 27:53 and you’ll see in the webinar slide that SFI (as PFI was originally introduced) has been strongly endorsed by Practical Cheminformatics, a blog that many, including me, accept without question as the source of a number of fundamental ground truths in the AI field.

However, what convinced me of the sublime elegance and extreme predictivity of PFI is a seminal study by the world-renowned expert on tetrodotoxin pharmacology, Prof. Angelique Bouchard-Duvalier of the Port-au-Prince Institute of Biogerontology, working in collaboration with the Budapest Enthalpomics Group (BEG). The manuscript has not yet been made publicly available although I was able to access it with the help of my associate ‘Anastasia Nikolaeva’ (not sure exactly what she’s doing these days although she did post a photo from Pyongyang showing her and a burly chap with a toothy grin and a bizarre haircut). There is no doubt that this genuinely disruptive study will comprehensively reshape the predictive ADME landscape, enabling drug discovery scientists, for the very first time, to make accurate predictions of developability and probability of clinical trial success using only chemical structures as input.

Prof. Bouchard-Duvalier’s seminal study clearly demonstrates that graphical presentation of categorized continuous data outperforms regression analysis performed on the uncategorized continuous data. The math is truly formidable (my rudimentary understanding of Haitian patois didn’t help either) and involves first projecting the atomic isothermal compressibility matrix into the quadrupole-normalized polarizability tensor before applying the Barone-Samedi transformation, followed by hepatic eigenvalue extraction using an algorithm devised by E. V. Tooms (a reclusive Baltimore resident whose illustrious research career in analytic topology was abruptly halted almost 31 years ago by an unfortunate escalator accident). The incisive analysis of Prof. Bouchard-Duvalier shows without a shadow of doubt that the data visualization used to establish PFI as a fundamental drug design principle will reliably and robustly outperform all AI approaches to prediction of aqueous solubility. Furthermore, ‘Anastasia Nikolaeva’ was also able to ‘liberate’ a prepared press release in which the beaming BEG director Prof. Kígyó Olaj explains that, “Possibilities are limitless now that we can accurately and robustly predict the developability of a compound using only its chemical structure as input and we can now finally consign regression analysis to the dustbin of history. Surely the Editors of Journal of Medicinal Chemistry will recognize the impact of PFI on real world drug discovery when they make their Nobel Prize nominations later this year.” 

Sunday, 9 March 2025

Thinking About Aqueous Solvation

Given that it was International Women's Day yesterday, I'll open the post (and blogging for 2025) with a photo of a gravestone at St James' Church in Bramley (Hampshire).

In the current post I’ll be taking a look at some aspects of aqueous solvation and Richard Wolfenden’s 1983 “Waterlogged Molecules” article (W1983) is still worth reading today (as an aside, Prof Wolfenden will turn ninety in May of this year and hopefully mentioning this won't put what is called "goat mouth" in my native Trinidad and Tobago on him as I did for Oscar Niemeyer with the words "ele vive ainda" while studying Portuguese in 2012). As noted in W1983 the formation of a target-ligand complex requires partial desolvation of both target and ligand:

When biological compounds combine, react with each other, or change shape in watery surroundings, solvent molecules tend to be reorganized in the neighborhood of the interacting groups.

Formation of a target-ligand complex can also be seen as an “exchange reaction” and this point is very well made in SGT2012:

Molecular binding in an aqueous solvent can be usefully viewed not as an association reaction, in which only new intermolecular interactions are introduced between receptor and ligand, but rather as an exchange reaction in which some receptor–solvent and ligand–solvent interactions present in the unbound state are lost to accommodate the gain of receptor–ligand interactions in the bound complex.

In HBD3 I briefly discuss ‘frustrated hydration’ as a phenomenon that could be exploited in drug design and I’ll quote from the Summary section of W1983:  

When two or more functional groups are present within the same solute molecule, their combined effects on its free energy of solvation are commonly additive. Striking departures from additivity, observed in certain cases, indicate the existence of special interactions between different parts of a solute molecule and the water that surrounds it.

I’ll try to explain how this could work for ligand design and let’s suppose that we have two polar atoms that are close together in the binding site. The proximity of the polar atoms in the binding site means that water molecules forming ideal interactions with these polar atoms are also likely to be close together. However, the mutual proximity of the water molecules can lead to unfavourable interactions between the water molecules which ‘frustrate’ the (simultaneous) hydration of the two polar atoms in the binding site. Now if we design a ligand with two polar atoms positioned to form good interactions with polar atoms in the binding site it is likely that these will also be in close proximity and that their hydration will be similarly frustrated. I would generally anticipate that frustration of hydration will not be handled well by implicit solvent models (RT1999 | FB2004 | CBK2008 | KF2014) or computational tools such as WaterMap that calculate energetics for individual water molecules (especially in cases where the two hydration sites cannot be simultaneously occupied).

To illustrate frustration of hydration I’ve taken a graphic from a talk from 2023. The unfavorable interactions between solvating water molecules that frustrate hydration are shown as red double-headed arrows between water molecules (in some cases these interactions will be repulsive to the extent that only one of the hydration sites can be occupied at a time). You’ll also notice two thick green lines in the right hand panel and these show secondary interactions that stabilize the bound complex. Secondary interactions of this nature were discussed in a molecular recognition context in the JP1990 study and the observation (see A1989) that pyridazine is a better hydrogen bond acceptor (HBA) than its pKa would have you believe can be seen in a similar light. Secondary interactions like these only enhance affinity when the proximal polar atoms are of the same ‘type’ (the proximal polar atoms in the 1,8-naphthyridine are both HBAs) and we should anticipate that the secondary interactions for the contact between pyrazole and the ‘hinge’ of a tyrosine kinase will be deleterious for affinity. In contrast to secondary interactions, frustration of hydration can be beneficial for affinity even when the proximal polar atoms are of opposite types, as would be the case for an HBA that is near to a hydrogen bond donor (HBD).

While it is clearly important to account for aqueous solvation when using physics-based approaches for prediction of binding affinity, passive permeability and aqueous solubility, the measurement of gas-to-water transfer free energy is not exactly routine (I’m not aware that any companies offer measurement of aqueous solvation energy as a service nor do I believe that this is an activity that would be readily funded). Measurements for aqueous solvation energy reported in the literature tend to be for relatively volatile compounds and I’ll direct readers to the C1981, W1981 and A1990 studies.

A view that I've held for many years is that a partition coefficient could be used as an alternative to gas-to-water transfer free energy for studying aqueous solvation. It's also worth noting that when we think about desolvation in drug design we're often considering the energetic cost of bringing polar atoms into contact with non-polar atoms (as opposed to transferring the polar atoms to gas phase). Partition coefficient measurement is a lot more routine than solvation free energy measurement and most drug discovery scientists are aware that the octanol/water partition coefficient (usually quoted as its base 10 logarithm logP) is an important design parameter. However, the octanol/water partition coefficient is not useful for assessing aqueous solvation because the hydroxyl group of octanol can form hydrogen bonds with solutes and the water-saturated solvent is actually quite 'wet' (the DC1992 study reports that the room temperature solubility of water in octanol is 2.5 M). If we’re going to use partition coefficient measurements for studying aqueous solvation then I would argue that we should make these measurements with a saturated hydrocarbon such as cyclohexane or hexadecane that lacks hydrogen bonding capability.

Here’s another slide from that 2023 talk showing that pyridine is lipophilic for octanol/water but hydrophilic for hexadecane/water. The difference between the octanol/water and hexadecane/water logP values for a solute is sometimes referred to as ΔlogP (it is equivalent to the octanol/hexadecane logP value with both solvents water-saturated) and can be considered to quantify the solute’s ability to form hydrogen bonds (see Y1988 | A1994 | T2008). I'll mention in passing that ΔlogP measurements with toluene as the less polar organic solvent have been used to study intramolecular hydrogen bonding (see S2013 | C2016 | C2018).
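The arithmetic is trivial but worth spelling out (the pyridine values below are approximate figures quoted from memory to illustrate the sign change shown on the slide and should be checked against the primary literature):

```python
# Approximate, illustrative logP values for pyridine
logp_octanol_water = 0.65       # lipophilic for octanol/water
logp_hexadecane_water = -0.45   # hydrophilic for hexadecane/water

# Delta logP quantifies the solute's hydrogen bonding capability
delta_logp = logp_octanol_water - logp_hexadecane_water
print(delta_logp)  # ~1.1
```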


It should be stressed that people have been thinking about using different organic solvents for partition coefficient measurement for a lot longer than I have. My view, expressed in K2013, is that the justification in H1963 for using octanol was partly based on a misinterpretation of Collander's C1951 study. I really like this quote from Alan Finkelstein's 1976 article (as an aside, the partition coefficient literature is not exactly awash with alkane/water logP measurements for amides and the article reports measured values of the hexadecane/water partition coefficient for acetamide, formamide, urea, butyramide and isobutyramide):

It has long been fashionable to worry about which organic solvent (and polarity) is the best model for the lipoidal region of a particular cell membrane (Collander, 1954). These solvents have ranged from isobutanol (the most polar) to olive oil (the least polar). I have never understood the point of this. If the lipoidal region of the plasma membrane is a lipid bilayer, then clearly the appropriate model solvent is hydrocarbon. For artificial bilayers this is obviously so. I chose n-hexadecane as the particular hydrocarbon, because its chain length is comparable to that of the fatty acid residues in most phospholipids, and it is conveniently available.

I also need to mention the B2016 study (Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge) since the cyclohexane/water distribution coefficient was used as a surrogate for gas-to-water transfer free energy in the challenge:

The inclusion of distribution coefficients replaces the previous focus on hydration free energies which was a fixture of the past five challenges (SAMPL0-4) [1 | 2 | 3 | 4 | 5 | 6 | 7]. Due to a lack of ongoing experimental work to generate new data, hydration free energies are no longer a practical property to include in blind challenges. It has become increasingly difficult to find unpublished or obscure hydration free energies and therefore impossible to design a challenge focusing on target compounds, functional groups or chemical classes.

I consider initiatives such as the SAMPL5 cyclohexane/water distribution challenge to be valuable for assessing model predictivity in an objective and transparent manner. Generally, I would avoid including logD measurements for compounds that are significantly ionized under experimental conditions because these require that account be taken of ionization when making predictions (better to measure logD at a pH at which ionizable functional groups are not significantly ionized). While challenges such as SAMPL5 are certainly valuable for assessment of predictivity of models, I consider them less useful in model development which requires measured data for structurally-related compounds. 

The isosteric pairs 1/2 and 3/4 shown in the graphic below will give you an idea of what I'm getting at. The predicted pKBHX values taken from K2016 suggest that 1 is less polar than its isostere 2 and I'd expect 3 to be more polar than 4.

While the three N-butylated purines shown in the graphic below are not strictly isosteric I would consider it valid to interpret the cyclohexane/water logP values taken from S1998 as reflecting differences in hydrogen bond acceptor strength.

This is a good point at which to wrap up and, given the fundamental importance of aqueous solvation in biomolecular recognition and drug design, I see tangible advantages in having a large body of measured data in the public domain. My view is that to measure gas-to-water transfer free energy for significant numbers of compounds of interest to drug discovery scientists would be both technically demanding and unlikely to get funded although I would be delighted to be proven wrong on either point. This means that we need to learn to use other types of data in order to study aqueous solvation and my view is that an alkane/water partition coefficient would be the best option. Using alkane/water partition coefficients as an alternative to gas-to-water transfer free energies for studying aqueous solvation would also enable enthalpic (see RT1984) and volumetric aspects of aqueous solvation to be investigated more easily.

Tuesday, 31 December 2024

Natural Intelligence?

**********************************************************
My pulse will be quickenin'
With each drop of strychnine
We feed to a pigeon
It just takes a smidgin
To poison a pigeon in the park

Tom Lehrer, Poisoning Pigeons in the Park | video
*********************************************************

I’ll be reviewing the H2024 study (Occurrence of “Natural Selection” in Successful Small Molecule Drug Discovery) in this post. Derek has already posted on the H2024 study which has been included in the BL2024 Virtual Special Issue on natural products (NPs) in medicinal chemistry. I'll also mention reviews here at Molecular Design of the related studies (4) (see post) and (24) (see post). As is usual for Molecular Design reviews of literature I have used the same reference numbers that were used in H2024 and quoted text is indented with any comments by me in square brackets and italicised in red. Given the serious concerns I have about H2024 this is going to be a long post and there are a couple of disclaimers that I need to make before starting the review:

  1. I regard identification and biological characterisation of NPs as vital scientific activities that should be generously funded and Derek puts it very well in his recent post ("When you see specific and complex small molecules that living creatures are going to the metabolic trouble to prepare, there are surely survival-linked functions behind them."). In particular, I see it as important that NPs be screened in diverse phenotypic assays and here’s a link to the Chemical Probes Portal. While my criticisms of H2024 are certainly serious it would be grossly inaccurate to take these criticisms as indicative of an anti-NP position.
  2. Automation of workflows (N2017) and generation of datasets from databases such as ChEMBL are far from trivial and (33), which highlights some of the challenges faced by researchers in this area, was the subject of a recent post at Molecular Design. I consider method development in this area to be an important cheminformatic activity that should be adequately supported. It must also be stressed that the design, building and updating of databases such as ChEMBL (G2012 | B2014 | P2015 | G2017 | 23) are vital scientific activities that should be generously funded (had it not been for the vision and foresight of the creators of the PDB over half a century ago it is improbable that the 2024 Chemistry Nobel Prize would have been awarded for “computational protein design” and “protein structure prediction”). While my criticisms of H2024 are certainly serious it would be grossly inaccurate to take these as criticisms of the automated dataset generation described in the study (and recently published in H2024b) or of the contributions by a number of individuals that have made ChEMBL an invaluable resource for drug discovery scientists and chemical biologists.

Hampi, November 2013

Having made the disclaimers, I’ll open my review of H2024 with some general observations. First, I do not consider that H2024 presents any insights of practical value to medicinal chemists nor do I consider the analyses presented in the study to support the assertion that “there is untapped potential awaiting exploitation, by applying nature’s building blocks─’natural intelligence’─to drug design” (in my view the use of the term “natural intelligence” does rather endow the study with what I’ll politely refer to as a distinctly pastoral odour). Second, the results of the analyses presented in H2024 do not demonstrate any tangible benefits from the drug design perspective of incorporating structural features that have been anointed as 'natural' by the authors (my view is that it would be extremely difficult to design data analyses to address the relevant questions in an objective manner). Third, the authors of H2024 present a ‘scaffold-centric’ view of NPs in which the naturalness of NPs is due to cyclic substructures present within their chemical (2D) structures (it is almost as if these 'natural' substructures are considered to be infused with 'vital force') and I would question whether this is a realistic view from the molecular recognition and physicochemical perspectives.  Fourth, the meaning of what the authors of H2024 are calling 'enrichment' of pseudo-NPs (PNPs) in clinical compounds is unclear and, in any case, the 'enrichment' values do seem rather low (never more than twofold) when you consider the numbers of compounds that successful discovery project teams typically have to synthesize in order to deliver a drug that gets to market.

It's not clear (at least to me) what the authors of H2024 mean by ‘natural selection’ and at times their view of natural selection appears to be closer to Lysenkoism than Darwinism. For example, they assert in the conclusions section of H2024 that “NP structural motifs are provided predesigned by nature, constructed for biological purposes as a result of 4 billion years of evolution.” Design actually has no place in natural selection and perhaps the authors are thinking of 'Intelligent Design' which is a doctrine with many adherents in the Creationist community. While I don’t dispute that the chemical structures of many clinical compounds contain substructures that are also found in the chemical structures of NPs, I think that it would be extremely difficult to objectively compare different explanations for the observations (it's worth remembering that correlation does not imply causation). The explanation favoured by the authors of H2024 is that compounds assembled from Nature’s building blocks are ‘better’ and a stated aim of the study is “to seek further support for the existence of ‘natural selection’ in drug discovery” (this video will give readers an idea of what the late great Dave Allen might have made of this). In my view the data analyses presented in H2024 are not actually based on statistics and are therefore unfit for the purpose of testing hypotheses. Put another way, if you're going to use data analysis to look for something then it would be a good idea to use methods capable of telling you that you haven't found what you were looking for.
 
The data analyses in H2024 are largely based on quantities (PNP_Status | Frag_coverage_Murcko | NP-likeness) that are calculated from the chemical (2D) structures of compounds.  However, the authors do not state which software was used to perform the calculations and, had I been a reviewer, I would have drawn their attention to the following directive in the Data Requirements section in the J Med Chem Author Guidelines (accessed 27-Dec-2024):

9. Software. Software used as a part of computer-aided drug design should be readily available from reliable sources, and the authors should specify where the software can be obtained.

As was the case for my review of (24) I see much of the analysis in H2024 as relatively harmless “stamp collecting” (in contrast, as discussed in KM2013, I consider presentations and analyses of data that exaggerate trend strength, such as those used in the HMO2006, LS2007, LBH2009, HY2010 and TY2020 studies, to be anything but harmless). The analyses that I’ll be examining in this post are of comparisons between clinical compounds and reference compounds although I'll comment in general terms on the analyses of time-dependencies of characteristics of clinical compounds. My general criticism of H2024 is not that the analyses presented by its authors are necessarily invalid but that they fail to provide any useful insight and I’ll share an insightful observation by Manfred Eigen (1927-2019):

A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant.

I first encountered analyses of time-dependencies of drug properties about two decades ago and rapidly came to the conclusion that some senior medicinal chemists where I worked had a bit too much time on their hands. The fundamental flaw in the interpretation of these analyses is that time-dependencies of the properties of drugs and other clinical compounds are presented as causes rather than effects and it has never been clear how medicinal chemists working on drug discovery projects in the real world should use the results from such analyses. The authors claim that “changes to drug properties over time are significant” and I would challenge them to present even a single example of such analysis being used to meaningfully inform decision-making in a drug discovery project. It must be stressed that my criticism of analyses of time-dependency of the properties of drugs and other clinical compounds is simply that they don't provide useful insights and not that the analyses are necessarily invalid. That said, I do have general concerns about how time-dependencies are compared when some of the properties are expressed as logarithms and some are not. As a reviewer I would have recommended that the vertical axis of the plot in the graphical abstract be drawn from 0% to 100% rather than from 30% to ~67%.

As is the case for analyses of time-dependency, my criticism of analyses of the differences between clinical compounds and reference compounds is that they don’t provide useful insight and there is no suggestion that the analyses are necessarily invalid. Before looking at the analyses presented in H2024 I’ll quote from the abstract of (24) because this will give you an idea of what I mean by analyses not providing useful insight:

Drugs are differentiated from target comparators by higher potency, ligand efficiency (LE), lipophilic ligand efficiency (LLE), and lower carboaromaticity.

As I noted in this post (this focused principally on the invalidity of the LE metric as discussed in NoLE) reporting that an analysis has shown drugs to be differentiated by potency from target comparators does seem to be stating the obvious and, given how LE and LLE are defined, it is perhaps not the most penetrating of insights to observe that values of these efficiency metrics tend to be greater for drugs than for comparator compounds. While the observation of lower carboaromaticity of drugs relative to comparator compounds is non-obvious, it does not constitute information that can be used for medicinal chemistry decision-making in specific discovery projects (as we noted in KM2013 carboaromaticity and lipophilicity can both be reduced simply by replacing a benzene ring with benzoquinone).

Let’s take a look at how this type of analysis is used in H2024. The authors of H2024 note that “comparing Figure 3a,b shows a clear ‘enrichment’ of PNPs in clinical compounds versus reference compounds in the post-2008 period” and two of these authors, writing in (17), assert that “PNPs have increasingly been explored in recent drug discovery programs, and are strongly enriched in clinical compounds”. What the authors of H2024 are calling 'enrichment' is rather different to the enrichment in structural features that results from high-throughput screening (HTS) and it’s important to understand the difference. Let’s suppose that we’ve screened a library of compounds of which 1% are pyrimidines and 1% are pyrazines and we find that 10% of the hits are pyrimidines and 0.1% are pyrazines (to simplify things you can assume there is no compound in the library with a pyrimidine and a pyrazine in its chemical structure). In this case we would conclude that the process of screening has resulted in a tenfold enrichment for pyrimidines and a tenfold impoverishment for pyrazines. Now let's create a 'selected azines' category by combining the pyrimidines and pyrazines which, as a structural class, comprise 2% of the screening library compounds but 10.1% of the hits. What I'm getting at here is that enrichment of a more inclusive structural class such as 'selected azines' (or PNPs) does not imply that each and every one of the structural classes covered by the inclusive structural class definition will also be enriched.
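To make the arithmetic concrete, here's a minimal Python sketch of the hypothetical HTS example above (all numbers are the illustrative ones from the text, not data from any study):

```python
# Hypothetical HTS example: enrichment is the fraction of hits having a
# structural feature divided by the fraction of library compounds having it.
def enrichment(hit_fraction: float, library_fraction: float) -> float:
    return hit_fraction / library_fraction

library = {"pyrimidine": 0.01, "pyrazine": 0.01}  # fractions of library compounds
hits = {"pyrimidine": 0.10, "pyrazine": 0.001}    # fractions of hits

for cls in library:
    print(cls, round(enrichment(hits[cls], library[cls]), 2))
# pyrimidine 10.0   (tenfold enrichment)
# pyrazine 0.1      (tenfold impoverishment)

# Combine the two classes into 'selected azines' (no compound contains both):
azines_lib = sum(library.values())   # 0.02
azines_hits = sum(hits.values())     # 0.101
print("selected azines", round(enrichment(azines_hits, azines_lib), 2))
# selected azines 5.05 -- the inclusive class is enriched even though one of
# its constituent classes is strongly impoverished
```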

Now let’s take a look at how the 'enrichment' of PNPs in clinical compounds is assessed in H2024. First, a set of reference compounds is generated for each clinical compound (this is discussed in detail in H2024b) and the sets of reference compounds are combined. 'Enrichment' is then assessed by comparing the fraction of clinical compounds that are PNPs with the fraction of compounds in the combined reference sets that are PNPs. When we assess enrichment of chemotypes in HTS, the hits are all selected (by the screening process) from the same reference pool of compounds. In contrast, each clinical compound in the H2024 analysis is associated with a different reference set of compounds (from the perspective of data analysis, combining reference sets defined in this manner gratuitously throws information away). As a reviewer I would have pressed the authors to enlighten readers as to how they should interpret the proportions of PNPs in the reference sets for individual compounds.

It's worth thinking about what the reference compound set might look like for a clinical compound that is a PNP. The proportion of PNPs in the reference set will generally be influenced by factors such as the availability of data, the ‘rarity’ of the structural features of the drug and the ‘tightness’ of the structure-activity relationship (SAR). A more permissive definition of ‘activity’ would generally be expected to make SAR appear to be less ‘tight’ (or ‘looser’ if you prefer). Compounds were defined as ‘active’ for the analysis on the basis of a recorded pChEMBL value against one of the clinical compound’s targets (as a reviewer I’d have suggested that the authors define the term ‘pChEMBL’) which means that a compound might have been selected for inclusion in a reference set on the basis of an IC50 value of 100 μM.
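For readers unfamiliar with the term: a pChEMBL value is the negative base-10 logarithm of a molar activity value (IC50, XC50, EC50, AC50, Ki, Kd or potency) as recorded in the ChEMBL database. A minimal sketch of the conversion (the 100 μM figure is the illustrative threshold from the text, not a value taken from H2024):

```python
import math

def pchembl(activity_molar: float) -> float:
    """pChEMBL as defined by ChEMBL: -log10 of a molar activity value
    (IC50, XC50, EC50, AC50, Ki, Kd or potency)."""
    return -math.log10(activity_molar)

ic50 = 100e-6          # 100 micromolar, expressed in mol/L
print(pchembl(ic50))   # 4.0 -- so any recorded pChEMBL value admits a compound
                       # with activity as weak as 100 micromolar
```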

Let’s define 'enrichment' by dividing the fraction of the clinical compounds that are PNPs by the fraction of reference compounds that are PNPs. When we select a reference set for a clinical compound that is a PNP then it’s extremely unlikely that every single compound in the reference set will also be a PNP (especially if we’re accepting compounds with IC50 values of 100 μM as ‘active’) and it’s even less likely that every single compound in the combined reference sets will be a PNP. This means that we should generally expect the clinical compounds that are PNPs to be ‘enriched’ in PNPs when compared with their combined reference sets. We can apply exactly the same logic to conclude that the combined reference sets for the clinical compounds that are not PNPs will generally contain at least some PNPs (under this scenario we would conclude that the set of clinical compounds that are not PNPs is infinitely impoverished in PNPs when compared with their combined reference sets). This means that we should expect that the 'enrichment' of PNPs in the clinical compound set in comparison with their combined reference sets will increase with the fraction of clinical compounds that are PNPs.
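The expectation can be demonstrated with a toy model; the reference-set compositions below are invented purely for illustration (they are not taken from H2024):

```python
def pnp_enrichment(p_clinical: float, ref_pnp_for_pnp: float = 0.6,
                   ref_pnp_for_nonpnp: float = 0.3) -> float:
    """'Enrichment' as the fraction of clinical compounds that are PNPs
    divided by the PNP fraction of the combined reference sets."""
    combined_ref = (p_clinical * ref_pnp_for_pnp
                    + (1.0 - p_clinical) * ref_pnp_for_nonpnp)
    return p_clinical / combined_ref

for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(p, round(pnp_enrichment(p), 2))
# 0.3 0.77
# 0.4 0.95
# 0.5 1.11
# 0.6 1.25
# 0.7 1.37
# 'Enrichment' rises with the clinical PNP fraction even though the
# composition of the individual reference sets never changes.
```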

Let’s take another look at the plot in the graphical abstract which shows the fractions of clinical compounds and reference compounds that are PNPs as a function of time. Notice how the lines tend to be furthest apart when the fraction of clinical compounds that are PNPs is relatively high. As a reviewer, I would have required that the authors examine the correlation between the logarithm of the fraction of clinical compounds and the logarithm of the enrichment (a relatively strong correlation would indicate that the information added by the combined reference sets is minimal). The 'enrichments' calculated from the plot in the graphical abstract are underwhelming (the highest degree of enrichment is the 2014 value of just over 1.5-fold and this value seems very low when you consider the numbers of compounds that successful discovery project teams typically need to synthesize in order to get drugs approved). From 2011 the fraction of clinical compounds that are PNPs exceeds 50% but I wouldn't consider it accurate to use the term "strongly enriched" (17) because the fraction of reference compounds that are PNPs is 40% or greater for this time period (plotting the vertical axis in the graphical abstract from 30% to ~67% creates the illusion that the 'enrichment' is greater than it actually is).
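The check I'd have asked for is trivial to run; here's a sketch using NumPy in which the yearly fractions are invented placeholders rather than values read off the H2024 graphical abstract:

```python
import numpy as np

# Invented yearly PNP fractions for clinical compounds and their combined
# reference sets (placeholders, not values from H2024).
clinical = np.array([0.45, 0.50, 0.55, 0.60, 0.63, 0.65])
reference = np.array([0.40, 0.42, 0.44, 0.45, 0.47, 0.48])

enrichment = clinical / reference
r = np.corrcoef(np.log(clinical), np.log(enrichment))[0, 1]
print(round(r, 3))
# A correlation close to 1 would indicate that the 'enrichment' is largely
# determined by the clinical PNP fraction, i.e. the combined reference sets
# add little information.
```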

I do have a number of other gripes about the data analysis in H2024 but I also need to take a look at PNPs, and the following assertion by the authors is an appropriate point at which to start this discussion:

The PNP concept has been validated by its appearance in the literature (16,17) and by the design of several new classes of biologically active compounds. (18,19) [As a reviewer I would have pressed the authors to clearly articulate the “PNP concept” (just as I would have pressed the authors of this Editorial to clearly articulate the new principles that their nominees for the Nobel Prize in Physiology or Medicine had introduced). My view is that it is verging on megalomania to claim that a concept “has been validated by its appearance in the literature” and I don’t consider (18) to support the claim for “design of several new classes of biologically active compounds”. To support such a claim, one would ideally need to demonstrate that screening of libraries of compounds designed as PNPs resulted in the discovery of viable lead series against a range of therapeutic targets. At an absolute minimum, one would need to show that libraries of compounds designed as PNPs exhibited exploitable activity across a range of target-related assays (although interesting, the results from the “cell painting assay” would not by themselves support a claim for “design of several new classes of biologically active compounds”). I should also mention that some in the compound quality field (see B2023 and my review of that article) interpret activity against multiple targets for a set of compounds based on a particular scaffold as evidence for pan-assay interference even when the individual compounds don’t themselves exhibit frequent-hitter behaviour. I don't have access to (19) and am therefore unable to assess the degree to which that article supports the authors’ claim for “design of several new classes of biologically active compounds”.]

The PNP status of a compound is determined by how “NP library fragments” (these are cyclic substructures extracted from the chemical structures of compounds in an NP-focussed screening library that had been generated over a decade ago for fragment-based drug discovery) are combined in its chemical structure.
 
PNP_Status. Compounds were assigned to one of four categories according to their NP fragment combination graphs. (16,17) The NP library fragments used for this purpose are Murcko scaffolds (26) [It would actually be more appropriate to refer to these as ‘Bemis scaffolds’ in order to properly recognize the corresponding author of this article.] (the core structures containing all rings without substituents except for double bonds, n = 1673) derived (16) from a representative set of 2000 NP fragment clusters. (15) [I see this approach as unlikely to capture all the relevant cyclic substructures present in NPs. My view is that it would have been better to first extract the relevant cyclic substructures from the chemical structures of all NPs for which this information is available, and then do the selection and filtering in one or more subsequent steps. The other advantage of doing things this way is that you’ll get a better assessment of the frequencies with which the different cyclic substructures occur in the chemical structures of NPs.] Because of their ubiquitous appearances in NPs, the phenyl ring and glucose moieties were specifically excluded as fragments. (16) [I would expect exclusion of the benzene ring (I consider ‘benzene ring’ more correct than ‘phenyl ring’ in this context) as a fragment to result in a significant reduction in the number of compounds that are considered to be PNPs (and, by implication, the ‘enrichment’ associated with membership of the PNP class). Even though the benzene ring has been excluded for the purpose of assigning PNP status it should still be considered to be one of Nature’s building blocks.]
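For readers who'd like to see what a Murcko (or Bemis) scaffold looks like in practice, here's a minimal sketch using RDKit (the molecule, 1-(4-methylbenzoyl)piperidine, is my own illustrative choice and does not come from H2024):

```python
# A minimal RDKit sketch, assuming RDKit is installed (pip install rdkit).
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# 1-(4-methylbenzoyl)piperidine, chosen purely as a simple illustration
mol = Chem.MolFromSmiles("Cc1ccc(cc1)C(=O)N1CCCCC1")

scaffold = MurckoScaffold.GetScaffoldForMol(mol)
print(Chem.MolToSmiles(scaffold))
# Prints something like O=C(c1ccccc1)N1CCCCC1 -- the methyl substituent is
# stripped but the exocyclic carbonyl double bond on the ring linker is kept,
# consistent with "all rings without substituents except for double bonds".
```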

As I mentioned earlier in the post, the view of NPs presented in H2024 is ‘scaffold-centric’ and I would question how realistic this view is given that non-scaffold atoms at the periphery of a molecular structure will generally be more exposed to targets (and anti-targets) than scaffold atoms at the core of the molecular structure. What I’m getting at here is that it is far from clear how much of a compound’s pharmacological activity can be attributed to the presence of individual substructural features in the chemical structure of the compound (modifying a point made in NoLE, I would argue that the contribution of a structural feature to the binding affinity of a compound is not actually an experimental observable). This is one reason that, unless matched molecular pairs are available, it would not generally be possible to demonstrate the superiority of one structural feature over another in an objective manner.

Something that you need to pay very close attention to when extracting substructures from chemical structures of compounds is the ‘environment’ of the substructure (I prefer to use the term ‘substructural context’). For example, two piperidine rings linked through nitrogen look very different from the perspective of a therapeutic target protein depending on whether the link is a carbonyl carbon or a tetrahedral carbon (most medicinal chemists will be aware that the protonation states differ but there are also subtle, although still significant, differences in the shape of the piperidine ring in the two substructures). You also need to be aware that fusing rings can have profound effects on physicochemical characteristics and I would consider it a bad idea to extract monocyclic substructures from fused or bicyclic ring systems.
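The point about substructural context can be made concrete with RDKit; the SMILES and SMARTS below are my own illustrative choices rather than anything used in H2024:

```python
from rdkit import Chem

# Two piperidine rings linked through nitrogen: via a carbonyl carbon
# (urea-like) and via a tetrahedral carbon (aminal-like).
carbonyl_link = Chem.MolFromSmiles("O=C(N1CCCCC1)N1CCCCC1")
tetrahedral_link = Chem.MolFromSmiles("C(N1CCCCC1)N1CCCCC1")

queries = {
    "piperidine (context-blind)": Chem.MolFromSmarts("N1CCCCC1"),
    "N-acyl piperidine":          Chem.MolFromSmarts("O=C[NX3]1CCCCC1"),
    "N-alkyl piperidine":         Chem.MolFromSmarts("[CX4][NX3]1CCCCC1"),
}

for name, query in queries.items():
    print(name,
          carbonyl_link.HasSubstructMatch(query),
          tetrahedral_link.HasSubstructMatch(query))
# piperidine (context-blind) True True   -- blind to the nature of the link
# N-acyl piperidine True False
# N-alkyl piperidine False True
# The context-blind query treats the two substructures as equivalent even
# though their protonation states and conformational preferences differ.
```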

There are some things that don't look quite right and I would have flagged these up if I’d been reviewing the manuscript. Let’s take a look at the first entry (Sotorasib) in Table 1 and you can see that the oxygen of the 2-pyrimidone substructure is coloured lilac, indicating that this substructure can be found in the chemical structures of one or more NPs (I would still challenge the view that the result of fusing 2-pyrimidone with pyridine should be considered 'natural' on the basis that the heterocycles from which it is derived are both found in the chemical structures of NPs). Now take a look at the second entry (Dolutegravir) in Table 1 and you'll notice that the oxygen in the 4-pyridone substructure is not coloured green. This implies that 4-pyridone does not occur in the chemical structure of any NP and, in the absence of information, I can only assume that it has been anointed as 'natural' because of its structural analogy with pyridine (while there is a nitrogen atom and five trigonal carbon atoms in each substructure, the molecular recognition characteristics of the two substructures differ far too much for them to be regarded as equivalent from the perspective of assigning PNP status). Six of the substructures in Figure 5 appear to be in unstable tautomeric forms (first, fifth, ninth and twelfth entries in line 2 | seventh entry in line 3 | first entry in line 5).

I'll conclude my review of H2024 by commenting on claims made by the authors:

This is further evidence that the three NP metrics can be considered as independent measures of clinical compound quality. [I would consider the claim that any of these “NP metrics” can be considered as a measure of “clinical compound quality” to be wildly extravagant (the authors haven't even stated how "clinical compound quality" is defined yet they claim to be able to measure it). I would argue that compound quality cannot be meaningfully compared for clinical compounds that have been developed for different diseases or disorders. Describing a compound as 'clinical' implies that a large body of measured data has actually been generated for it and the authors of H2024 might find it instructive to ask themselves why they think a simple metric calculated from the chemical structure of the compound would be of interest to a project team with access to this large body of measured data. One criticism that I make of drug discovery metrics is that they trivialize drug discovery and we noted in KM2013: “Given that drug discovery would appear to be anything but simple, the simplicity of a drug-likeness model could actually be taken as evidence for its irrelevance to drug discovery.”]

The overall results are supportive of the occurrence of “natural selection” being associated with many successful drug discovery campaigns. [My view is that the authors of H2024 have not clearly articulated what they mean by “natural selection” in the context of this study.] It has been proposed that NP-likeness assists drug distribution by membrane transporters, (21) [The author of (20c) asserts "Over the years, my colleagues and I have come to realise that the likelihood of pharmaceutical drugs being able to diffuse through whatever unhindered phospholipid bilayer may exist in intact biological membranes in vivo is vanishingly low" and, by implication, that entry of the vast majority of drugs into cells is transporter mediated. I keep an open mind on this issue although I note that what is touted by some as a universal phenomenon does seem to have been remarkably difficult to observe directly by experiment. The difficulties caused by active efflux are widely recognized by drug discovery scientists and it may be instructive for the authors of H2024 to consider how an experienced medicinal chemist working in the CNS area might view a suggestion that compounds should be made more like NPs to increase the likelihood of their being transporter substrates.] and we further speculate that employing NP fragments may result in less attrition due to toxicity, a major cause of preclinical failure. (55) [This does seem to be grasping at straws. The focus of the cited article is actually clinical failure and not preclinical failure.]

There is untapped potential for further exploitation of currently used and unused NP fragments, especially in fragment combinations and the design of PNPs, without the need to resort to chemically diverse ring systems and scaffolds. [This exemplifies what can be called the ‘Ro5 mentality’ (‘experts’ advising medicinal chemists not to explore but to focus on regions of chemical space that have been blessed by the ‘experts’). As I note in this blog post, Ro5 (as it is stated) is not actually supported by data and in NoLE I advise drug designers not to “automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working.” An equally plausible 'explanation' for the observation that a high fraction of clinical compounds are PNPs is simply that medicinal chemists are working with what they're most familiar with (in which case the advice would be to look beyond Nature's building blocks for inspiration).] To exploit these opportunities, “NP awareness” needs to be added to the repertoire of medicinal chemists. [My view is that it would be more important for critical thinking to be added to the repertoire of medicinal chemists so that they are better equipped to assess the extent to which the conclusions and recommendations of studies like H2024 are actually supported by data.]

In short, applying nature’s building blocks─natural intelligence─to drug design can enhance the opportunities now offered by artificial intelligence. [In my view "natural intelligence" appears to be arm-waving that is neither natural nor intelligent.]  

This is a good point to wrap up and to also conclude blogging for the year. My new year wish is for a kinder, happier and more peaceful World in 2025 and I'll leave you with a photo of BB and Coco in the study here in Maraval. They had been helping me with this post before I unwisely decided to explain ligand efficiency to them. Let sleeping dogs lie I guess.