Tuesday, 5 August 2025

Return to Flatland

Whoever first referred to Economics as ‘The Dismal Science’ had clearly never read an article on ‘3Dness’ in drug discovery. My own experience of reading articles on this topic is the sensation of having my life force slowly sucked out (I even suggested that reviewing the ‘3Dness’ literature might be considered an appropriate penance when I recently confessed my sins at St Gallen Cathedral), and the subject of Confession reminds me of a song that the late great Tom Lehrer sang about the Second Vatican Council.

In this post I review the CNM2025 study (Return to Flatland) which examines the heavily-cited LBH2009 study (Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success). The CNM2025 study, which has already been reviewed by Dan and Ash, opens with: 

The year is 2009, Barack Obama has just been inaugurated and both Lady Gaga and The Black Eyed Peas are at the height of their popularity

This couldn’t help but remind me of the “WORLD WAR 2 BOMBER FOUND ON MOON” headline that appeared on the front page of the Sunday Sport twenty-one years before the publication of LBH2009 (it was accompanied by a photo of a B-17 in a lunar crater). A few weeks later the headline was “WORLD WAR 2 BOMBER FOUND ON MOON VANISHES” (this time accompanied by a photo of the now empty lunar crater).

I’ll start my review of CNM2025 by quoting from it and, as is usual for posts here at Molecular Design, quoted text is indented with any comments by me italicized in red and enclosed in square brackets. 

The hypothesis was attractive, and the data clearly showed the relationship between Fsp3 and clinical progression with pairwise significance P < 0.001. [This statement is inaccurate: Figure 3 of the LBH2009 study shows statistically significant differences at this level between (a) discovery and phase 2 compounds, (b) phase 1 and phase 3 compounds and (c) phase 2 compounds and drugs. The authors of LBH2009 state: “The change in average Fsp3 was statistically significant between adjacent stages in only one case (phase 1 to phase 2)” but they neither show this in Figure 3 of their article nor report a P-value for the difference in mean Fsp3 between phase 1 and phase 2 compounds.] The statistics seemed compelling, though the effect size was modest — an increase in average Fsp3 of 0.09 between sets of phase I and approved drugs equates to a difference of around two additional sp3 carbons per drug molecule only. [The authors of LBH2009 did not actually report this difference to be statistically significant, so it is unclear why the authors of CNM2025 have stated that the “statistics seemed compelling”.]
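The conversion in the quoted passage from a change in mean Fsp3 to a number of sp3 carbons depends on how many carbon atoms a molecule has. The minimal Python sketch below makes that arithmetic explicit, taking Fsp3 as the number of sp3-hybridized carbons divided by the total number of carbons. The carbon counts used (22 and 30) are illustrative assumptions, not values taken from LBH2009 or CNM2025.

```python
def fsp3(n_sp3_carbons: int, n_carbons: int) -> float:
    """Fsp3 = (number of sp3-hybridized carbons) / (total number of carbons)."""
    if n_carbons <= 0:
        raise ValueError("a molecule needs at least one carbon atom")
    if not 0 <= n_sp3_carbons <= n_carbons:
        raise ValueError("sp3 carbon count must lie between 0 and the carbon count")
    return n_sp3_carbons / n_carbons


def extra_sp3_carbons(delta_fsp3: float, n_carbons: int) -> float:
    """At a fixed carbon count, a change in Fsp3 corresponds to delta * n_carbons sp3 carbons."""
    return delta_fsp3 * n_carbons


# Illustrative carbon counts (assumptions, not data from either study)
for n_c in (22, 30):
    print(f"n_C = {n_c}: delta Fsp3 = 0.09 -> {extra_sp3_carbons(0.09, n_c):.1f} extra sp3 carbons")
```

The same change in mean Fsp3 therefore corresponds to more sp3 carbons in a larger molecule, so "around two" only holds for molecules of roughly twenty-odd carbons.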

The LBH2009 study is effectively a call to think beyond aromatic rings in drug design, and my view is that there are considerable benefits in doing so even though I consider the data analysis in the study to be shaky. Almost three decades ago I included a quinuclidine in the Zeneca fragment library for NMR screening, and later at AstraZeneca I would actively search (with minimal success) for amides and N-substituted heterocycles derived from bicyclic amines. I see the advantages of looking beyond aromatic rings as stemming primarily from increased molecular diversity and higher-resolution coverage of chemical space, and in KM2013 we wrote:

Molecular recognition considerations suggest a focus on achieving axial substitution in saturated rings with minimal steric footprint, for example by exploiting the anomeric effect or by substituting N-acylated cyclic amines at C2.

Although data analyses (for example, see HY2010) presented in support of the belief that aromatic rings adversely affect aqueous solubility are typically underwhelming, I consider the suggestion to be plausible, and I suggested in K2022 that deleterious effects of aromatic rings are more likely to be due to their potential for making molecular interactions than to their planarity. That said, I should also point out that the analysis of the relationship between aqueous solubility and Fsp3 presented in Figure 5 of LBH2009 is a textbook example of correlation inflation (see Fig. 5 in KM2013), and I suspect that if a team had submitted this analysis at Statistiques Sans Frontières the judges would have either awarded “nul points” or come to the conclusion that the team had played its joker. Given the Lady Gaga reference in CNM2025, I couldn't resist linking this Peter Gabriel song, which includes the lyrics "Adolf builds a bonfire, Enrico plays with it", even though I have absolutely no idea what the lyrics actually mean.

While the analysis of the relationship between aqueous solubility and Fsp3 presented in Figure 5 of LBH2009 does endow the study with what I’ll politely call a whiff of the pasture, it’s not directly related to the analysis of clinical progression presented in the study. Let’s take a look at Figure 3 in LBH2009, which shows mean Fsp3 values for compounds in discovery, in each of the three phases of clinical development, and for approved drugs. As an aside, this analysis would fall foul of current Journal of Medicinal Chemistry author guidelines (see link; accessed 05-Aug-2025), which clearly mandate that “If average values are reported from computational analysis, their variance must be documented”. As mentioned earlier in this post, Figure 3 in LBH2009 shows statistically significant (P < 0.001) differences between (a) discovery and phase 2 compounds, (b) phase 1 and phase 3 compounds and (c) phase 2 compounds and drugs. It’s also worth stressing that Figure 3 in LBH2009 does not show statistically significant differences in Fsp3 for any of the clinical development transitions (phase 1 to phase 2; phase 2 to phase 3; phase 3 to approved drug).
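Documenting variance alongside averages, as the Journal of Medicinal Chemistry guideline quoted above requires, takes very little effort. A minimal Python sketch, assuming that each group of compounds is available as a list of Fsp3 values (the group names and values below are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev


def summarise(values):
    """Return n, mean, sample standard deviation and standard error of the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate variance")
    sd = stdev(values)  # sample SD (n - 1 denominator)
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sd / sqrt(n)}


# Hypothetical Fsp3 values, for illustration only
groups = {
    "phase 1": [0.21, 0.35, 0.48, 0.10, 0.52, 0.30],
    "phase 2": [0.25, 0.40, 0.55, 0.33],
}
for name, values in groups.items():
    s = summarise(values)
    print(f"{name}: n = {s['n']}, mean = {s['mean']:.2f}, SD = {s['sd']:.2f}, SEM = {s['sem']:.2f}")
```

The standard deviation describes the spread of the Fsp3 values in a group whereas the standard error describes the uncertainty in the group mean, so it is worth stating which of the two is being reported.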

I think that there are some problems with how the authors of the LBH2009 study have analysed the relationship between Fsp3 and progression through the stages of clinical development. If charged with analysing this data I would focus on the three clinical development transitions (phase 1 to phase 2; phase 2 to phase 3; phase 3 to approved drug) and wouldn’t waste time on comparisons between discovery compounds and clinical compounds. If analysing the relationship between Fsp3 and the progression from phase 1 to phase 2, I would partition the set of phase 1 compounds into a ‘YES’ subset of compounds that had progressed to phase 2 and a ‘NO’ subset of compounds that had not progressed to phase 2. I would certainly be taking a close look at the distributions of Fsp3 values (some approaches to assessing statistical significance assume that data values are Normally distributed), and I’d also be thinking about assessing effect size in addition to statistical significance. However, the problems with the LBH2009 analysis are more fundamental than non-Normal distributions of Fsp3 values.
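The YES/NO partition described above can be sketched in Python as follows. The record layout (an "fsp3" value and a "reached_phase2" flag for each phase 1 compound) and the data are hypothetical assumptions. Because Fsp3 distributions may well be non-Normal, the sketch assesses the difference in means with a permutation test and reports two effect sizes: Cohen's d and the rank-based probability of superiority (the Mann-Whitney U statistic divided by the number of YES/NO pairs), which does not assume Normality.

```python
import random
from statistics import mean, stdev


def partition(phase1):
    """Split phase 1 compounds into YES (reached phase 2) and NO subsets of Fsp3 values."""
    yes = [c["fsp3"] for c in phase1 if c["reached_phase2"]]
    no = [c["fsp3"] for c in phase1 if not c["reached_phase2"]]
    return yes, no


def cohens_d(a, b):
    """Difference in means divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def prob_superiority(a, b):
    """P(value drawn from a > value drawn from b), counting ties as one half."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return wins / (len(a) * len(b))


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0


# Hypothetical phase 1 records, for illustration only
phase1 = [
    {"fsp3": 0.45, "reached_phase2": True},
    {"fsp3": 0.38, "reached_phase2": True},
    {"fsp3": 0.52, "reached_phase2": True},
    {"fsp3": 0.30, "reached_phase2": True},
    {"fsp3": 0.22, "reached_phase2": False},
    {"fsp3": 0.41, "reached_phase2": False},
    {"fsp3": 0.18, "reached_phase2": False},
    {"fsp3": 0.35, "reached_phase2": False},
]
yes, no = partition(phase1)
print(f"YES n = {len(yes)}, NO n = {len(no)}")
print(f"Cohen's d = {cohens_d(yes, no):.2f}")
print(f"P(superiority) = {prob_superiority(yes, no):.2f}")
print(f"permutation P = {permutation_p(yes, no):.3f}")
```

Each compound appears in exactly one of the two subsets being compared, which is the key difference from comparing a phase 1 set with a phase 2 set.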

The authors of LBH2009 assess the progression from phase 1 to phase 2 by comparing the mean Fsp3 value for the phase 1 compounds with the mean Fsp3 value for the phase 2 compounds. The problem is that the Fsp3 values for the YES compounds (those that progressed from phase 1 to phase 2) are present in both of the data sets being compared. This means that the observed differences in mean Fsp3 values will reflect both the difference between YES and NO compounds (relevant to the relationship between Fsp3 and progression from phase 1 to phase 2) and the relative numbers of YES and NO compounds in the phase 1 data (not relevant to the relationship between Fsp3 and progression from phase 1 to phase 2). Analysing the data in the way that the authors of LBH2009 have done effectively adds noise to the signal, and it’s possible that they would have observed more statistically significant differences in mean Fsp3 values had they analysed the data in a more appropriate manner.
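The dependence on the relative numbers of YES and NO compounds can be made explicit with a little algebra. Assume, as a simplification, that the phase 2 set consists only of the YES compounds from the phase 1 set. Write w for the fraction of phase 1 compounds that progressed to phase 2, F_YES and F_NO for the mean Fsp3 values of the YES and NO subsets, and F_1 and F_2 for the mean Fsp3 values of the phase 1 and phase 2 sets (this notation is introduced here for illustration only):

```latex
\begin{align}
\bar{F}_{1} &= w\,\bar{F}_{\mathrm{YES}} + (1 - w)\,\bar{F}_{\mathrm{NO}} \\
\bar{F}_{2} &= \bar{F}_{\mathrm{YES}} \\
\bar{F}_{2} - \bar{F}_{1} &= (1 - w)\left(\bar{F}_{\mathrm{YES}} - \bar{F}_{\mathrm{NO}}\right)
\end{align}
```

Under this simplification the YES minus NO difference, which is what matters for progression, is scaled by 1 - w (the fraction of phase 1 compounds that did not progress), so the observed phase 1 to phase 2 difference depends on the attrition rate as well as on how YES and NO compounds differ. The two sets being compared also share compounds, so they are not the independent samples assumed by standard two-sample tests.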

This is an appropriate point at which to discuss correlation in the context of studies such as LBH2009 and CNM2025. It’s actually well known (see L2013) that Fsp3 values for chemical structures tend to be greater when amine nitrogen atoms are present (this does not invalidate the observed trends in the data but has big implications for how you interpret these trends). There is, however, a much bigger issue, which is that correlation does not imply causation. Let’s suppose that you’ve just joined a drug discovery team as they are preparing to select a clinical candidate (I concede that this is a most improbable scenario but it does illustrate a point). The team have an excellent understanding of the structure-activity relationship (SAR) and have successfully addressed a number of issues during the lead optimization process (the chemical structures of the compounds have been quite literally shaped by the problems that the team members have solved). Now consider the likely reaction of the team members to a suggestion that the probability of success in the clinic would increase if the chemical structure of the best compound were modified so as to increase its Fsp3 value. My view is that the team might think that the person making such a suggestion had just stepped off the shuttle from Planet Tharg (an alien from this planet used to make occasional Sunday Sport appearances). I see the trends in data observed by the authors of LBH2009 as effects rather than causes (the vanishing B-17 was never there in the first place).

Let’s return to the CNM2025 study, whose authors state:

Using data from the Cortellis Drug Discovery Intelligence database, we repeated an analysis similar to that of Lovering et al. to assess Fsp3 in drugs approved post-2009 and those in active clinical development as of mid-2024 (Fig. 1). [I would challenge the claim that the analysis presented in CNM2025 is similar to that presented in LBH2009.] Although our methods used contemporary data sources different to Lovering et al., we obtained comparable Fsp3 data for approved drugs prior to 2009. More recently however, the picture appears to have changed with approvals shifting to lower Fsp3 drugs (Fig. 1a). Similarly, when looking at drugs currently in clinical development (Fig. 1b), there appeared to be no clear relationship between highest phase reached and Fsp3, suggesting the key conclusion noted by Lovering et al. has not persisted. In all data sets, exemplars with Fsp3 = 0 as well as Fsp3 = 1 are extensively seen.

Fig. 1a in CNM2025 shows the time-dependence of Fsp3 distributions for approved drugs according to approval date, and I remain unconvinced of the value of analysis like this (on first encountering analysis of the time-dependence of drug properties a quarter of a century ago, I recall being left with the distinct impression that some senior medicinal chemists where I worked had a bit too much time on their hands). However, it is immaterial whether or not you are as underwhelmed as I am by the time-dependence of drug properties, because no such analysis is actually reported in LBH2009, and this is one reason that I challenge the claim made by the authors of CNM2025 that they “repeated an analysis similar to that of Lovering et al. to assess Fsp3 in drugs approved post-2009 and those in active clinical development as of mid-2024”.

Now let’s take a look at Fig. 1b in CNM2025, which should be compared with Figure 3 in LBH2009. In some ways the former is an improvement on the latter, since the violin plots show the distributions of Fsp3 values for each group of compounds and, as mentioned earlier in the post, I don’t think that it makes any sense to include discovery compounds in analysis like this (as the authors of LBH2009 did). Although these two figures look superficially similar they are actually very different and, given that the authors of CNM2025 only include compounds in development during 2024, I would argue that their study does not examine the link between Fsp3 and clinical progression. I agree that the difference between mean Fsp3 values for drugs approved up to 2009 and for drugs approved after 2009 is statistically significant. What is not clear from the analysis summarized in Fig. 1b in CNM2025 is whether the lower Fsp3 values of drugs that were approved after 2009 reflect smaller increases in Fsp3 over the course of clinical development (the B-17 has disappeared from the lunar crater) or lower Fsp3 values for compounds entering clinical development (the B-17 is still in the lunar crater). I think it's possible to address this question, but you would need to analyse the data a lot more carefully than the authors of CNM2025 appear to have done. For example, you might examine the time-dependencies of mean Fsp3 values for compounds evaluated in phase 1 and the corresponding mean Fsp3 values for compounds that progressed or failed to progress to phase 2. While I consider more careful analysis of progression to be feasible, I see little or no value from the perspective of real world drug discovery in actually performing the analysis more carefully.
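The analysis suggested above might be sketched in Python as follows: phase 1 compounds are grouped by the period in which they entered phase 1, and mean Fsp3 is reported for all phase 1 compounds and for the YES and NO subsets. The record fields ("phase1_year", "reached_phase2", "fsp3"), the split at 2009 and the data are all hypothetical assumptions. Compounds that entered phase 1 recently may not yet have had time to progress, so the most recent period would need particularly careful handling.

```python
from collections import defaultdict
from statistics import mean


def period_of(year):
    """Hypothetical two-period split around the publication year of LBH2009."""
    return "2009 onwards" if year >= 2009 else "before 2009"


def mean_fsp3_by_period(records):
    """Mean Fsp3 per period for all phase 1 compounds and for the YES and NO subsets."""
    groups = defaultdict(lambda: {"all": [], "yes": [], "no": []})
    for r in records:
        g = groups[period_of(r["phase1_year"])]
        g["all"].append(r["fsp3"])
        g["yes" if r["reached_phase2"] else "no"].append(r["fsp3"])
    return {
        period: {subset: (mean(v) if v else None) for subset, v in g.items()}
        for period, g in groups.items()
    }


# Hypothetical phase 1 records, for illustration only
records = [
    {"phase1_year": 2003, "fsp3": 0.40, "reached_phase2": True},
    {"phase1_year": 2005, "fsp3": 0.30, "reached_phase2": False},
    {"phase1_year": 2007, "fsp3": 0.50, "reached_phase2": True},
    {"phase1_year": 2012, "fsp3": 0.25, "reached_phase2": True},
    {"phase1_year": 2015, "fsp3": 0.20, "reached_phase2": False},
    {"phase1_year": 2018, "fsp3": 0.35, "reached_phase2": True},
]
for period, means in mean_fsp3_by_period(records).items():
    print(period, {k: (round(v, 3) if v is not None else None) for k, v in means.items()})
```

Broadly, a falling phase 1 mean with a persistent YES minus NO difference would be consistent with lower Fsp3 values for compounds entering clinical development, whereas a shrinking YES minus NO difference would be consistent with smaller increases in Fsp3 over the course of clinical development.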

This is a good point at which to wrap up. I see variation in drug properties with time as an effect rather than a cause, and Forrest Gump would have been well aware of this fifteen years before the publication of LBH2009 when he famously observed that "shit happens". One point on which the CNM2025 authors and I do appear to agree is that there is not currently a B-17 in a lunar crater. Where we appear to differ is that they seem to be suggesting that this is because it has vanished, whereas I never believed that it was there in the first place. I’ll let the late great Dave Allen have the last word.