Wednesday, 31 December 2025

Hit to Lead best practice?

I'm now in Trinidad and I'll share a 180° panorama from Paramin where I walk for exercise. This district in Trinidad's Northern Range is renowned for its agriculture and the most excellent produce is grown in 'gardens' on steep hillsides. My walk would take about two and a quarter hours if I just walked but it usually takes rather longer because I like to take photos and often stop on the ridge to gaze at corbeaux 'surfing' the updrafts. Most of all I enjoy catching up with friends in Paramin and not so long ago one of them was telling me about the sound made by douens (which have terrified me since childhood because I was never baptised). Some years ago I was struggling along the ridge with a hacking cough that I'd brought with me from the UK three days previously when I heard a familiar voice (one of my friends was visiting his sister). The conversation turned to my cough and he instructed his sister to bring some medicine. She produced a bottle of a liquid that looked like fluorescein and, as she decanted some into a shot glass my friend exclaimed "dat too much yuh go kill him". The liquid appeared to have a puncheon base and my friend's sister also gave me some bush to make tea. My cough was history after three days.             


I’ll be taking a look at The European Federation for Medicinal Chemistry and Chemical Biology (EFMC) Best Practice Initiative: Hit to Lead (Q2025) in this post. I have a number of criticisms of this work and it really shouldn’t need saying that you do raise the bar for yourself when you present your work as defining best practices. As is customary for blog posts here at Molecular Design I’ve used Q2025 reference numbers when referring to literature studies and quoted text is indented with my comments in red italics. This will be a long tedious post and strong coffee is recommended.

Best practices are, in essence, recommended ways of doing things and it’s actually very difficult to demonstrate objectively that one way of doing things is better (or worse) than another way. My general view of Q2025 is of a poorly organized article that at times lacks clarity and coherence. Some of the advice offered on how best to do Hit to Lead (H2L) work is unsound and the Authors also make a number of significant errors. Although the abstract refers to “contemporary drug discovery” the recommended best practices do, in my view, appear to be firmly rooted in the past given that fragment-based design (FBD) is not covered and there is no mention of important 'new' modalities such as irreversible covalent inhibition and targeted protein degradation. It’s worth mentioning that biological activity for some new modalities cannot be meaningfully quantified as a single parameter such as an IC50 value and this complicates the use of ligand efficiency metrics (a post on covalent ligand efficiency will give you an idea of some of the difficulties) which the Authors seem to consider important in H2L work. I consider the quantity of literature cited in Q2025 to be excessive, especially given that some of the cited articles have minimal relevance to H2L work (the failure of the Authors to cite R2009 is also noteworthy). In some cases the cited literature does not support assertions made by the Authors. In my view Figures 1, 5 and 8 are redundant.

While I see plenty wrong with Q2025 it’s worth flagging up points on which the Authors and I appear to be in agreement. I think that they put it well with the following statement:

Leads have line of sight to a development candidate and bring an understanding of what priorities Lead Optimisation should address.

I used this football analogy in an earlier post:

The screening phase is followed by the hit-to-lead phase and it can be helpful to draw an analogy between drug discovery and what is called football outside the USA. It’s not generally possible to design a drug from screening output alone and to attempt to do so would be the equivalent of taking a shot at goal from the centre spot. Just as the midfielders try to move the ball closer to the opposition goal, the hit-to-lead team use the screening hits as starting points for design of higher affinity compounds. The main objective in the hit-to-lead phase is to generate information that can be used for design and mapping structure-activity relationships for the more interesting hits is a common activity in hit-to-lead work.

I certainly agree that it is important to establish structure-activity relationships (SARs) for structural series of interest although I have no idea what the Authors mean by “dynamic SAR”. I also agree that consideration of physicochemical properties, especially lipophilicity, is very important in H2L work (just as it is in optimisation of the leads) although the case for a Nobel Prize made in a 2024 JMC Editorial does, in my view, appear to have been overcooked.

I argue that drug discovery should be seen in a Design of Experiments framework (generate the information that you need as efficiently as possible) rather than as the prediction exercise that many who tout machine learning (ML) as a panacea for the ills of Pharma/Biotech would have you believe. Regardless of which view prevails it’s abundantly clear that generation and analysis of data are very important in contemporary drug discovery and are likely to become even more important in the future. However, if you’re going to base decisions on trends in data then it’s important that you know how strong the trends are because this tells you how much weight to give to the trends when making your decisions. Most drug discovery scientists will have encountered analyses of relationships between predictors of ADME (absorption, distribution, metabolism, and excretion) and physicochemical and chemical structure descriptors and we observed in the KM2013 perspective that:

The wide acceptance of Ro5 provided other researchers with an incentive to publish analyses of their own data and those who have followed the drug discovery literature over the last decade or so will have become aware of a publication genre that can be described as ‘retrospective data analysis of large proprietary data sets’ or, more succinctly, as ‘Ro5 envy’.

In some cases trends observed in data are presented in ways that make them appear to be stronger than they actually are (this is typically achieved by categorizing continuous-valued data prior to analysis) and [13a], [24] and [26] were criticised in this context in KM2013. When reading articles on drug-likeness and compound quality it is also important to be aware that correlation does not imply causation. One should be particularly wary of studies such as [20c] which present analyses of proprietary data as "facts" or claim that such analyses have revealed "principles". I see the weakness of these trends partly as a reflection of chemical structure diversity in datasets and would expect the corresponding trends to be stronger within structural series (I offer the following advice in NoLE):

Drug designers should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working.

I see erosion of critical thinking skills as a significant problem in contemporary drug discovery and some leaders in the field appear to have lost the ability to distinguish what they know from what they believe. As I observed in a review of a 2024 JMC Editorial (Property-Based Drug Design Merits a Nobel Prize) the Rule of 5 (Ro5) is not actually supported by data in the form that it was stated. The wide acceptance of Ro5 as a definition of drug-likeness propagates what I consider to be a misleading view that drugs occupy a contiguous and distinct region of chemical space. Some of the claims made in the JMC Editorial (“a compound is more likely to be clinically developable when LipE > 5”, “a discovery compound is more likely to become a drug when Fsp3 > 0.40” and “a compound is more likely to have good developability when PFI < 7”) do not appear to be based on data. I remain sceptical that developability and likelihood of clinical success of a compound can be meaningfully assessed even when one knows that the compound actually exhibits activity against the target(s) of interest. In my view the suggestion that simple drug discovery guidelines are worthy of a Nobel Prize does a huge disservice to drug discovery scientists by trivializing the very significant challenges that they face.   

Like many in the drug discovery field, I consider lipophilicity to be the single most important physicochemical property in drug discovery and I would generally anticipate that a surfeit of lipophilicity will end in tears. That said, I don't consider lipophilicity to be usefully predictive of physicochemical properties such as permeability and aqueous solubility that are more relevant than lipophilicity from the perspective of oral absorption. When I assert that lipophilicity is not "usefully predictive" I'm certainly not denying that trends in data exist. The problem is that the trends are not so strong that having a solubility value that has been predicted from lipophilicity means that you no longer need to measure aqueous solubility.

In drug discovery projects I generally recommend examination of the response of potency (expressed as a logarithm) to increased lipophilicity. In the ideal situation the correlation of potency with lipophilicity will be weak, indicating that potency is driven by factors other than lipophilicity. If the correlation of potency with lipophilicity is strong then you need the response (the slope for a linear correlation) to be relatively steep. I consider it to be generally helpful to plot potency against lipophilicity with reference lines corresponding to different LipE values (see R2009 which is a lot more relevant to H2L work than much of the literature cited in the Q2025 study) and I would also suggest modelling the response and using the residuals to quantify the extent that individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the "Alternatives to ligand efficiency for normalization of affinity" section of NoLE).
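To make this concrete, here is a minimal sketch (entirely synthetic data and my own variable names, not taken from Q2025, R2009 or NoLE) of how a team might generate such a plot and use residuals from a fitted trend line to see which compounds beat the trend:

```python
# Minimal sketch with synthetic data: plot pIC50 against logP, overlay reference
# lines of constant LipE (pIC50 - logP) and report residuals from a fitted trend.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
logp = rng.uniform(1.0, 5.0, 40)                     # illustrative lipophilicity values
pic50 = 3.0 + 0.8 * logp + rng.normal(0.0, 0.5, 40)  # illustrative potency values

# Fit pIC50 against logP; positive residuals identify compounds that beat the
# trend in the data, negative residuals those that are beaten by it.
slope, intercept = np.polyfit(logp, pic50, 1)
residuals = pic50 - (slope * logp + intercept)

fig, ax = plt.subplots()
ax.scatter(logp, pic50, c=residuals, cmap="coolwarm")
for lipe in range(2, 8):                             # reference lines of constant LipE
    x = np.array([0.5, 5.5])
    ax.plot(x, x + lipe, linewidth=0.5, color="grey")
ax.set_xlabel("logP")
ax.set_ylabel("pIC50")
plt.show()
```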

In drug discovery lipophilicity is usually quantified by the logarithm of the octanol/water partition coefficient (log P) or distribution coefficient (log D). The choice of octanol/water for quantification of lipophilicity is arbitrary and some, including me, consider saturated hydrocarbons such as cyclohexane or hexadecane to be physically more realistic than octanol as a model for the core of a lipid bilayer. It is the distribution coefficient (D) rather than the partition coefficient (P) that is measured for lipophilicity assessment although the two quantities are equivalent when ionization can be safely neglected. Values of logP for ionizable compounds can be derived from the response of log D to pH although this is not generally done routinely in drug discovery. Alternatively, you can make the assumption that only neutral forms of compounds partition into the organic phase and use (1) in the H2L best practices post graphic (see also K2013) to convert log D values to log P values (to do this you’ll also need a reliable estimate for pKa in order to calculate the neutral fraction). When log D (as opposed to log P) is used to assess the ‘quality’ of compounds you can make compounds better simply by increasing the extent to which they are ionized and I hope you can see that going down this path is likely to end as well as things did for the Sixth Army at Stalingrad.
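As an illustration of the conversion described above, here is a minimal sketch (my own code with hypothetical function names, assuming a monoprotic compound and that only the neutral form partitions into the organic phase, so that log D = log P + log10 of the neutral fraction):

```python
# Minimal sketch (not the equation graphic itself): convert a measured logD at a
# given pH to logP, assuming a monoprotic compound and that only the neutral
# form partitions into the organic phase.
import math

def neutral_fraction(pka: float, ph: float, is_acid: bool) -> float:
    """Henderson-Hasselbalch neutral fraction for a monoprotic acid or base."""
    exponent = (ph - pka) if is_acid else (pka - ph)
    return 1.0 / (1.0 + 10.0 ** exponent)

def logd_to_logp(logd: float, pka: float, ph: float = 7.4, is_acid: bool = True) -> float:
    """logP = logD - log10(fraction neutral at the measurement pH)."""
    return logd - math.log10(neutral_fraction(pka, ph, is_acid))

# Illustrative values only: a carboxylic acid with pKa 4.4 measured at pH 7.4 is
# ~99.9% ionized, so logP is about three units higher than logD(7.4).
print(logd_to_logp(logd=1.0, pka=4.4, ph=7.4, is_acid=True))  # ~4.0
```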


In drug discovery log P values are typically calculated and it can often be quite difficult when reading the literature to know which method has been used for the calculations (sometimes the term ‘cLogP’ appears to have been used merely to denote that log P values have been calculated).  For example, it is stated in [13a] that “Physical property data were obtained from AstraZeneca’s C-Lab tool, incorporating standard packages for LogP calculations (cLogP, ACDLogP), and an in-house algorithm for the distribution coefficient (1-octanol–water LogD at pH 7.4)”. In general, different prediction methods will give different log P values for the same compound (for example the Ro5 lipophilicity threshold is 5 when ClogP is used but 4.15 when MlogP is used). That said, choice of method for predicting log P and whether you use measured log D or predicted log P become less important issues when working within structural series because hydrogen bond donors and acceptors, and ionizable groups tend to be relatively conserved under this scenario.

That log D and log P are different quantities in the context of drug design is one of a number of things that the Authors of [34a] (Molecular Property Design: Does Everyone Get It?) just don’t seem to ‘get’ and I’ll point you toward a blog post in which this point is discussed in a bit more detail. Let’s examine Figure 2 (Impact of hydrophobicity on developability assays and the profile of marketed oral drugs) of [34a] and I’d like you to take a look at the upper panel (a). You’ll notice that the visualization for some of the ‘developability’ assays is based on PFI (derived from log D measured chromatographically at pH 7.4). However, the visualization for hERG (+1 charge) and promiscuity is based on iPFI (derived from ‘Chrom logP’ and it is not clear how this quantity was defined or generated). I would also argue that the activity criterion (pIC50 > 5) used in the promiscuity analysis is too permissive to be physiologically relevant (this is a common issue in the promiscuity literature). As an aside, I am unconvinced that log D values were actually measured chromatographically at pH 7.4 for all the drugs that form the basis of the analysis shown in the lower panel (b) of Figure 2.

After a long preamble it’s time to start my review of Q2025 and comments will follow the order of the article. I see the citation of [2] and [3] as gratuitous while [4] does not appear to present evidence in support of the claim that “ensuring high quality of lead series is a large cost and time saver in the overall process of drug discovery” (it must be stressed that I certainly don’t deny the value of high quality lead series and am merely pointing out that the chosen reference does not actually demonstrate that higher quality of lead series results in cost and time savings in drug discovery).

In my view neither Figure 1 nor its caption (see below) makes any sense.

Figure 1. Illustration of the multi-objective characterisation necessary in the journey from a hit to a drug. All these necessary characteristics, described by illustrative principal components, are influenced by the physicochemical properties of the molecules.

You’ll frequently encounter graphics like Figure 1 that show low-dimensional chemical spaces in the drug discovery literature (for example, a 2-dimensional space might be specified in terms of lipophilicity and molecular size). While it’s very easy to generate graphics like these the relevance of the chemical spaces to drug design is often unclear. There are ways in which you can demonstrate the relevance of a chemical space to drug design and, for example, you might build usefully predictive models for quantities such as IC50, aqueous solubility or permeability using only the dimensions of the particular chemical space as descriptors. Alternatively, you could show that compounds in mutually exclusive categories such as ‘progressed to phase 2’ and ‘failed to progress to phase 2’ occupy different regions of the chemical space (note that it’s not sufficient to show that a single class of compounds such as ‘approved drugs’ occupies a particular region within the chemical space and this is the essence of a general criticism that I make of Ro5 and QED). It is common to depict the different categories as ellipses that enclose a given fraction of the data points corresponding to each category and the orientation of each ellipse with respect to the axes indicates the degree to which the descriptors that define the chemical space are correlated for each category. One problem with Figure 1 is that the meaning of the ellipses is unclear and I would challenge the assertion made by the Authors that “the journey of a drug discovery campaign is characterized in Figure 1, showing how the active hit needs to be modified to address the requirements impacting the efficacy and safety of the molecule”.
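For readers who want to see what such a comparison might look like in practice, here is a minimal sketch (synthetic data and my own code, not a reconstruction of Figure 1) that draws ellipses enclosing roughly 90% of the points for two mutually exclusive categories in a 2-dimensional chemical space:

```python
# Minimal sketch (synthetic data): ellipses enclosing ~90% of the points for two
# mutually exclusive categories in a 2D chemical space (e.g. logP vs heavy atoms).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from scipy.stats import chi2

def add_cov_ellipse(ax, x, y, frac, **kwargs):
    """Ellipse enclosing ~frac of points, assuming a bivariate normal distribution."""
    cov = np.cov(x, y)
    vals, vecs = np.linalg.eigh(cov)              # eigenvalues ascending
    scale = chi2.ppf(frac, df=2)                  # chi-squared quantile, 2 degrees of freedom
    width, height = 2.0 * np.sqrt(scale * vals)   # full axis lengths
    angle = np.degrees(np.arctan2(vecs[1, -1], vecs[0, -1]))
    ax.add_patch(Ellipse((x.mean(), y.mean()), width, height, angle=angle,
                         fill=False, **kwargs))

rng = np.random.default_rng(1)
progressed = rng.multivariate_normal([2.5, 25], [[1.0, 2.0], [2.0, 16.0]], 200)
failed = rng.multivariate_normal([4.0, 32], [[1.2, 3.0], [3.0, 20.0]], 200)

fig, ax = plt.subplots()
for data, label in [(progressed, "progressed to phase 2"), (failed, "failed to progress")]:
    ax.scatter(data[:, 0], data[:, 1], s=5, label=label)
    add_cov_ellipse(ax, data[:, 0], data[:, 1], frac=0.9)
ax.set_xlabel("logP")
ax.set_ylabel("heavy atom count")
ax.legend()
plt.show()
```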

Potency optimisation alone is not a viable strategy towards the discovery of efficacious and safe drugs, or even high-quality leads. Concurrent optimisation of the physicochemical properties of a molecule is the most important facet of drug discovery, as these properties influence its behaviours, disposition and efficacy [12a | 12b]. [While I certainly agree that there is a lot more to drug design than maximisation of potency I would argue that controlling exposure is a more important objective than optimization of physicochemical properties. It's also worth bearing in mind that you can't compensate for inadequate potency with increased compound quality. I don't consider either reference as evidence that "concurrent optimisation of the physicochemical properties of a molecule is the most important facet of drug discovery" and it is not accurate to describe metabolic stability, active efflux and affinity for anti-targets as "physicochemical properties". I think the Authors need to say more about which physicochemical properties they recommend to be optimized and be clearer about exactly what constitutes optimization. Lipophilicity alone is not usefully predictive of properties such as bioavailability, distribution and clearance that determine the effects of drugs in vivo.] Together these outcomes define the quality of the molecule, indicative of its chances of success in the clinic, as evidenced in numerous studies [13a | 13b]. [Neither of these articles appears to provide convincing evidence of a causal relationship between “the quality of a molecule” and probability of success in the clinic. Much of the 'analysis' in [13a] consists of plots of median values without any indication of the spreads in the corresponding distributions. As explained in KM2013 presenting data in this manner exaggerates trends and I consider it unwise to base decisions on data that have been presented in this manner. Quite aside from the issue of hidden variation I do not consider the relationship between promiscuity and median cLogP reported (Figure 3a) in [13a] to be indicative of probability of success in the clinic, given that the criterion for 'activity' (> 30% inhibition at 10 µM) is far too permissive to be physiologically relevant (this is a common issue in the promiscuity literature).]

While the optimal lipophilicity range has been suggested as a log D7.4 between 1 and 3, [15] this is highly dependent on the chemical series. [The focus of the analysis was permeability and the range was actually defined in terms of AZlogD (calculated using proprietary in-house software) as opposed to log D measured at pH 7.4. The correlation between the logarithm of the A to B permeability and AZlogD is actually very weak (r2 = 0.16) which would imply a high degree of uncertainty in threshold values used to specify the optimal lipophilicity range. While I remain sceptical about the feasibility of meaningfully defining optimal property ranges the assertion that the proposed range in AZlogD of 1 to 3 “is highly dependent on the chemical series” is pure speculation and is not based on data.] Best practice would be to generate data for a diverse set of compounds in a series, if measuring it for all analogues is not possible, and determine the lipophilicity range that leads to the most balanced properties and potency [3 | 16]. [It is not clear what the Authors mean by “most balanced properties and potency” nor is it clear how one is actually supposed to use lipophilicity measurements to objectively “determine the lipophilicity range that leads to the most balanced properties and potency”. My view is that to demonstrate "balanced properties and potency" would require measurements of properties such as aqueous solubility and permeability that are more predictive than lipophilicity of exposure in vivo. I do not consider either [3] or [16] to support the assertions being made by the Authors.] Lipophilicity and pKa prediction models can then guide further designs and synthesis of analogues along the optimisation pathway (Figure 3 [17]), but measurements are advised, particularly by chromatographic methods, such as Chrom log D7.4 [18], in contemporary practice. [In general, it is very difficult to convincingly demonstrate that one measure of lipophilicity is superior to another. Chromatographic measurement of log D is higher in throughput than the shake flask method used traditionally but it is unclear as to which solvent system the measurement corresponds. Furthermore, the high surface area to volume ratio of the stationary phase means that an ionized species can interact to a significant extent with the non-polar stationary phase while keeping the ionized group in contact with the polar mobile phase and one should anticipate that the contribution of ionization to log D values might be lower in magnitude than for a shake flask measurement.]

As noted earlier in the post, I consider plotting potency against lipophilicity with reference lines corresponding to different LLE (LipE) values (as is done in Figure 3, which also serves as the graphical abstract; see R2009 which really should have been cited) to be a good way for H2L project teams to visualize potency measurements for their project compounds. That said, I consider the view of the discovery process implied by Figure 3 to be neither accurate nor of any practical value for scientists working on H2L projects. It is relatively easy to define optimization of potency and measurements in an in vitro assay are typically relevant to target engagement in vivo (uncertainty in the concentration of the drug in the target compartment, and of the species with which it competes, is likely to be the bigger issue when trying to understand why in vitro potency fails to translate to beneficial effects in vivo). One specific criticism that I will make of Figure 3 is that it appears to imply that it doesn't matter whether you use log P or log D (when you use log D you can reduce lipophilicity to acceptable levels simply by increasing the extent to which compounds are ionized).

However, there is quite a bit more to optimization of properties such as permeability, aqueous solubility, metabolic stability and pharmacological promiscuity (which are believed to be predictive of ADME and toxicity) than simply controlling lipophilicity, and my view is that defining optimization in terms of determining "the lipophilicity range that leads to the most balanced properties and potency" is hopelessly naive. The principal objective in H2L work (and in lead optimization) is to identify compounds for which potency and properties related to ADME and toxicity are all acceptable. Defining meaningful acceptability criteria is non-trivial and H2L teams also typically need to make decisions as to how criteria can be relaxed with a minimum of risk. It's important to be aware that you can't compensate for inadequate potency by making the other properties better and those who argue that drug discovery scientists should focus on lipophilic efficiency rather than potency are missing this point.

While plotting potency against lipophilicity with reference lines corresponding to different LLE (LipE) values is often a helpful way to visualise project data in H2L (and in lead optimization) I don't consider Figure 3 to provide an accurate or useful view of the typical H2L process. Figure 3 presents a view that a hit maps to a lead which in turn maps to a drug candidate. In reality the screening phase of a discovery project will identify multiple hits and the resulting leads are not single compounds but structural series. It is important to be aware that the practical (as opposed to conceptual) utility of a graphic such as Figure 3 is limited by the extent to which the chosen measure of lipophilicity is predictive of properties such as aqueous solubility, permeability and metabolic stability.  

Although Q2025 claims to define H2L best practices the Authors don't appear to demonstrate much awareness of the nature of the H2L process. The first step in the H2L process is to follow up hits from the initial screen by assaying potential compounds of interest (summarised in Figure 2), although in some cases some follow-up might already have been done in the hit generation phase. Hits tend to group into structural families and the H2L chemists then synthesise compounds (in some organizations synthesis is outsourced) with a view to identifying compounds that are more potent than the hits. Decisions as to which compounds are to be made are typically hypothesis-based (see P2012) although in some cases genuinely predictive models might be available to the H2L team. Design hypotheses are typically based on information available to H2L teams, such as SARs derived from the hits or relevant target structures, and predictive models might be based on free energy calculations (see ASC2025). As the H2L teams generate more information design hypotheses become more specific and models based on project data become more predictive.

I would argue that establishing (and exploiting) SARs and structure-property relationships (SPRs) constitutes a basis for design in H2L work.  Certain features of SARs are especially relevant to H2L work and an observation that a reduction in log P leads to increased potency (or at least a minimal decrease in potency) is information that project teams can make good use of. Other SAR features that I would advise H2L scientists to look for are activity cliffs (relatively small changes in structure result in relatively large changes in potency) and superadditivity (effect on potency of simultaneously making two structural modifications is greater than what would be expected from the effects of making each structural modification individually).  
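As an aside, superadditivity is easy to quantify once the relevant matched compounds have been assayed; here is a minimal sketch (illustrative numbers and my own function name, not from Q2025) for a pair of structural modifications A and B:

```python
# Minimal sketch (illustrative numbers): quantify superadditivity for a pair of
# structural modifications A and B using a double-transformation cycle.
def nonadditivity(p_parent: float, p_a: float, p_b: float, p_ab: float) -> float:
    """pIC50 gain of the combined modification beyond the sum of the individual gains."""
    expected_ab = p_parent + (p_a - p_parent) + (p_b - p_parent)
    return p_ab - expected_ab

# Each modification alone gains 0.5 log units; together they gain 2.0 log units,
# so this pair is superadditive by 1.0 log unit.
print(nonadditivity(p_parent=6.0, p_a=6.5, p_b=6.5, p_ab=8.0))  # 1.0
```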

I see managing the 'assay budget' as a critical activity especially when running assays is an outsourced activity. For example, differences in lipophilicity between structurally related compounds are typically easy to predict and measuring log D values for every project compound is likely to be wasteful of resources. H2L teams need to use their assay budget to identify and address issues efficiently and I don't consider the suggestion that H2L teams use a generic tiering approach such as the one shown in Figure 9 to be helpful. Something that I do suggest H2L teams consider is that they try to assess the response of properties such as aqueous solubility and permeability to lipophilicity (this means making measurements for less potent compounds).                     

Figure 3. There are numerous routes to climb a mountain, as there are to discover a drug, but a measured approach to lipophilicity will guide an optimal path, [The Authors need to articulate what they mean by “a measured approach to lipophilicity” (which does come across as arm-waving) and provide evidence to support their claim that it “will guide an optimal path”.] where the outcome is usually driven by a balance of activity and lipophilicity [This appears to be a statement of belief and the Authors do need to provide evidence to support their claim. The Authors also need to say more about how the “balance of activity and lipophilicity” can be objectively assessed.] (The parallel lines represent LLE, i.e. pIC50 - log P). [This way of visualizing data was introduced in the R2009 study which, in my view, should have been cited.]

Thus the Distribution Coefficient, (log D at a given pH) is a highly influential physical property governing ADMET profiles [20a | 20b | 20c] such as on- and off-target potency, solubility, permeability, metabolism and plasma protein binding (Figure 4) [14b]. [I recommend that the term ‘ADMET’ not be used in drug discovery because ADME (Absorption, Distribution, Metabolism, and Excretion) and T (Toxicity) are completely different issues that need to be addressed differently in design. I would argue that the ADME profile of a drug is actually defined by its in vivo characteristics such as fraction absorbed (which may vary with dose and formulation), volume of distribution and clearance (the Authors appear to be confusing ADME with in vitro predictors of ADME) and I would also argue that toxicity is an in vivo phenomenon. In order to support the claim that log D “is a highly influential physical property governing ADMET profiles” it would be necessary to show that log D is usefully predictive of what happens to drugs in vivo. My view is that the cited literature does not support the claim that log D “is a highly influential physical property governing ADMET profiles” given that [20a] does not even mention log D and neither [20b] nor [20c] provides any evidence that log D is usefully predictive of in vivo behaviour of drugs.]

Figure 4. The impact of increasing lipophilicity on various developability outcomes [14b] [It is unclear as to whether lipophilicity is defined for this graphic in terms of log P or log D. It would be necessary to show more than just the ‘sense’ of trends for the term “impact” to be appropriate in this context. I do not consider the use of the term “developability outcomes” to be accurate.]

Aqueous solubility is certainly an important consideration in H2L work although I think that the Authors could have articulated the relevant physical chemistry rather more clearly than they have done. You can think of the process of dissolution as occurring in two steps (sublimation of the solid followed by transfer from the gas phase to water). Lipophilicity usually features in models for prediction of aqueous solubility although I consider wet octanol to be a thoroughly unconvincing model for the gas phase. We generally assume that aqueous solubility is limited by the solubility of the neutral form (which is why ionization tends to be beneficial) but when this assumption breaks down the solubility that you measure will depend on both the nature and concentration of the counter-ion. As I note in HBD3, optimization of intrinsic aqueous solubility (the solubility of the neutral form of the compound) is still a valid objective for ionizable compounds because we're typically assuming that only neutral species can cross the cell membrane by passive permeation.
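The benefit of ionization can be made explicit with the textbook Henderson-Hasselbalch treatment; here is a minimal sketch (my own code, not taken from Q2025 or HBD3) for a monoprotic acid, assuming that solubility remains limited by the neutral form:

```python
# Minimal sketch (textbook treatment): total solubility of a monoprotic acid as a
# function of pH, assuming the neutral form limits solubility (i.e. the salt
# solubility limit is not reached).
def total_solubility_acid(s0_molar: float, pka: float, ph: float) -> float:
    """Intrinsic solubility s0 scaled up by the ionized fraction at the given pH."""
    return s0_molar * (1.0 + 10.0 ** (ph - pka))

# Illustrative values: intrinsic solubility 10 uM, pKa 4.4, measured at pH 7.4.
print(total_solubility_acid(1e-5, pka=4.4, ph=7.4))  # ~0.01 M
```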

Some general advice that I would offer to drug discovery scientists encountering solubility issues is that they should try to think about molecular structures from the perspectives of molecular interactions in the solid state and crystal packing. I would expect the left hand 'Reduce crystal packing' structure in Figure 6 to be able to easily adopt a conformation in which the planes corresponding to the aromatic rings and amide are all mutually coplanar (this is a scenario in which a non-aromatic replacement for an aromatic ring might be expected to have a relatively large impact). In HBD3 I suggest that deleterious effects of aromatic rings on aqueous solubility might be due to the molecular interactions of the aromatic rings rather than their planarity. I also suggest in HBD3 that elimination of non-essential hydrogen bond donors be considered as a tactic for improving aqueous solubility because it tends to increase the imbalance between hydrogen bond donors and acceptors while minimizing the resulting increase in lipophilicity.

Rational [this use of "rational" is tautological] reasons for poor solubility were succinctly described by Bergstrom, who coined "Brick Dust and Greaseballs" as two limiting phenomena in drug discovery [22] which are in line with the empirical findings that led to General Solubility Equation [23] (Figure 5). [I don’t consider the General Solubility Equation to have any relevance to H2L work because it has not been shown to be usefully predictive of aqueous solubility for compounds of interest to medicinal chemists and the inclusion of Figure 5, which merely shows how predicted solubility values map on to an arbitrary categorisation scheme, appears to be gratuitous.] Succinctly, three factors influence solubility: lipophilicity, solid state interactions and ionisation. [It is solvation energy as opposed to lipophilicity that influences solubility and wet octanol is a poor model for the gas phase.] Determining which are the strongest drivers of low solubility will guide the optimisation (Figure 6). Using the analysis in Figure 5 the Solubility Forecast Index emerged, using the principle that an aromatic ring is detrimental to solubility, roughly equivalent to an extra log unit of lipophilicity for each aromatic ring (Thus SFI = clog D7.4 + #Ar) [24]. [I consider the use of the term “principle” in this context to be inaccurate given that the basis for SFI is subjective interpretation of a graphic generated from proprietary aqueous solubility data and I direct readers to the criticism of SFI in KM2013.] Minimising aromatic ring count is an important and statistically significant metric to consider [25] [The importance of minimizing aromatic ring count is debatable and it is meaningless to describe metrics as “statistically significant”.] - consistent with the "escape from flatland" concept [26] that focusses on increasing the sp³ (versus sp²) ratio in molecules, [The focus in the “escape from flatland” study is actually on the fraction of carbon atoms that are sp3 (Fsp3) and not on “the sp³ (versus sp²) ratio”.] even though no significant trends are apparent in detailed analyses of sp³ fractions [27]. [The “analyses of sp³ fractions” in [27] consist of comparisons of drug - target medians for the periods 1939-1989, 1990-2009 and 2010-2020 and all appear to be statistically significant (although I don't consider these analyses to have any relevance to H2L work). I consider the citation of [27] in this context to be gratuitous and this blog post might be of interest.]

An important factor in hit selection is to prioritise compounds with higher ligand efficiency. Ligand efficiency, defined as activity [LE is actually defined in terms of Gibbs free energy of binding and not activity.] per heavy atom (LE=1.37 * pKi/Heavy Atom Count, Figure 7a), is commonly considered in discovery programmes as a quality metric [33]. [LE (Equation 3 in the H2L best practices post graphic) is actually defined as the Gibbs free energy of binding, ΔG° (Equation 2 in H2L best practices post graphic), divided by the number of non-hydrogen atoms, NnH (this is identical to heavy atom count although I consider the term to be less confusing), but the quantity is physically (and thermodynamically) meaningless because perception of efficiency varies with the arbitrary concentration, C°, that defines the standard state (see Table 1 in NoLE). Using a standard concentration enables us to calculate changes in free energy that result from changes in composition and, while the convention of using C° = 1 M when reporting ΔG° values is certainly useful, it would be no less (or more) correct to report ΔG° values for C° = 1 µM. Put another way, the widely held belief that 1 M is a 'privileged' standard concentration is thermodynamic nonsense (Equation 2 in the H2L best practices post graphic shows you how to interconvert ΔG° values between different standard concentrations). Given the serious deficiencies of LE as a drug design metric, I suggest modelling the response and using the residuals to quantify the extent that individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the 'Alternatives to ligand efficiency for normalization of affinity' section of NoLE). There are two errors in the expression that the Authors have used for LE (the molar energy units are missing and the expression is written in terms of Ki rather than KD). The factor of 1.37 in the expression for LE comes from the conversion of affinity (or potency) to ΔG° at a temperature of 300 K, as recommended in [35], although biochemical assays are typically run at human body temperature (310 K). My view is that it is pointless to include the factor of 1.37 given that this entails dropping the molar energy units and using a temperature other than that at which the assay was run. Dropping the factor of 1.37 would also bring LE into line with LLE (LipE).] Various analyses suggest that, on average, this value barely change over the course of an optimisation process [20b | 27 | 34a | 34b] - so it is important to consider maintenance of any figure during any early SAR studies. [I disagree with this recommendation. These analyses are completely meaningless because the variation of LE over the course of an optimization itself varies with the concentration unit in which affinity (or potency) is expressed (Table 1 of NoLE illustrates this for three ligands that differ in molecular size and potency). In [34a] the start and finish values of LE were averaged over the different optimizations without showing variance and it is therefore not accurate to state that the study supports the assertion that LE values "barely change over the course of an optimisation process".] Lipophilic Ligand Efficiency (activity minus lipophilicity typically pKi -log P, Figure 7b), which is widely recognised as the key principle in successful drug optimisation, comes into play both for hit prioritization and optimisation. 
[LLE is a simple mathematical expression and I don’t consider it accurate to describe it as a “principle” let alone “the key principle in successful drug optimisation”. LLE can be thought of as quantifying the energetic cost of transferring a ligand from octanol to its target binding site although this interpretation is only valid when the ligand is predominantly neutral at physiological pH and binds in its neutral form. LLE is just one of a number of ways to normalize potency with respect to lipophilicity and I don't think that anybody has actually demonstrated that (pIC50 – log P) is any better (or worse) as a drug design principle than pIC50 – 0.9 × log P. When drug discovery scientists report that they have used LLE it often means that they have plotted their project data in a similar manner to Figure 3 as opposed to staring at a table of LLE values for their compounds. As an alternative to LLE (LipE) for normalization of affinity (or potency) with respect to lipophilicity I suggest modelling the response and using the residuals to quantify the extent that individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the 'Alternatives to ligand efficiency for normalization of affinity' section of NoLE).] Improving this value reflects producing potent compounds without adding excessive lipophilicity. Taken together, it has been shown that for any given target, the drugs mostly lie towards the leading "nose" [?] where LE and LLE are both towards higher values [20b | 35]. [This is perhaps not the penetrating insight that the Authors consider it to be, given that drugs are usually more potent than the leads and hits from which they have been derived.] However, setting aspirational targets for either metric is unwise, as analysis of outcomes indicates that the values are target dependant [20b]. [I consider target dependency to be a complete red herring in this context and a more important issue is that you can’t compensate for inadequate potency by reducing molecular size or lipophilicity.] Focusing on increasing LLE to the maximum range possible and prioritizing series with higher average values is the recommended strategy [27 | 36]. [It is not clear what is meant by “increasing LLE to the maximum range possible” and I consider it very poor advice indeed to recommend “prioritizing series with higher average values” (my view is that you actually need to be comparing the compounds from different series that have a realistic chance of matching the desired lead profile). The Authors of Q2025 appear to be misrepresenting [36] given that the study does not actually recommend “prioritizing series with higher average values”. This blog post on [27] might be relevant.]
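For readers who have not met these metrics before, here is a minimal sketch (illustrative values and my own code, not the Authors' implementation) of LE and LLE as they are usually written, together with a scaled form of LE that drops the 1.37 kcal/mol factor discussed above:

```python
# Minimal sketch (illustrative values): LE and LLE as usually quoted, plus a
# scaled LE (pKi per heavy atom) with the 1.37 kcal/mol factor dropped.
def le_kcal(pki: float, heavy_atoms: int, temperature_k: float = 300.0) -> float:
    """LE as usually quoted: 2.303*R*T*pKi / heavy atom count, in kcal/mol."""
    r_kcal = 1.987e-3                                          # gas constant, kcal/(mol*K)
    return 2.303 * r_kcal * temperature_k * pki / heavy_atoms  # ~1.37*pKi/HAC at 300 K

def le_log_units(pki: float, heavy_atoms: int) -> float:
    """Scaled LE (pKi per heavy atom), on the same log-unit scale as LLE."""
    return pki / heavy_atoms

def lle(pic50: float, logp: float) -> float:
    """Lipophilic ligand efficiency (LipE): pIC50 - logP."""
    return pic50 - logp

# Note: the pKi in these formulas implicitly references a 1 M concentration,
# which is the arbitrary choice criticised in the comments above.
print(le_kcal(pki=8.0, heavy_atoms=28))       # ~0.39 kcal/mol per heavy atom
print(le_log_units(pki=8.0, heavy_atoms=28))  # ~0.29
print(lle(pic50=8.0, logp=3.0))               # 5.0
```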

One can summarize this section with a simple but critical best practice: potency and properties (physicochemical and ADMET) have to be optimized in parallel (Figure 8) [37] to get to quality leads and later drug candidates with higher chances of clinical success. Whilst seemingly trivial, this proposition is rendered challenging by an "addiction to potency" and a constant reminder of this critical concept remains useful for medicinal chemists [38]. [My view is that many medicinal chemists had moved on from the addiction to potency when the molecular obesity article was published a decade and a half ago and I would question the article's relevance to contemporary H2L practice. The threshold values that define the GSK 4/400 rule actually come from an arbitrary scheme used to categorize the proprietary data analyzed in the G2008 study as opposed to being derived from objective analysis of the data. The study reproduces the promiscuity analysis from [13a] which I criticised earlier in this post for exaggerating the strength of the trend and using an excessively permissive threshold for ‘activity’.] With poor properties, even "good ligands" may not fully answer pharmacological questions [39a | 39b]. [These two articles focus on chemical probes and I don’t consider either article to have any relevance to H2L work. Chemical probes need to be highly selective (more so than drugs) and permeable although solubility requirements are likely to be less stringent when using chemical probes to study intracellular phenomena than in H2L work and one generally does not need to worry about achieving oral bioavailability.]

I agree that mapping SARs for structural series of interest is an important aspect of H2L work and activity cliffs (small modifications in structure resulting in large changes in activity) are of particular interest given the potential for beating trends and achieving greater selectivity. Instances of decreased lipophilicity resulting in increased potency (or at least minimal loss of potency) should also be of significant interest to H2L teams. When mapping SARs it is important that structural transformations should change a single pharmacophore feature at a time and one should always consider potential ‘collateral effects’, such as perturbed conformational preferences, that might confound the analysis. Some of the structural transformations shown in Figure 10 change more than one pharmacophore feature at a time which makes it impossible to determine which pharmacophore feature is required for activity.

Figure 10. Conceptual example of iterative SAR [The meaning of the term “iterative SAR” is unclear] to determine the pharmacophore. As each change may affect binding interactions, conformation and ionization state; complementary structural modification [The meaning of "complementary structural modification" is unclear] will be needed to understand the change in potency and determine the pharmacophore 

Is Nitrogen needed (e.g. HBA)? [In addition to eliminating the quinoline N hydrogen bond acceptor this structural transformation eliminates a potential pharmacophore feature (the amide carbonyl oxygen can function as a hydrogen bond acceptor) while creating a cationic centre which will incur a significant desolvation penalty.]

Is NH needed? [This structural transformation eliminates the amide NH but it also is unlikely to address the question of whether the NH is needed because the amide carbonyl has also been eliminated.]

Is carbonyl needed? [The elimination of the amide carbonyl oxygen (hydrogen bond acceptor) creates a cationic centre which will incur a desolvation penalty.] 

As a last proposition, [49a | 49b] we suggest that the progress in computational physicochemical and ADMET property predictions represents an opportunity to accelerate the optimisation of molecules with a "predict-first" mindset [4 | 50]. [I certainly agree that models should be used if they are available. However, the citation of literature does appear to be gratuitous and it is unclear why the Authors believe that scientists working on H2L projects will benefit from knowing that a proprietary system for automated molecular design has been developed at GSK.] The first step is to generate sufficient data for a series to build confidence in [51] any models, which can then be exploited in the prioritization of compounds for synthesis that fit with aspirational profiles [My view is that it would be very unwise for H2L project teams to blindly use models without assessing how well the models predict project data and the citation of [51] appears to be gratuitous. Typically, H2L project teams use measured data to move their projects forward and generating data purely for the purpose of model evaluation is likely to be a distraction. One piece of advice that I will offer to H2L project teams is that they attempt to characterise responses of ADME predictors, such as aqueous solubility and permeability, to lipophilicity (likely to involve measurements for less potent compounds).] This ensures higher physicochemical quality [I consider “ensures” to be an exaggeration and I would argue that “physicochemical quality” is not something that can even be defined meaningfully or objectively (let alone quantified).], asks more pertinent questions and might reduce the total number of molecules made to get to the lead (Figure 11).

It's been a long post and I'll say a big thank you for staying with me until the end. I wrote this post primarily for students and early-career scientists although I hope the feedback will also be helpful for the EFMC.  One piece of advice that I will offer to all scientists regardless of the stage of their careers is to not switch off your critical thinking skills just because a study has been presented as defining best practices or is highly-cited. In particular, I urge students and early-career scientists to be extremely wary of studies in which the conclusions don't follow from the data and I'll share a recent blog post that illustrates the problem. All that said, however, confused thinking amongst drug discovery scientists is not high on the list of the problems facing many of the world's inhabitants right now and my wish for 2026 is a kinder, gentler and fairer world.  

Sunday, 30 November 2025

Covalent ligand efficiency

It appears that whoever first described Economics as the ‘Dismal Science’ had never encountered a ligand efficiency metric. I’ll be taking a look at the FK2025 study (Covalent ligand efficiency) in this post and the study has already been reviewed by Dan. Something that I’ve observed repeatedly over the years is that authors of ligand efficiency studies exhibit a lack of understanding of units and dimensions associated with physicochemical quantities that would shame a first year undergraduate studying introductory physical chemistry (this is somewhat ironic given that creators of ligand efficiency metrics frequently tout their creations in physicochemical terms). I consider covalent ligand efficiency (CLE) as defined in the FK2025 study to have no value whatsoever for design of drugs that bind irreversibly to their targets through covalent bond formation given that the metric is time-dependent and based on an invalid measure of bioactivity. The formidable Lady Bracknell is clearly unimpressed and I should mention that the photo is from the wikipedia page for English actress Rose Leclercq (1843-1899). Given the serious deficiencies in the FK2025 study this is going to be a long and tedious post (even more so than usual 😁😁😁) so please ensure that you have strong coffee close to hand. As is usually the case in posts here I've used the same reference numbers as were used in FK2025 and quoted text is indented with my comments in red italics. I've organized some of the mathematical material into three tables and references to tables in the post are to these (and not to any of the tables in the FK2025 study).    

Before starting the review of FK2025 it’s worth examining irreversible covalent inhibition from a molecular design perspective and I’ll direct readers to the informative S2016 and McW2021 reviews, and the recent L2025 study which presents COOKIE-Pro for covalent inhibitor binding kinetics profiling on the proteome scale. Covalent bond formation between RNA and ligands can also be exploited (S2025 | K2025 | L2015) and I generally use 'target' rather than 'protein' in blog posts and journal articles. An irreversible covalent inhibitor acts by binding non-covalently to its target in the first step, with the covalent bond forming in the second step between an electrophilic ligand atom (the term warhead is commonly used) and a nucleophilic target atom such as the sulfur atom of a cysteine. A commonly used measure of activity for irreversible covalent inhibitors is the kinact/KI ratio which can be thought of as the product of affinity (1/KI) and reactivity (kinact). In design of irreversible covalent inhibitors we try to place the electrophilic atom of the warhead within reacting distance of the nucleophilic atom of the target (this is relatively easy if you have a reliable structure of a complex of the target with a relevant ligand that lacks the electrophilic warhead). The non-covalent complex between target and ligand is stabilised by the non-covalent contacts between the target and ligand (the term ‘molecular interactions’ is also used although I prefer to think in terms of ‘non-covalent contacts’ since the latter can be observed experimentally). However, non-covalent contacts also determine reactivity of the non-covalently bound complex by stabilising the transition state (I consider it more correct to think in terms of reactivity of the complex than in terms of reactivity of either the electrophilic warhead or the target nucleophile). In the design context, this means attempting to tune non-covalent contacts to stabilise the transition state to a greater extent than the non-covalent complex.
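For reference, the standard two-step scheme and the associated pseudo-first-order rate expression (textbook material, not equations taken from FK2025) can be written as:

\[
E + I \underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}} E{\cdot}I \xrightarrow{\;k_{\mathrm{inact}}\;} E\!-\!I
\]
\[
k_{\mathrm{obs}} = \frac{k_{\mathrm{inact}}\,[I]}{K_{\mathrm{I}} + [I]}
\qquad\text{so that}\qquad
k_{\mathrm{obs}} \approx \frac{k_{\mathrm{inact}}}{K_{\mathrm{I}}}\,[I]
\quad\text{when } [I] \ll K_{\mathrm{I}}
\]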

The LE and CLE metrics share a very serious deficiency in that your perception of efficiency can be altered if you change the value of an arbitrary term in the formula for the metric and I'll start the review of FK2025 by critically examining LE. The meaninglessness of LE stems from a fundamental misunderstanding of how logarithms work and I'll point you toward M2011 (Can one take the logarithm or the sine of a dimensioned quantity or a unit? Dimensional analysis involving transcendental functions) that was published in the Journal of Chemical Education. In drug discovery we frequently need to calculate logarithms for quantities and you need to be aware that you can’t calculate the logarithm of a dimensioned quantity. Let’s take pIC50 as an example and this quantity is commonly defined as the negative logarithm of the IC50 in mole per litre (M). However, what you actually do when you calculate pIC50 is that you take the negative logarithm of the numerical value of the IC50 when expressed in mole per litre (this is a bit of a mouthful and it can be written more compactly as equation 1 below). While not denying that it is useful to have a convention such as this for expressing potency values logarithmically it should be remembered that the choice of mole per litre (M) is entirely arbitrary and it would be equally correct to use other valid concentration units such as μM or nM. One consequence of choosing mole per litre (M) for expressing IC50 values is that pIC50 values (or at least measured pIC50 values) will generally be positive because of the extreme difficulty of measuring meaningful IC50 values that are greater than 1 M. 
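Equation 1 appears in a graphic that is not reproduced here; a reconstruction consistent with the description above is:

\[
\mathrm{pIC_{50}} = -\log_{10}\!\left(\frac{\mathrm{IC_{50}}}{1\ \mathrm{M}}\right) \tag{1}
\]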


Let’s take a look at the binding free energy ΔG° and you’ll notice that I’ve written it with a degree symbol which indicates that this quantity corresponds to a standard state defined by a concentration value C° (the standard concentration). Equation 2 shows how the binding free energy is defined as the difference in chemical potential between the associating species (target + ligand) and the target-ligand complex with each species at the standard concentration (the degree symbol indicates that both the binding free energy and chemical potential depend on the value of C° and I’ve also shown this explicitly in the equation although this is not actually necessary). Equation 3 shows the dependence of chemical potential on the concentration C of the species and the standard concentration C°. Taken together, Equation 2 and Equation 3 should clarify where the dependence of binding free energy on the standard concentration comes from (there are two associating species but only one complex). We can’t actually measure binding free energy directly but we can calculate it from the dissociation constant KD using Equation 4 (which can be derived from Equation 2 and Equation 3). It’s important to be aware that if you use Equation 4 to convert ΔG° values between different values of the standard concentration C° you’ll be making the assumption that solutions are dilute (ΔH is independent of concentration) and this is indicated in Equation 5.
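Equations 2 to 5 are also in a graphic that is not reproduced here; a plausible reconstruction from the description above (signs chosen so that ΔG° is negative for favourable binding, consistent with Equation 4 as described; the exact form of Equation 5 in the graphic may differ) is:

\[
\Delta G^{\circ}(C^{\circ}) = \mu^{\circ}_{\mathrm{complex}}(C^{\circ}) - \mu^{\circ}_{\mathrm{target}}(C^{\circ}) - \mu^{\circ}_{\mathrm{ligand}}(C^{\circ}) \tag{2}
\]
\[
\mu_{i} = \mu^{\circ}_{i}(C^{\circ}) + RT\ln\!\left(\frac{C_{i}}{C^{\circ}}\right) \tag{3}
\]
\[
\Delta G^{\circ}(C^{\circ}) = RT\ln\!\left(\frac{K_{\mathrm{D}}}{C^{\circ}}\right) \tag{4}
\]
\[
\Delta G^{\circ}(C^{\circ}_{2}) - \Delta G^{\circ}(C^{\circ}_{1}) = RT\ln\!\left(\frac{C^{\circ}_{1}}{C^{\circ}_{2}}\right) \quad\text{(dilute solutions)} \tag{5}
\]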

The standard concentration is a source of much confusion in the ligand efficiency field and I’ll direct readers to ‘The Nature of Ligand Efficiency’ (p9). While the standard concentration is integral to a valid thermodynamic treatment of target-ligand binding the value of C° is entirely arbitrary (to suggest otherwise would mean that you’ve abandoned thermodynamics). It is conventional in drug discovery (and biochemical) literature to use a C° value of 1 M when reporting ΔG° values. While this convention is certainly beneficial, it is no more (and no less) valid to use a value of 1 M than it is to use a value of 1 μM for this purpose. Furthermore a standard state defined by a C° value of 1 M is not biophysically realistic (consider the difficulty of accommodating a mole of target in a volume of 1 litre and the likelihood of a ligand exhibiting aqueous solubility of 1 M). I assume that most biochemists and biophysicists would agree that it is generally not feasible to measure KD values of greater than 1 M (I would be happy to be proven wrong on this point) and this means that drug discovery scientists tend to assume that ΔG° values are necessarily negative.

Let’s now take a look at ligand efficiency (LE) and you can see from the photo above that some heretics regard the metric as physically nonsensical (if you're interested in how I came to be chatting with fellow blogger Ash then take a look at this post). The LE metric, which is regarded as an article of faith in the fragment-based design community, was introduced in the (p5) study with the symbol Δg (see Equation 6 below) and the authors of that study did not actually state that it had to be calculated using a C° value of 1 M (I consider it unlikely that any of the authors were even aware of the dependence of ΔG° on C°). In The Nature of Ligand Efficiency (p9) I defined the quantity ηbind (see Equation 7) by dividing Δg (LE) by RT (when LE values are quoted the molar energy units are usually discarded and T often does not correspond to the temperature at which the assay was run) and by the factor (2.303) used to convert between natural logarithms and base 10 logarithms. The quantity ηbind is directly proportional to Δg (LE) and using it makes it much easier to see how using a different standard concentration can alter your perception of efficiency. Take a look at Table 1 in (p9) and you’ll see that the three compounds (a fragment, a lead and a clinical candidate) bind with equal efficiency when C° is 1 M. Change C° to 0.1 M and the clinical candidate binds more efficiently than the fragment but when C° is changed to 10 M the fragment becomes more ligand-efficient than the clinical candidate. As noted in (p9) “In thermodynamic analysis, a change in perception resulting from a change in a standard state definition would generally be regarded as a serious error rather than a penetrating insight.” 
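The effect is easy to reproduce; here is a minimal sketch (illustrative KD and size values of my own choosing, not the values tabulated in (p9)) that evaluates a scaled LE, ηbind = log10(C°/KD)/NnH, at three choices of standard concentration:

```python
# Minimal sketch (illustrative values): eta_bind = log10(C_std / KD) / N_nonH,
# a scaled LE, evaluated at three standard concentrations to show that even the
# ordering of the ligands can change.
import math

compounds = {              # (KD in mol/L, non-hydrogen atom count) - illustrative only
    "fragment":  (1e-4, 12),
    "lead":      (1e-8, 24),
    "candidate": (1e-12, 36),
}

for c_std in (0.1, 1.0, 10.0):                     # standard concentration in mol/L
    etas = {name: math.log10(c_std / kd) / n_nonh
            for name, (kd, n_nonh) in compounds.items()}
    ranked = sorted(etas, key=etas.get, reverse=True)
    print(f"C_std = {c_std:>4} M: " +
          ", ".join(f"{name} {etas[name]:.3f}" for name in ranked))
```

With these numbers the three compounds are equally efficient at C° = 1 M, the candidate ranks highest at C° = 0.1 M and the fragment ranks highest at C° = 10 M.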

Here's what the authors of FK2025 say about LE: 

LE depends on the choice of the standard concentration (normally 1 M) (p8) (p9) and its maximal available value is size dependent.(p10) (p11) [It's true that LE depends on C° but it’s also true that ΔG° depends on C° and the difference in the two dependencies is that LE “depends upon the choice of standard concentration in a nontrivial fashion” (p8). The issue is not so much that LE depends on C° but that using a different unit to express KD changes how we perceive efficiency. The ΔΔG values that determine perception of affinity don’t change when you use a different value of C° (equivalent to using a different unit to express KD). However, if you use a different value of C° for calculating LE you can see from Table 1 in (p9) that even the ordering of LE values between two ligands can change. I consider the molecular size dependencies of LE observed by the authors of (p10) and (p11) to be artefactual and I’ll point you toward Fig. 1 in (p9) which shows that using a different value of C° can change how we perceive the molecular size dependency of LE.] Nevertheless, LE is an established tool to normalize potency and facilitate the comparison of ligands with a range of potencies and sizes. [It is not uncommon for adherents of religions to consider their beliefs to be established facts.] The usefulness of LE and other efficiency metrics in drug discovery has been extensively analyzed and reviewed elsewhere. (p6) (p12) (p13) (p14) (p15) (p16) (p17) [My view is that nobody has actually demonstrated the usefulness of LE and I’m unconvinced that it would even be possible to do so meaningfully in an objective manner (consider the feasibility of comparing success rates between a group of individuals using LE in discovery projects and a control group of individuals not using LE in discovery projects). Usefulness means that using something provides demonstrable benefits and ‘widely-used’ is not equivalent to ‘useful’ (I’m guessing that more people use homeopathic ‘medicines’ than use ligand efficiency metrics). One piece of advice that I’ll offer to anybody advocating the use of LE in drug design is to ensure that you fully understand the implications of changes in perception resulting from using different units to express quantities not least because you might find yourself lecturing to people who do understand.]

After a lengthy preamble it’s now time to take a look at how the FK2025 study addresses ligand efficiency in the context of irreversible covalent inhibition. One of the challenges in the design of drugs that engage their targets irreversibly is that it’s not possible to meaningfully quantify activity with a single parameter. This is particularly relevant to the definition of efficiency metrics, which are typically derived by either scaling or offsetting a measured activity value by a risk factor such as molecular size or lipophilicity. While you can certainly measure an IC50 value for an irreversible covalent inhibitor the value that you measure will be time-dependent and it’s not generally meaningful to compare two IC50 values that have been measured using different incubation times. While the kinact/KI ratio is time-independent, using it as a measure of activity necessarily entails a degree of information loss.
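
To make the time dependence concrete, here's a minimal sketch based on a deliberately simple kinetic model (pseudo-first-order inactivation with kobs = kinact[I]/(KI + [I]), no substrate competition and no inhibitor depletion) that back-calculates the IC50 at different incubation times for a hypothetical irreversible inhibitor. The kinact and KI values are invented and the point is simply that the 'same' compound returns very different IC50 values depending on when you read the assay:

import math

def ic50_irreversible(kinact, KI, t):
    # Simple model: remaining activity = exp(-kobs*t) with kobs = kinact*[I]/(KI+[I]).
    # Setting remaining activity to 0.5 and solving for [I] gives the expression
    # below, which is only defined when kinact*t > ln(2).
    ln2 = math.log(2.0)
    if kinact * t <= ln2:
        raise ValueError("inactivation too slow to reach 50% inhibition at this time")
    return KI * ln2 / (kinact * t - ln2)

# hypothetical inhibitor: kinact = 0.01 /s, KI = 1 uM (values invented)
for t_min in (15, 60, 240):
    ic50 = ic50_irreversible(kinact=0.01, KI=1.0, t=t_min * 60.0)
    print(f"t = {t_min:3d} min   IC50 = {ic50 * 1000:.1f} nM")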

The authors of FK2025 state:

Our starting point is the LE introduced for noncovalent ligands as a useful metric for lead selection. (p5) [LE was claimed to be useful when it was introduced although no evidence was presented in support of the claim.]

Let’s take a look at Table 2 which shows two equations from FK2025. The first equation, which appears in the text of the article, illustrates two common errors in the efficiency metric field (taking logarithms of dimensioned quantities and discarding units). Either error should ring alarm bells for the reader, especially when the authors go on to interpret values of the efficiency metrics.


The authors of FK2025 assert that “LE can be decomposed into contributions from the noncovalent recognition and the covalent reaction (Box 2, Equation III)” and this is reproduced in Table 2 as Equation 2. The first term is a commonly-used mathematical formula for LE when inhibition is reversible and it is important to be aware that KI has been divided by an arbitrary concentration value (1 M) so that it can be expressed as a logarithm (see Equation 1 in Table 1). The argument of the logarithm in the second term is dimensionless although its magnitude does vary with t. Each term in Equation III (Box 2) therefore has a nontrivial dependence on the value of an arbitrary quantity (the 1 M concentration in the first term and t in the second term). This means that your perception of efficiency when calculated according to Equation III (Box 2) will be altered if you use either a different concentration unit or a different value of t. You can see this effect in Figure 2 (the effect of varying t) and the appearance of Figure 1 will be altered if you use a value of t other than 1 h or a concentration unit other than M for the calculation of LE.

It's now time to examine CLE (defined as Equation II in Box 3 of the FK2025 study) and I’ll direct you to Table 3 below in which I’ve made some comments. Using CLE requires that the IC50 values for the inhibitors of interest all correspond to the same time point (t) and it is not clear whether the authors are suggesting that the IC50 values should all be measured using the same incubation time or need to be calculated from measured KI and kinact values using Equation III in Box 2. A quantity t is also explicitly present in the argument of the logarithm in Equation II in Box 3 and this is necessary for the argument of the logarithm to be dimensionless (see M2011). The argument of the logarithm in Equation II (Box 3) is clearly time-dependent and this means that your perception of efficiency will be altered if you use a different value of t when calculating CLE (just as your perception of efficiency will be altered if you use a different concentration unit to express IC50 when you calculate LE for reversible inhibitors). It also means that the molecular size dependency of CLE will vary with time, just as the molecular size dependency of LE varies with the concentration unit used to express affinity, as can be seen in Fig. 1 of (p9).
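
The general point can be made without attempting to reproduce Equation II of Box 3 exactly: any efficiency metric derived from an IC50 measured at a single time point inherits the time dependence of that IC50, and the IC50 ordering of two irreversible inhibitors can itself flip with incubation time. Here's a minimal sketch, using the same simplified kinetic model (and invented kinact and KI values) as the earlier example, in which inhibitor A looks more potent at 15 minutes while inhibitor B looks more potent at 60 minutes:

import math

def ic50(kinact, KI, t):
    # same simplified model as before: remaining activity = exp(-kinact*[I]*t/(KI+[I]))
    ln2 = math.log(2.0)
    return KI * ln2 / (kinact * t - ln2)

# hypothetical inhibitors: (kinact in /s, KI in uM); values invented for illustration
inhibitors = {"A": (0.05, 50.0), "B": (0.001, 0.5)}

for t_min in (15, 60):
    print(f"incubation time = {t_min} min")
    for name, (kinact, KI) in inhibitors.items():
        print(f"  inhibitor {name}: IC50 = {ic50(kinact, KI, t_min * 60.0):.2f} uM")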

However, there is another difficulty which is that the argument of the logarithm in Equation II (Box 3) is not a valid measure of activity (the same criticism can also be made of the xLE metric introduced in the Z2025 study that Dan has already reviewed). This problem is a bit more subtle and it’s important to remember that knowing the IC50 value for a reversible inhibitor enables you to generate a concentration response for inhibition. When you express an IC50 value as a logarithm you need to scale it by a concentration value to ensure that the argument of the logarithm function is dimensionless (see M2011) but the concentration unit is still there even though it’s not shown (see Equation 1 in Table 1).
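
To make the point about concentration response explicit: under a simple single-site model with a Hill slope of one, the IC50 of a reversible inhibitor is all you need to predict fractional inhibition at any concentration, which is what I mean by a valid measure of activity. A minimal sketch with an invented IC50 value:

ic50 = 0.25   # uM, hypothetical reversible inhibitor

for conc in (0.01, 0.1, 1.0, 10.0):        # inhibitor concentration in uM
    inhibition = conc / (conc + ic50)      # single-site model, Hill slope of 1
    print(f"[I] = {conc:6.2f} uM   fractional inhibition = {inhibition:.2f}")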

This is a good point at which to wrap up and I’ve argued that CLE has two deficiencies. First, perception of efficiency and its dependency on molecular size both vary with an arbitrary quantity (t) in the argument of the logarithm (this is analogous to the problems caused by the arbitrary nature of the concentration unit used for scaling affinity/potency in the definition of LE for reversible binders). Second, the argument of the logarithm is not a valid measure of activity because it cannot be used to generate a concentration response. Furthermore, I would question the value of aggregating results from multiple assays for analysis even for a valid metric that was free of these deficiencies, and I offered the following advice in (p9):

Drug designers should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working.

I’ve criticized the FK2025 study at length and saying how I might use data like this in drug design projects is a good way to conclude the post. A general criticism that I have made of drug design efficiency metrics is that they are based on assumptions about relationships between activity and risk factors such as molecular size. I argued in (p9) that one should use the trend that is actually observed in the data to normalize activity with respect to risk factors and I’ll point you to the relevant section (Alternatives to ligand efficiency for normalization of affinity) in that article. I would start by attempting to model the relationship between kinact and reactivity with glutathione. The objective of this exercise is to identify inhibitors that best exploit their intrinsic reactivity when forming covalent bonds with the target residue (you can quantify this by how far the point for an inhibitor lies above the trend line and the most interesting compounds have the largest positive residuals). I might also examine the relationship between kinact and KI for inhibitors with the same intrinsic reactivity (e.g., incorporating the same warhead) with a view to identifying the inhibitors for which non-covalent interactions with the target most effectively stabilise the transition state relative to the non-covalent complex. I should stress that there is no suggestion that these analyses would necessarily yield useful insight.
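
Here's a minimal sketch of how I might set up the first of those analyses. The compound names, k_GSH values (rate constants for reaction with glutathione) and kinact values are all invented, and in a real project I'd want many more compounds and would worry about measurement error in both variables, but it shows the basic idea of fitting a trend line on logarithmic scales and ranking compounds by their residuals:

import numpy as np

# hypothetical measurements (all values invented): k_GSH is the rate constant
# for reaction with glutathione (intrinsic reactivity) and kinact is the
# inactivation rate constant for the target
names  = ["cpd1", "cpd2", "cpd3", "cpd4", "cpd5"]
k_gsh  = np.array([0.002, 0.010, 0.050, 0.008, 0.030])
kinact = np.array([0.005, 0.008, 0.060, 0.040, 0.015])

x, y = np.log10(k_gsh), np.log10(kinact)
slope, intercept = np.polyfit(x, y, 1)     # trend line: log(kinact) vs log(k_GSH)
residuals = y - (slope * x + intercept)    # distance above (+) or below (-) the trend

for name, res in sorted(zip(names, residuals), key=lambda p: -p[1]):
    print(f"{name}: residual = {res:+.2f}")   # most interesting compounds first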

It's been a long post so thanks for staying with me. This will be the last post until after Christmas and, as I extend best wishes to all for a happy and peaceful festive season, I'm keenly aware that Christmas will be neither happy nor peaceful for many of our fellow human beings.

Tuesday, 5 August 2025

Return to Flatland

Whoever first referred to Economics as ‘The Dismal Science’ had clearly never read an article on ‘3Dness’ in drug discovery.  My own experience reading articles on this topic is a sensation of having my life force slowly sucked out (I even suggested that reviewing the '3Dness' literature might be considered as an appropriate penance when I recently confessed my sins at St Gallen Cathedral) and the subject of Confession reminds me of a song that the late great Tom Lehrer sang about the Second Vatican Council.

In this post I review the CNM2025 study (Return to Flatland) which examines the heavily-cited LBH2009 study (Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success). This is also a good point to mention a Journal of Medicinal Chemistry Editorial (Property-Based Drug Design Merits a Nobel Prize) that I reviewed in a 30-Jul-2024 post. The CNM2025 study, which has already been reviewed by Dan and Ash, opens with:

The year is 2009, Barack Obama has just been inaugurated and both Lady Gaga and The Black Eyed Peas are at the height of their popularity

This couldn’t help but remind me of the “WORLD WAR 2 BOMBER FOUND ON MOON” headline that appeared on the front page of the Sunday Sport twenty-one years before the publication of LBH2009 (it was accompanied by a photo of a B-17 in a lunar crater). A few weeks later the headline was “WORLD WAR 2 BOMBER FOUND ON MOON VANISHES” (this time accompanied by a photo of the now empty lunar crater).

I’ll start my review of CNM2025 by quoting from it and, as is usual for posts here at Molecular Design, quoted text is indented with any comments by me italicized in red and enclosed in square brackets. 

The hypothesis was attractive, and the data clearly showed the relationship between Fsp3 and clinical progression with pairwise significance P < 0.001. [This statement is inaccurate: Figure 3 of the LBH2009 study shows statistically significant differences at this level only between (a) discovery and phase 2 compounds (b) phase 1 and phase 3 compounds (c) phase 2 compounds & drugs. The authors of LBH2009 state: “The change in average Fsp3 was statistically significant between adjacent stages in only one case (phase 1 to phase 2)” but they neither show this in Figure 3 of their article nor do they report a P-value for the statistical significance of the mean difference in Fsp3 between phase 1 and phase 2 compounds.] The statistics seemed compelling, though the effect size was modest — an increase in average Fsp3 of 0.09 between sets of phase I and approved drugs equates to a difference of around two additional sp3 carbons per drug molecule only. [The authors of LBH2009 did not actually report this difference to be statistically significant so it is unclear why the authors of CNM2025 have stated that the “statistics seemed compelling”.]

The LBH2009 study is effectively a call to think beyond aromatic rings in drug design and my view is that there are considerable benefits in doing so even though I consider the data analysis in the study to be shaky. Almost three decades ago I included a quinuclidine in the Zeneca fragment library for NMR screening and later at AstraZeneca I would actively search (with minimal success) for amides and heteroaryls derived from bicyclic amines. I see the advantages in looking beyond aromatic rings as stemming primarily from increased molecular diversity and a more controllable coverage of chemical space, and in KM2013 we wrote:

Molecular recognition considerations suggest a focus on achieving axial substitution in saturated rings with minimal steric footprint, for example by exploiting the anomeric effect or by substituting N-acylated cyclic amines at C2.

Although data analyses (for example, see HY2010) presented in support of the belief that aromatic rings adversely affect aqueous solubility are typically underwhelming, I consider the suggestion to be plausible and suggested in K2022 that deleterious effects of aromatic rings are more likely to be due to their potential for making molecular interactions than to their planarity. That said, I should also point out that the analysis of the relationship between aqueous solubility and Fsp3 presented in Figure 5 of LBH2009 is a textbook example of correlation inflation (see Fig. 5 in KM2013) and I suspect that if a team had submitted this analysis at Statistiques Sans Frontières the judges would have either awarded “nul points” or come to the conclusion that the team had played its joker. Given the Lady Gaga reference in CNM2025 I couldn't resist linking this Peter Gabriel song which includes the lyrics "Adolf builds a bonfire, Enrico plays with it" even though I have absolutely no idea what the lyrics actually mean.
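
For readers who haven't encountered correlation inflation before, here's a minimal simulation (the numbers are entirely made up) showing how correlating bin means rather than raw values can turn a weak correlation into an apparently impressive one, which is the kind of averaging hazard discussed in KM2013:

import numpy as np

rng = np.random.default_rng(42)

# simulate a weak underlying relationship between a property (x) and a
# response (y); the numbers are invented purely to illustrate the effect
x = rng.uniform(0.0, 1.0, 2000)
y = 0.5 * x + rng.normal(0.0, 0.5, 2000)

r_raw = np.corrcoef(x, y)[0, 1]

# bin by x and correlate the bin means, as is often done in such analyses
edges = np.linspace(0.0, 1.0, 11)
idx = np.digitize(x, edges[1:-1])
x_means = np.array([x[idx == i].mean() for i in range(10)])
y_means = np.array([y[idx == i].mean() for i in range(10)])
r_binned = np.corrcoef(x_means, y_means)[0, 1]

print(f"Pearson r (raw data)  = {r_raw:.2f}")
print(f"Pearson r (bin means) = {r_binned:.2f}")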

While the analysis of the relationship between aqueous solubility and Fsp3 presented in Figure 5 of LBH2009 does endow the study with what I’ll politely call a whiff of the pasture it’s not directly related to the analysis of clinical progression presented in the study. Let’s take a look at Figure 3 in LBH2009 which shows mean Fsp3 values for compounds in discovery, at the three phases of clinical development, and approved drugs. As an aside this analysis would fall foul of current Journal of Medicinal Chemistry author guidelines (see link; accessed 05-Aug-2025) which clearly mandate that “If average values are reported from computational analysis, their variance must be documented”. As mentioned earlier in this post Figure 3 in LBH2009 shows statistically significant (P value < 0.001) differences between (a) discovery and phase 2 compounds (b) phase 1 and phase 3 compounds (c) phase 2 compounds & drugs. It’s also worth stressing that Figure 3 in LBH2009 does not show statistically significant differences in Fsp3 for any of the clinical development transitions (phase 1 to phase 2; phase 2 to phase 3; phase 3 to approved drug). Figure 3 in LBH2009 shows 591 phase 2 compounds but only 376 phase 1 compounds, raising questions about the numbers of compounds that have been in clinical development without being recorded in the database.
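
As an aside, reporting the variance alongside the mean costs almost nothing. Here's a minimal RDKit sketch (the structures are arbitrary and chosen only to show the calculation) for computing Fsp3 values and summarising them with a mean and standard deviation:

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
import statistics

# arbitrary structures chosen purely to illustrate the calculation
smiles = ["c1ccccc1C(=O)N", "CC1CCN(CC1)C(=O)C", "c1ccc2[nH]ccc2c1", "C1CCOC1CO"]
fsp3 = [rdMolDescriptors.CalcFractionCSP3(Chem.MolFromSmiles(s)) for s in smiles]

print(f"mean Fsp3  = {statistics.mean(fsp3):.2f}")
print(f"stdev Fsp3 = {statistics.stdev(fsp3):.2f}")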

I think that there are some problems with how the authors of the LBH2009 study have analysed the relationship between Fsp3 and progression through the stages of clinical development. If charged with analysing this data I would focus on the three clinical development transitions (phase 1 to phase 2; phase 2 to phase 3; phase 3 to approved drug) and wouldn’t waste time on comparisons between discovery compounds and clinical compounds. If analysing the relationship between Fsp3 and the progression from phase 1 to phase 2, I would partition the set of phase 1 compounds into a ‘YES’ subset of compounds that had progressed to phase 2 and a ‘NO’ subset of compounds that had not progressed to phase 2. I would certainly be taking a close look at the distributions of Fsp3 values (some approaches to assessing statistical significance are based on the assumption of Normally-distributed data values) and I’d also be thinking about assessing effect size in addition to statistical significance. However, the problems with the LBH2009 analysis are more fundamental than non-Normal distributions of Fsp3 values.

The authors of LBH2009 assess the progression from phase 1 to phase 2 by comparing the mean Fsp3 value for the phase 1 compounds with the mean Fsp3 value for the phase 2 compounds. The problem is that the Fsp3 values for the YES compounds (that have progressed from phase 1 to phase 2) are present in both of the data sets being compared. This means that the observed differences in mean Fsp3 values will reflect both the difference between YES and NO compounds (relevant to the relationship between Fsp3 and progression from phase 1 to phase 2) and the relative numbers of YES and NO compounds in the phase 1 data (not relevant to the relationship between Fsp3 and progression from phase 1 to phase 2). Analysing the data in the way that the authors of LBH2009 have done effectively adds noise to the signal and it’s possible that they would have observed more statistically significant differences in mean Fsp3 values had they analysed the data in a more appropriate manner.
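
A small simulation makes the attenuation easy to see. The Fsp3 distributions below are invented (Normal, truncated to the range 0 to 1, with a built-in difference of 0.05 between the means of the 'YES' and 'NO' subsets) but the qualitative behaviour doesn't depend on the details: comparing phase 1 with phase 2 recovers only part of the difference that a direct YES versus NO comparison would reveal, and a non-parametric test such as Mann-Whitney avoids having to assume Normally-distributed Fsp3 values:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# hypothetical phase 1 compounds: a 'YES' subset that progresses to phase 2
# and a 'NO' subset that does not (all values invented for illustration)
fsp3_yes = np.clip(rng.normal(0.40, 0.15, 400), 0.0, 1.0)
fsp3_no  = np.clip(rng.normal(0.35, 0.15, 600), 0.0, 1.0)

phase1 = np.concatenate([fsp3_yes, fsp3_no])   # all phase 1 compounds
phase2 = fsp3_yes                              # only the progressors

print(f"phase 2 mean - phase 1 mean = {phase2.mean() - phase1.mean():+.3f}")
print(f"YES mean     - NO mean      = {fsp3_yes.mean() - fsp3_no.mean():+.3f}")

# non-parametric comparison of the YES and NO subsets
u, p = stats.mannwhitneyu(fsp3_yes, fsp3_no, alternative="two-sided")
print(f"Mann-Whitney U p-value (YES vs NO) = {p:.4f}")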

This is an appropriate point at which to discuss correlation in the context of studies such as LBH2009 and CNM2025. It’s actually well known (see L2013) that Fsp3 values for chemical structures tend to be greater when amine nitrogen atoms are present (this does not invalidate the observed trends in the data but has big implications for how you interpret these trends). There is, however, a much bigger issue which is that correlation does not imply causation. Let’s suppose that you’ve just joined a drug discovery team as they are preparing to select a clinical candidate (I concede that this is a most improbable scenario but it does illustrate a point). The team have an excellent understanding of the structure-activity relationship (SAR) and have successfully addressed a number of issues during the lead optimization process (the chemical structures of the compounds have been quite literally shaped by the problems that the team members have solved). Now consider the likely reaction of the team members to a suggestion that probability of success in the clinic would increase if the chemical structure of the best compound were modified so as to increase its Fsp3 value. My view is that the team might think that the person making such a suggestion had just stepped off the shuttle from Planet Tharg (an alien from this planet used to make occasional Sunday Sport appearances). I see the trends in data observed by the authors of LBH2009 as effects rather than causes (the vanishing B-17 was never there in the first place).

Let’s return to the CNM2025 study and its authors state:

Using data from the Cortellis Drug Discovery Intelligence database, we repeated an analysis similar to that of Lovering et al. to assess Fsp3 in drugs approved post-2009 and those in active clinical development as of mid-2024 (Fig. 1). [I would challenge the claim that the analysis presented in CNM2025 is similar to that presented in LBH2009. The supplementary material for CNM2025 indicates that the data summarised in Fig. 1b correspond to the period 2012 through 2024 (it is not clear whether the database has been updated to account for compounds that have fallen out of active development during this period). As is the case for Figure 3 in LBH2009, Fig. 1b in CNM2025 shows more phase 2 compounds (816) than phase 1 compounds (421), raising similar questions about the numbers of compounds that have been in clinical development without being recorded in the database. I thank fellow blogger Dan Erlanson for suggesting that I examine the supplemental information for CNM2025.] Although our methods used contemporary data sources different to Lovering et al., we obtained comparable Fsp3 data for approved drugs prior to 2009. More recently however, the picture appears to have changed with approvals shifting to lower Fsp3 drugs (Fig. 1a). Similarly, when looking at drugs currently in clinical development (Fig. 1b), there appeared to be no clear relationship between highest phase reached and Fsp3, suggesting the key conclusion noted by Lovering et al. has not persisted. In all data sets, exemplars with Fsp3 = 0 as well as Fsp3 = 1 are extensively seen. [It is necessary to account for the number of hypotheses that have been tested for statistical significance when quoting P-values (see R2016 and VM2018).]
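
For anyone who needs to do this in practice, here's a minimal sketch of applying a Holm correction to a set of p-values using statsmodels; the p-values are invented and serve only to show the mechanics:

from statsmodels.stats.multitest import multipletests

# hypothetical raw p-values from several pairwise comparisons (invented)
p_raw = [0.012, 0.034, 0.048, 0.21, 0.002]

reject, p_adj, _, _ = multipletests(p_raw, alpha=0.05, method="holm")
for p, pa, r in zip(p_raw, p_adj, reject):
    print(f"raw p = {p:.3f}   Holm-adjusted p = {pa:.3f}   significant: {r}")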

Fig. 1a in CNM2025 shows the time-dependence of Fsp3 distributions for approved drugs according to approval date and I remain unconvinced of the value of analysis like this (on first encountering analysis of the time-dependence of drug properties a quarter of a century ago I recall being left with the distinct impression that some senior medicinal chemists where I worked had a bit too much time on their hands). However, it is immaterial whether or not you are as underwhelmed as I am by the time-dependence of drug properties because no such analysis is actually reported in LBH2009 and this is one reason that I challenge the claim made by the authors of CNM2025 that they “repeated an analysis similar to that of Lovering et al. to assess Fsp3 in drugs approved post-2009 and those in active clinical development as of mid-2024”.

Now let’s take a look at Fig. 1b in CNM2025 and this should be compared with Figure 3 in LBH2009. In some ways the former is an improvement on the latter since the violin plots show the distributions of Fsp3 values for each group of compounds and, as mentioned earlier in the post, I don’t think that it makes any sense to include discovery compounds in an analysis like this (as the authors of LBH2009 did). Although these two figures look superficially similar they are actually very different and, given that the authors of CNM2025 only included "compounds in clinical trials as of mid-2024" in their study, I would argue that their study does not properly examine the link between Fsp3 and clinical progression. I agree that the difference between mean Fsp3 values for drugs approved up to 2009 and for drugs approved after 2009 is statistically significant. What is not clear from the analysis summarized in Fig. 1b in CNM2025 is whether the lower Fsp3 values of drugs that were approved after 2009 reflect smaller increases in Fsp3 over the course of clinical development (the B-17 has disappeared from the lunar crater) or lower Fsp3 values for compounds entering clinical development (the B-17 is still in the lunar crater). I think it's possible to address this question but you would need to analyse the data a lot more carefully than the authors of CNM2025 appear to have done. For example, you might examine the time-dependencies of mean Fsp3 values for compounds evaluated in phase 1 and the corresponding mean Fsp3 values for compounds that progressed or failed to progress to phase 2. While I consider more careful analysis of progression to be feasible, I see little or no value from the perspective of real world drug discovery in actually performing it.

This is a good point at which to wrap up and, unless the trends in the data can be shown to reflect causation, the debate can be described as bald men fighting over a comb (as one who is follicly challenged I always find it painful to use this phrase). I see variation in drug properties with time as an effect rather than a cause and Forrest Gump would have been well aware of this fifteen years before the publication of LBH2009 when he famously observed that "shit happens". One point on which the CNM2025 authors and I do appear to agree is that there is not currently a B-17 in a lunar crater. Where we appear to differ is that they seem to be suggesting this is because it has vanished while I never believed that it was ever there in the first place. I’ll let the late great Dave Allen have the last word.