Molecular Design: hit-to-lead


Wednesday, 31 December 2025

Hit to Lead best practice?

I'm now in Trinidad and I'll share a 180° panorama from Paramin where I walk for exercise. This district in Trinidad's Northern Range is renowned for its agriculture and the most excellent produce is grown in 'gardens' on steep hillsides. My walk would take about two and a quarter hours if I just walked but it usually takes rather longer because I like to take photos and often stop on the ridge to gaze at corbeaux 'surfing' the updrafts. Most of all I enjoy catching up with friends in Paramin and not so long ago one of them was telling me about the sound made by douens (which have terrified me since childhood because I was never baptised). Some years ago I was struggling along the ridge with a hacking cough that I'd brought with me from the UK three days previously when I heard a familiar voice (one of my friends was visiting his sister). The conversation turned to my cough and he instructed his sister to bring some medicine. She produced a bottle of a liquid that looked like fluorescein and, as she decanted some into a shot glass my friend exclaimed "dat too much yuh go kill he". The liquid appeared to have a puncheon base and my friend's sister also gave me some bush to make tea. My cough was history after three days.

I’ll be taking a look at The European Federation for Medicinal Chemistry and Chemical Biology (EFMC) Best Practice Initiative: Hit to Lead (Q2025) in this post. I have a number of criticisms of this work and it really shouldn’t need saying that you do raise the bar for yourself when you present your work as defining best practices. As is customary for blog posts here at Molecular Design I’ve used Q2025 reference numbers when referring to literature studies and quoted text is indented with my comments in red italics. This will be a long tedious post and strong coffee is recommended.

Best practices are, in essence, recommended ways of doing things and it’s actually very difficult to demonstrate objectively that one way of doing things is better (or worse) than another way. My general view of Q2025 is of a poorly organized article that at times lacks clarity and coherence. Some of the advice offered on how best to do Hit to Lead (H2L) work is unsound and the Authors also make a number of significant errors. Although the abstract refers to “contemporary drug discovery” the recommended best practices do, in my view, appear to be firmly rooted in the past given that fragment-based design (FBD) is not covered and there is no mention of important 'new' modalities such as irreversible covalent inhibition and targeted protein degradation. It’s worth mentioning that biological activity for some new modalities cannot be meaningfully quantified as a single parameter such as an IC₅₀ value and this complicates the use of ligand efficiency metrics (a post on covalent ligand efficiency will give you an idea of the tangles you can get yourself into), which the Authors seem to consider important in H2L work. I consider the quantity of literature cited in Q2025 to be excessive, especially given that some of the cited articles have minimal relevance to H2L work (the failure of the Authors to cite R2009 is also noteworthy). In some cases the cited literature does not support assertions made by the Authors. In my view Figures 1, 5 and 8 are redundant.

While I see plenty wrong with Q2025 it’s worth flagging up points on which the Authors and I appear to be in agreement. I think that they put it well with the following statement:

Leads have line of sight to a development candidate and bring an understanding of what priorities Lead Optimisation should address.

I used this football analogy in an earlier post:

The screening phase is followed by the hit-to-lead phase and it can be helpful to draw an analogy between drug discovery and what is called football outside the USA. It’s not generally possible to design a drug from screening output alone and to attempt to do so would be the equivalent of taking a shot at goal from the centre spot. Just as the midfielders try to move the ball closer to the opposition goal, the hit-to-lead team use the screening hits as starting points for design of higher affinity compounds. The main objective in the hit-to-lead phase is to generate information that can be used for design, and mapping structure-activity relationships for the more interesting hits is a common activity in hit-to-lead work.

I certainly agree that it is important to establish structure-activity relationships (SARs) for structural series of interest although I have no idea what the Authors mean by “dynamic SAR”. I also agree that consideration of physicochemical properties, especially lipophilicity, is very important in H2L work (just as it is in optimisation of the leads) although the case for a Nobel Prize made in a 2024 JMC Editorial does, in my view, appear to have been overcooked.

I argue that drug discovery should be seen in a Design of Experiments framework (generate the information that you need as efficiently as possible) rather than as the prediction exercise that many who tout machine learning (ML) as a panacea for the ills of Pharma & Biotech would have you believe. Regardless of which view prevails it’s abundantly clear that generation and analysis of data are very important in contemporary drug discovery and are likely to become even more important in the future. However, if you’re going to base decisions on trends in data then it’s important that you know how strong the trends are because this tells you how much weight to give to the trends when making your decisions. Most drug discovery scientists will have encountered analyses of relationships between predictors of ADME (absorption, distribution, metabolism, and excretion) and physicochemical and chemical structure descriptors and we observed in the KM2013 perspective that:

The wide acceptance of Ro5 provided other researchers with an incentive to publish analyses of their own data and those who have followed the drug discovery literature over the last decade or so will have become aware of a publication genre that can be described as ‘retrospective data analysis of large proprietary data sets’ or, more succinctly, as ‘Ro5 envy’.

In some cases trends observed in data are presented in ways that make them appear to be stronger than they actually are (this is typically achieved by categorizing continuous-valued data prior to analysis) and [13a], [24] and [26] were criticised in this context in KM2013. When reading articles on drug-likeness and compound quality it is also important to be aware that correlation does not imply causation. One should be particularly wary of studies such as [20c] which present analyses of proprietary data as "facts" or claim that such analyses have revealed "principles". I see the weakness of these trends partly as a reflection of chemical structure diversity in datasets and would expect the corresponding trends to be stronger within structural series (I offer the following advice in NoLE):

Drug designers should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working.

I see erosion of critical thinking skills as a significant problem in contemporary drug discovery and some leaders in the field appear to have lost the ability to distinguish what they know from what they believe. As I observed in a review of a 2024 JMC Editorial (Property-Based Drug Design Merits a Nobel Prize) the Rule of 5 (Ro5) is not actually supported by data in the form that it was stated. The wide acceptance of Ro5 as a definition of drug-likeness propagates what I consider to be a misleading view that drugs occupy a contiguous and distinct region of chemical space. Some of the claims made in the JMC Editorial (“a compound is more likely to be clinically developable when LipE > 5”, “a discovery compound is more likely to become a drug when Fsp3 > 0.40” and “a compound is more likely to have good developability when PFI < 7”) do not appear to be based on data. I remain sceptical that developability and likelihood of clinical success of a compound can be meaningfully assessed even when one knows that the compound actually exhibits exploitable activity against the target(s) of interest. In my view the suggestion that simple drug discovery guidelines are worthy of a Nobel Prize does a huge disservice to drug discovery scientists by trivializing the very significant challenges that they face.

Like many in the drug discovery field, I consider lipophilicity to be the single most important physicochemical property in drug discovery and I would generally anticipate that a surfeit of lipophilicity will end in tears. That said, I don't consider lipophilicity to be usefully predictive of physicochemical properties such as permeability and aqueous solubility that are more relevant than lipophilicity from the perspective of oral absorption. When I assert that lipophilicity is not "usefully predictive" I'm certainly not denying that trends in data exist. However, I must stress that the trends are not so strong that having solubility values that have been predicted from lipophilicity means that you no longer need to measure aqueous solubility.

In drug discovery projects I generally recommend examination of the response of potency (expressed as a logarithm) to increased lipophilicity. In the ideal situation the correlation of potency with lipophilicity will be weak, indicating that potency is driven by factors other than lipophilicity. If the correlation of potency with lipophilicity is strong then you need the response (the slope for a linear correlation) to be relatively steep. I consider it to be generally helpful to plot potency against lipophilicity with reference lines corresponding to different LipE values (see R2009, which is a lot more relevant to H2L work than much of the literature cited in the Q2025 study) and I would also suggest modelling the response and using the residuals to quantify the extent to which individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the "Alternatives to ligand efficiency for normalization of affinity" section of NoLE).
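For those who'd like to try this with their own project data, here's a minimal Python sketch. It assumes a linear response of pIC₅₀ to log P, and the function names and the six compounds are hypothetical (this is my illustration rather than code from NoLE or R2009). LipE is simply pIC₅₀ - log P, so the reference lines of constant LipE on a plot of pIC₅₀ against log P all have unit slope; the residuals from the fitted line tell you how far each compound beats (positive) or is beaten by (negative) the trend in the data.

```python
import numpy as np

def lipe(pic50, logp):
    """Lipophilic efficiency (LipE, also written LLE): pIC50 - log P."""
    return np.asarray(pic50, dtype=float) - np.asarray(logp, dtype=float)

def potency_residuals(pic50, logp):
    """Least-squares fit of pIC50 = intercept + slope * logP.

    Returns (slope, intercept, residuals); a positive residual means the
    compound is more potent than the trend in the data would suggest."""
    x = np.asarray(logp, dtype=float)
    y = np.asarray(pic50, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest power first
    residuals = y - (intercept + slope * x)
    return slope, intercept, residuals

# Hypothetical structural series (values invented for illustration)
logp = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
pic50 = [5.2, 5.9, 6.0, 6.8, 6.9, 7.6]

slope, intercept, resid = potency_residuals(pic50, logp)
lipe_values = lipe(pic50, logp)
```

A slope close to 1 would indicate that potency gains in the series are being bought almost entirely with lipophilicity, which is the situation in which I'd be most concerned.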

In drug discovery lipophilicity is usually quantified by the logarithm of the octanol/water partition coefficient (log P) or distribution coefficient (log D). The choice of octanol/water for quantification of lipophilicity is arbitrary and some, including me, consider saturated hydrocarbons such as cyclohexane or hexadecane to be physically more realistic than octanol as a model for the core of a lipid bilayer. It is the distribution coefficient (D) rather than the partition coefficient (P) that is measured for lipophilicity assessment, although the two quantities are equivalent when ionization can be safely neglected. Values of log P for ionizable compounds can be derived from the response of log D to pH although this is not generally done routinely in drug discovery. Alternatively, you can make the assumption that only neutral forms of compounds partition into the organic phase and use (1) in the H2L best practices post graphic (see also K2013) to convert log D values to log P values (to do this you’ll also need a reliable estimate for pKa in order to calculate the neutral fraction). When log D (as opposed to log P) is used to assess the ‘quality’ of compounds you can make compounds better simply by increasing the extent to which they are ionized and I hope you can see that going down this path is likely to end as well as things did for the Sixth Army at Stalingrad.
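The conversion can be sketched in a few lines of Python. This is a minimal illustration assuming a single ionizable group and that only the neutral form partitions into octanol, so that log P = log D - log10(fn) where fn is the neutral fraction at the pH of the measurement. I've written it in the standard Henderson-Hasselbalch form rather than reproducing equation (1) from the graphic, and the function names and example values are hypothetical.

```python
import math

def neutral_fraction(pka, ph, kind):
    """Fraction of a monoprotic compound in its neutral form at a given pH.

    kind='acid': HA <-> A- + H+
    kind='base': BH+ <-> B + H+ (pKa refers to the conjugate acid BH+)"""
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")

def logp_from_logd(logd, pka, ph, kind):
    """log P = log D - log10(neutral fraction), assuming only the neutral form partitions."""
    return logd - math.log10(neutral_fraction(pka, ph, kind))

# Hypothetical amine with pKa 9.4 and log D7.4 = 1.0
logp_base = logp_from_logd(1.0, 9.4, 7.4, "base")  # roughly 3.0
```

The example illustrates the point about ionization: a base that is two pKa units above the measurement pH has a log D about two units lower than its log P, even though the lipophilicity of its neutral form is unchanged.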

In drug discovery log P values are typically calculated and it can often be quite difficult when reading the literature to know which method has been used for the calculations (sometimes the term ‘cLogP’ appears to have been used simply to denote that log P values have been calculated). For example, it is stated in [13a] that “Physical property data were obtained from AstraZeneca’s C-Lab tool, incorporating standard packages for LogP calculations (cLogP, ACDLogP), and an in-house algorithm for the distribution coefficient (1-octanol–water LogD at pH 7.4)”. In general, different prediction methods will give different log P values for the same compound (for example the Ro5 lipophilicity threshold is 5 when ClogP is used but 4.15 when MlogP is used). That said, choice of method for predicting log P and whether you use measured log D or predicted log P become less important issues when working within structural series because hydrogen bond donors and acceptors, and ionizable groups tend to be relatively conserved under this scenario.

That log D and log P are different quantities in the context of drug design is one of a number of things that the Authors of [34a] (Molecular Property Design: Does Everyone Get It?) just don’t seem to ‘get’ and I’ll point you toward a blog post in which this point is discussed in a bit more detail. Let’s examine Figure 2 (Impact of hydrophobicity on developability assays and the profile of marketed oral drugs) of [34a] and I’d like you to look at the upper panel (a). You’ll notice that the visualization for some of the ‘developability’ assays is based on PFI (derived from log D measured chromatographically at pH 7.4). However, the visualization for hERG (+1 charge) and promiscuity is based on iPFI (derived from ‘Chrom logP’ and it is not clear how this quantity was defined or generated). I would also argue that the activity criterion (pIC₅₀ > 5) used in the promiscuity analysis is too permissive to be physiologically relevant (this is a common issue in the promiscuity literature). As an aside, I am unconvinced that log D values were actually measured chromatographically at pH 7.4 for all the drugs that form the basis of the analysis shown in the lower panel (b) of Figure 2.

After a long preamble it’s time to start my review of Q2025 and comments will follow the order of the article. I see the citation of [2] and [3] as gratuitous while [4] does not appear to present evidence in support of the claim that “ensuring high quality of lead series is a large cost and time saver in the overall process of drug discovery” (it must be stressed that I certainly don’t deny the value of high quality lead series and am merely pointing out that the chosen reference does not actually demonstrate that higher-quality lead series result in cost and time savings in drug discovery).

In my view neither Figure 1 nor its caption (see below) makes any sense.

Figure 1. Illustration of the multi-objective characterisation necessary in the journey from a hit to a drug. All these necessary characteristics, described by illustrative principal components, are influenced by the physicochemical properties of the molecules.

You’ll frequently encounter graphics like Figure 1 that show low-dimensional chemical spaces in the drug discovery literature (for example, a 2-dimensional space might be specified in terms of lipophilicity and molecular size). While it’s very easy to generate graphics like these the relevance of the chemical spaces to drug design is often unclear. There are ways in which you can demonstrate the relevance of a chemical space to drug design and, for example, you might build usefully predictive models for quantities such as IC₅₀, aqueous solubility or permeability using only the dimensions of the particular chemical space as descriptors. Alternatively, you could show that compounds in mutually exclusive categories such as ‘progressed to phase 2’ and ‘failed to progress to phase 2’ occupy different regions of the chemical space (note that it’s not sufficient to show that a single class of compounds such as ‘approved drugs’ occupies a particular region within the chemical space and this is the essence of a general criticism that I make of Ro5 and QED). It is common to depict the different categories as ellipses that enclose a given fraction of the data points corresponding to each category and the orientation of each ellipse with respect to the axes indicates the degree to which the descriptors that define the chemical space are correlated for each category. One problem with Figure 1 is that the meaning of the ellipses is unclear and I would challenge the assertion made by the Authors that “the journey of a drug discovery campaign is characterized in Figure 1, showing how the active hit needs to be modified to address the requirements impacting the efficacy and safety of the molecule”.
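To show what these ellipses actually encode, here's a minimal Python sketch. It assumes that the points in each category are approximately bivariate normal in the two descriptors, in which case an ellipse scaled by the chi-square quantile for two degrees of freedom (which has the closed form -2 ln(1 - p)) is expected to enclose a fraction p of the points. Coverage is only approximate because the mean and covariance are estimated from the data, and the function name and the four toy points are hypothetical.

```python
import numpy as np

def coverage_ellipse(points, coverage=0.95):
    """Ellipse expected to enclose a fraction `coverage` of a bivariate normal sample.

    Returns (centre, semi_axes, angle_deg): semi_axes has the major axis first and
    angle_deg is the orientation of the major axis relative to the x-axis, in [0, 180)."""
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Chi-square quantile for 2 degrees of freedom: -2 ln(1 - p)
    scale = -2.0 * np.log(1.0 - coverage)
    semi_axes = np.sqrt(scale * eigvals[::-1])  # major axis first
    major = eigvecs[:, -1]
    # Eigenvector sign is arbitrary, so fold the angle into [0, 180)
    angle_deg = float(np.degrees(np.arctan2(major[1], major[0])) % 180.0)
    return centre, semi_axes, angle_deg

# Toy points for one category, with positively correlated descriptors
category_points = [(2.0, 2.0), (-2.0, -2.0), (0.5, -0.5), (-0.5, 0.5)]
centre, semi_axes, angle = coverage_ellipse(category_points)
```

A major axis tilted at 45° in this example simply reflects the strong positive correlation between the two descriptors within the category. It says nothing about whether the category differs from any other category, which is the comparison needed to demonstrate relevance to drug design.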

Potency optimisation alone is not a viable strategy towards the discovery of efficacious and safe drugs, or even high-quality leads. Concurrent optimisation of the physicochemical properties of a molecule is the most important facet of drug discovery, as these properties influence its behaviours, disposition and efficacy [12a | 12b]. [While I certainly agree that there is a lot more to drug design than maximisation of potency I would argue that controlling exposure is a more important objective than optimization of physicochemical properties (on the subject of exposure I recommend that all drug discovery scientists take a look at the SM2019 article). It's also worth bearing in mind that you can't compensate for inadequate potency with increased compound quality. I don't consider either reference as evidence that "concurrent optimisation of the physicochemical properties of a molecule is the most important facet of drug discovery" and it is not accurate to describe metabolic stability, active efflux and affinity for anti-targets as "physicochemical properties". I think the Authors need to say more about which physicochemical properties they recommend to be optimized and be clearer about exactly what constitutes optimization. Lipophilicity alone is not usefully predictive of properties such as bioavailability, distribution and clearance that determine the effects of drugs in vivo.] Together these outcomes define the quality of the molecule, indicative of its chances of success in the clinic, as evidenced in numerous studies [13a | 13b]. [Neither of these articles appears to provide convincing evidence of a causal relationship between “the quality of a molecule” and probability of success in the clinic. Much of the 'analysis' in [13a] consists of plots of median values without any indication of the spreads in the corresponding distributions and to see it cited in connection with "evidenced" rings alarm bells for me. 
As explained in KM2013, presenting data in this manner exaggerates trends and I consider it unwise to base decisions on data that have been presented in this manner. Quite aside from the issue of hidden variation, I do not consider the relationship between promiscuity and median cLogP reported in Figure 3a of [13a] to be indicative of probability of success in the clinic, given that the criterion for 'activity' (> 30% inhibition at 10 µM) is far too permissive to be physiologically relevant (this is a common issue in the promiscuity literature).]
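The way that binning and plotting medians exaggerates a weak trend is easy to reproduce with simulated data. The following minimal Python sketch assumes a weak linear relationship with normally distributed noise; all numbers are invented for illustration and have nothing to do with the data analysed in [13a]. The correlation is computed once for the individual data points and once for the medians of five equally populated bins.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the illustration is reproducible
n = 2000
x = rng.normal(3.0, 1.5, n)            # hypothetical descriptor (a cLogP-like quantity)
y = 0.3 * x + rng.normal(0.0, 1.0, n)  # hypothetical response, weakly dependent on x

# Squared correlation coefficient for the individual data points
r2_raw = np.corrcoef(x, y)[0, 1] ** 2

# Categorize x into five equally populated bins and summarize each bin by its medians
edges = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
bins = np.digitize(x, edges)  # bin labels 0..4
bin_x = np.array([np.median(x[bins == k]) for k in range(5)])
bin_y = np.array([np.median(y[bins == k]) for k in range(5)])

# Squared correlation coefficient for the bin medians
r2_binned = np.corrcoef(bin_x, bin_y)[0, 1] ** 2
```

With this set-up r2_raw comes out at roughly 0.17 (the value expected from the simulation parameters) while r2_binned is close to 1, even though both numbers describe exactly the same data. The spread within each bin, which is what tells you how much weight to give the trend, has simply disappeared from view.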

While the optimal lipophilicity range has been suggested as a log D7.4 between 1 and 3 [15], this is highly dependent on the chemical series. [The focus of the analysis was permeability and the range was actually defined in terms of AZlogD (calculated using proprietary in-house software) as opposed to log D measured at pH 7.4. The correlation between the logarithm of the A to B permeability and AZlogD is actually very weak (r² = 0.16), which would imply a high degree of uncertainty in threshold values used to specify the optimal lipophilicity range. While I remain sceptical about the feasibility of meaningfully defining optimal property ranges, the assertion that the proposed range in AZlogD of 1 to 3 “is highly dependent on the chemical series” is pure speculation and is not based on data.] Best practice would be to generate data for a diverse set of compounds in a series, if measuring it for all analogues is not possible, and determine the lipophilicity range that leads to the most balanced properties and potency [3 | 16]. [It is not clear what the Authors mean by “most balanced properties and potency” nor is it clear how one is actually supposed to use lipophilicity measurements to objectively “determine the lipophilicity range that leads to the most balanced properties and potency”. My view is that to demonstrate "balanced properties and potency" would require measurements of properties such as aqueous solubility and permeability that are more predictive than lipophilicity of exposure in vivo. I do not consider either [3] or [16] to support the assertions being made by the Authors.] Lipophilicity and pKa prediction models can then guide further designs and synthesis of analogues along the optimisation pathway (Figure 3) [17], but measurements are advised, particularly by chromatographic methods such as Chrom log D7.4 [18], in contemporary practice.
[In general, it is very difficult to convincingly demonstrate that one measure of lipophilicity is superior to another. Chromatographic measurement of log D is higher in throughput than the shake flask method used traditionally but it is unclear to which solvent system the measurement corresponds. Furthermore, the high surface area to volume ratio of the stationary phase means that an ionized species can interact to a significant extent with the non-polar stationary phase while keeping the ionized group in contact with the polar mobile phase, and one should anticipate that the contribution of ionization to log D values might be lower in magnitude than for a shake flask measurement.]
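To put the r² value of 0.16 for the correlation between log(A to B permeability) and AZlogD into perspective, recall that for a least-squares straight line the standard deviation of the residuals, relative to the standard deviation of the response, is approximately √(1 - r²) (ignoring the small correction for degrees of freedom):

```latex
\frac{s_{\mathrm{res}}}{s_{y}} \approx \sqrt{1 - r^{2}} = \sqrt{1 - 0.16} = \sqrt{0.84} \approx 0.92
```

In other words, knowing AZlogD reduces the spread in log permeability by only about 8%, which is why I would expect threshold values derived from this relationship to be highly uncertain.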

As noted earlier in the post, I consider plotting potency against lipophilicity with reference lines corresponding to different LLE (LipE) values (as is done in Figure 3, which also serves as the graphical abstract) to be a good way for H2L project teams to visualize potency measurements for their project compounds (see R2009, which really should have been cited). That said, I consider the view of the discovery process implied by Figure 3 to be neither accurate nor of any practical value for scientists working on H2L projects. It is relatively easy to define optimization of potency, and measurements in an in vitro assay are typically relevant to target engagement in vivo (uncertainty in the concentration of the drug in the target compartment, and of the species with which it competes, is likely to be the bigger issue when trying to understand why in vitro potency fails to translate to beneficial effects in vivo). One specific criticism that I will make of Figure 3 is that it appears to imply that it doesn't matter whether you use log P or log D (when you use log D you can reduce lipophilicity to acceptable levels simply by increasing the extent to which compounds are ionized).

However, there is quite a bit more to optimization of properties such as permeability, aqueous solubility, metabolic stability and pharmacological promiscuity that are believed to be predictive of ADME and toxicity, and my view is that defining optimization in terms of determining "the lipophilicity range that leads to the most balanced properties and potency" is hopelessly naive. The principal objective in H2L work (and in lead optimization) is to identify compounds for which potency and properties related to ADME and toxicity are all acceptable. Defining meaningful acceptability criteria is non-trivial and H2L teams also typically need to make decisions as to how criteria can be relaxed with a minimum of risk. It's important to be aware that you can't compensate for inadequate potency by making the other properties better and those who argue that drug discovery scientists should focus on lipophilic efficiency rather than potency are missing this point.

While plotting potency against lipophilicity with reference lines corresponding to different LLE (LipE) values is often a helpful way to visualise project data in H2L (and in lead optimization) I don't consider Figure 3 to provide an accurate or useful view of the typical H2L process. Figure 3 presents a view that a hit maps to a lead which in turn maps to a drug candidate. In reality the screening phase of a discovery project will identify multiple hits and the resulting leads are not single compounds but structural series. It is important to be aware that the practical (as opposed to conceptual) utility of a graphic such as Figure 3 is limited by the extent to which the chosen measure of lipophilicity is predictive of properties such as aqueous solubility, permeability and metabolic stability.

Although Q2025 claims to define H2L best practices, the Authors don't appear to demonstrate much awareness of the nature of the H2L process. The first step in the H2L process is to follow up hits from the initial screen by assaying potential compounds of interest (summarised in Figure 2), although in some cases some follow-up might already have been done in the hit generation phase. Hits tend to group into structural families and the H2L chemists then synthesise compounds (in some organizations synthesis is outsourced) with a view to identifying compounds that are more potent than the hits. Decisions as to which compounds are to be made are typically hypothesis-based (see P2012) although in some cases genuinely predictive models might be available to the H2L team. Design hypotheses are typically based on information available to H2L teams, such as SARs derived from the hits or relevant target structures, and predictive models might be based on free energy calculations (see ASC2025). As the H2L teams generate more information, design hypotheses become more specific and models based on project data become more predictive.

I would argue that establishing (and exploiting) SARs and structure-property relationships (SPRs) constitutes a basis for design in H2L work. Certain features of SARs are especially relevant to H2L work and an observation that a reduction in log P leads to increased potency (or at least a minimal decrease in potency) is information that project teams can make good use of. Other SAR features that I would advise H2L scientists to look for are activity cliffs (relatively small changes in structure result in relatively large changes in potency) and superadditivity (effect on potency of simultaneously making two structural modifications is greater than what would be expected from the effects of making each structural modification individually).
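Both of these SAR features can be quantified very simply, provided that potency is expressed as a logarithm such as pIC₅₀. The minimal Python sketch below assumes that effects of structural modifications on pIC₅₀ are additive in the absence of any interaction, so that the expected pIC₅₀ of the doubly modified compound is pIC₅₀(A) + pIC₅₀(B) - pIC₅₀(parent). For activity cliffs I've chosen the structure-activity landscape index (SALI), which divides the potency difference for a pair of compounds by (1 - similarity); the similarity values would come from whichever fingerprint and similarity measure the team prefers. Function names and all numbers are hypothetical.

```python
def interaction_term(p_parent, p_a, p_b, p_ab):
    """Non-additivity (log units) when two modifications, A and B, are combined.

    The additive expectation for the double modification is p_a + p_b - p_parent.
    A positive value indicates superadditivity and a negative value subadditivity;
    it should be compared with assay variability before being taken seriously."""
    expected_ab = p_a + p_b - p_parent
    return p_ab - expected_ab

def sali(p_i, p_j, similarity):
    """Structure-activity landscape index for a pair of compounds.

    Large values flag activity cliffs: big potency differences between
    structurally similar compounds (similarity in the range 0 to 1)."""
    if similarity >= 1.0:
        raise ValueError("similarity must be less than 1")
    return abs(p_i - p_j) / (1.0 - similarity)

# Hypothetical pIC50 values: parent, A only, B only, and A + B combined
nonadditivity = interaction_term(5.0, 5.5, 5.8, 7.1)  # about 0.8 log units
# Hypothetical pair with a 2 log unit potency difference and similarity of 0.9
cliff_score = sali(7.0, 5.0, 0.9)  # about 20
```

In this hypothetical example the combined compound is 0.8 log units more potent than additivity would predict, which is the sort of observation that can generate a useful design hypothesis.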

I see managing the 'assay budget' as a critical activity (especially when running assays is outsourced). For example, differences in lipophilicity between structurally related compounds are typically easy to predict and measuring large numbers of log D values is likely to be wasteful of resources. H2L teams need to use their assay budgets to identify and address issues efficiently and I don't consider the suggestion that H2L teams use a generic tiering approach such as the one shown in Figure 9 to be especially helpful. Something that I do suggest H2L teams consider is to try to assess responses of properties such as aqueous solubility and permeability to lipophilicity (this means making measurements for less potent compounds).

Figure 3. There are numerous routes to climb a mountain, as there are to discover a drug, but a measured approach to lipophilicity will guide an optimal path, [The Authors need to articulate what they mean by “a measured approach to lipophilicity” (which does come across as arm-waving) and provide evidence to support their claim that it “will guide an optimal path”.] where the outcome is usually driven by a balance of activity and lipophilicity [This appears to be a statement of belief and the Authors do need to provide evidence to support their claim. The Authors also need to say more about how the “balance of activity and lipophilicity” can be objectively assessed.] (The parallel lines represent LLE, i.e. pIC₅₀ - log P). [This way of visualizing data was introduced in the R2009 study which, in my view, should have been cited.]

Thus the Distribution Coefficient, (log D at a given pH) is a highly influential physical property governing ADMET profiles [20a | 20b | 20c] such as on- and off-target potency, solubility, permeability, metabolism and plasma protein binding (Figure 4) [14b]. [I recommend that the term ‘ADMET’ not be used in drug discovery because ADME (Absorption, Distribution, Metabolism, and Excretion) and T (Toxicity) are completely different issues that need to be addressed differently in design. I would argue that the ADME profile of a drug is actually defined by its in vivo characteristics such as fraction absorbed (which may vary with dose and formulation), volume of distribution and clearance (the Authors appear to be confusing ADME with in vitro predictors of ADME) and I would also argue that toxicity is an in vivo phenomenon. In order to support the claim that log D “is a highly influential physical property governing ADMET profiles” it would be necessary to show that log D is usefully predictive of what happens to drugs in vivo. My view is that the cited literature does not support the claim that log D “is a highly influential physical property governing ADMET profiles” given that [20a] does not even mention log D and neither [20b] nor [20c] provides any evidence that log D is usefully predictive of in vivo behaviour of drugs.]

Figure 4. The impact of increasing lipophilicity on various developability outcomes [14b] [It is unclear as to whether lipophilicity is defined for this graphic in terms of log P or log D. It would be necessary to show more than just the ‘sense’ of trends for the term “impact” to be appropriate in this context. I do not consider the use of the term “developability outcomes” to be either accurate or helpful.]

Aqueous solubility is certainly an important consideration in H2L work although I think that the Authors could have articulated the relevant physical chemistry rather more clearly than they have done. You can think of the process of dissolution as occurring in two steps (sublimation of the solid followed by transfer from the gas phase to water). Lipophilicity usually features in models for prediction of aqueous solubility although I consider wet octanol to be a thoroughly unconvincing model for the gas phase. We generally assume that aqueous solubility is limited by the solubility of the neutral form (which is why ionization tends to be beneficial) but when this assumption breaks down the solubility that you measure will depend on both the nature and concentration of the counter-ion. As I note in HBD3 optimization of intrinsic aqueous solubility (the solubility of the neutral form of the compound) is still a valid objective for ionizable compounds because we're typically assuming that only neutral species can cross the cell membrane by passive permeation.
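The assumption that aqueous solubility is limited by the solubility of the neutral form leads to a simple relationship between total solubility and pH for a compound with a single ionizable group: S = S0(1 + 10^(pH - pKa)) for an acid and S = S0(1 + 10^(pKa - pH)) for a base, where S0 is the intrinsic solubility. Here's a minimal Python sketch with hypothetical function names and values. It ignores the salt solubility limit, beyond which the counter-ion effects mentioned above take over.

```python
import math

def log_solubility(log_s0, pka, ph, kind):
    """log10 of total aqueous solubility for a monoprotic compound at a given pH.

    Assumes solubility is limited by the neutral form (intrinsic solubility S0).
    The relationship breaks down once the salt solubility limit is reached,
    at which point the nature and concentration of the counter-ion matter."""
    if kind == "acid":
        ionized_to_neutral = 10.0 ** (ph - pka)
    elif kind == "base":
        ionized_to_neutral = 10.0 ** (pka - ph)
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s0 + math.log10(1.0 + ionized_to_neutral)

# Hypothetical carboxylic acid: intrinsic log S0 = -6.0 (mol/L), pKa 4.0, at pH 7.4
log_s_acid = log_solubility(-6.0, 4.0, 7.4, "acid")  # roughly -2.6
```

The sketch also shows why intrinsic solubility remains a sensible target for ionizable compounds: every term above scales with S0, and S0 relates to the neutral form that we assume is responsible for passive permeation.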

Some general advice that I would offer to drug discovery scientists encountering solubility issues is that they should try to think about molecular structures from the perspectives of molecular interactions in the solid state and crystal packing. I would expect the left hand 'Reduce crystal packing' structure in Figure 6 to be able to easily adopt a conformation in which the aromatic rings and amide are all mutually coplanar (this is a scenario in which a non-aromatic replacement for an aromatic ring might be expected to have a relatively large impact). In HBD3 I suggest that deleterious effects of aromatic rings on aqueous solubility might be due to molecular interactions of the aromatic rings rather than their planarity. I also suggest in HBD3 that elimination of non-essential hydrogen bond donors be considered as a tactic for improving aqueous solubility because it tends to increase the imbalance between hydrogen bond donors and acceptors while minimizing the resulting increase in lipophilicity.

Rational [this use of "rational" is tautological] reasons for poor solubility were succinctly described by Bergstrom, who coined "Brick Dust and Greaseballs" as two limiting phenomena in drug discovery [22] which are in line with the empirical findings that led to General Solubility Equation [23] (Figure 5). [I don’t consider the General Solubility Equation to have any relevance to H2L work because it has not been shown to be usefully predictive of aqueous solubility for compounds of interest to medicinal chemists and the inclusion of Figure 5, which merely shows how predicted solubility values map on to an arbitrary categorisation scheme, appears to be gratuitous.] Succinctly, three factors influence solubility: lipophilicity, solid state interactions and ionisation. [It is solvation energy as opposed to lipophilicity that influences solubility and wet octanol is a poor model for the gas phase.] Determining which are the strongest drivers of low solubility will guide the optimisation (Figure 6). Using the analysis in Figure 5 the Solubility Forecast Index emerged, using the principle that an aromatic ring is detrimental to solubility, roughly equivalent to an extra log unit of lipophilicity for each aromatic ring (Thus SFI = clogD7.4 + #Ar) [24]. [I consider the use of the term “principle” in this context to be inaccurate given that the basis for SFI is subjective interpretation of a graphic generated from proprietary aqueous solubility data and I direct readers to the criticism of SFI in KM2023.] Minimising aromatic ring count is an important and statistically significant metric to consider [25] [The importance of minimizing aromatic ring count is debatable and it is meaningless to describe metrics as “statistically significant”.] 
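A minimal sketch of the two formulas referred to in the quoted passage. The GSE expression below is the commonly cited form (log S in mol/L, melting point in °C, with the melting-point term set to zero for liquids) and is not reproduced from Q2025. The example compounds are hypothetical. The point of the SFI part is that the formula hard-codes an exchange rate of exactly one aromatic ring per log unit of clogD7.4.

```python
def general_solubility_equation(log_p: float, melting_point_c: float) -> float:
    """GSE estimate of log10 aqueous solubility (mol/L) of the neutral form.

    The melting-point term plays the 'brick dust' role (zero for compounds that
    are liquid at 25 °C) and the log P term plays the 'grease ball' role.
    """
    return 0.5 - 0.01 * max(0.0, melting_point_c - 25.0) - log_p


def solubility_forecast_index(clogd_74: float, n_aromatic_rings: int) -> float:
    """SFI as quoted: clogD at pH 7.4 plus the number of aromatic rings."""
    return clogd_74 + n_aromatic_rings


# Two hypothetical compounds that SFI cannot distinguish:
# one aromatic ring is treated as exactly equivalent to one log unit of clogD7.4.
print(solubility_forecast_index(4.0, 1), solubility_forecast_index(2.0, 3))
print(general_solubility_equation(2.0, 125.0))
```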
- consistent with the "escape from flatland" concept [26] that focusses on increasing the sp³ (versus sp²) ratio in molecules, [The focus in the “escape from flatland” study is actually on the fraction of carbon atoms that are sp3 (Fsp3) and not on “the sp³ (versus sp²) ratio”.] even though no significant trends are apparent in detailed analyses of sp³ fractions [27]. [The “analyses of sp³ fractions” in [27] consist of comparisons of drug - target medians for the periods 1939-1989, 1990-2009 and 2010-2020 and all appear to be statistically significant (although I don't consider these analyses to have any relevance to H2L work). I consider the citation of [27] in this context to be gratuitous and this blog post might be of interest.]
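Fsp3 and an sp3/sp2 ratio are different quantities, and the distinction drawn in the comment above is easy to see with a hand count. In this sketch the counts for ibuprofen (C13H18O2) were worked out manually for illustration and are not taken from the cited studies.

```python
def fsp3(n_sp3_carbons: int, n_carbons: int) -> float:
    """Fraction of carbon atoms that are sp3-hybridized (Fsp3)."""
    return n_sp3_carbons / n_carbons


def sp3_to_sp2_ratio(n_sp3_carbons: int, n_sp2_carbons: int) -> float:
    """Ratio of sp3 to sp2 carbons: not the same quantity as Fsp3."""
    return n_sp3_carbons / n_sp2_carbons


# Ibuprofen: 6 sp3 carbons (isobutyl CH2, CH and two CH3; alpha CH and its CH3)
# and 7 sp2 carbons (six aromatic carbons plus the carboxyl carbon).
print(round(fsp3(6, 13), 3), round(sp3_to_sp2_ratio(6, 7), 3))
```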

An important factor in hit selection is to prioritise compounds with higher ligand efficiency. Ligand efficiency, defined as activity [LE is actually defined in terms of Gibbs free energy of binding and not activity.] per heavy atom (LE=1.37 * pKi/Heavy Atom Count, Figure 7a), is commonly considered in discovery programmes as a quality metric [33]. [LE (Equation 3 in the H2L best practices post graphic) is actually defined as the Gibbs free energy of binding, ΔG° (Equation 2 in H2L best practices post graphic), divided by the number of non-hydrogen atoms, N_nH (this is identical to heavy atom count although I consider the term to be less confusing), but the quantity is physically (and thermodynamically) meaningless because perception of efficiency varies with the arbitrary concentration, C°, that defines the standard state (see Table 1 in NoLE). Using a standard concentration enables us to calculate changes in free energy that result from changes in composition and, while the convention of using C° = 1 M when reporting ΔG° values is certainly useful, it would be no less (or more) correct to report ΔG° values for C° = 1 µM. Put another way, the widely held belief that 1 M is a 'privileged' standard concentration is thermodynamic nonsense (Equation 2 in the H2L best practices post graphic shows you how to interconvert ΔG° values between different standard concentrations). Given the serious deficiencies of LE as a drug design metric, I suggest modelling the response of affinity to molecular size and using the residuals to quantify the extent that individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the 'Alternatives to ligand efficiency for normalization of affinity' section of NoLE). There are two errors in the expression that the Authors have used for LE (the molar energy units are missing and the expression is written in terms of K_i rather than K_D). 
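The dependence of LE on C° can be checked numerically. This minimal sketch uses two hypothetical ligands (not the three in Table 1 of NoLE), takes ΔG° = RT ln(K_D/C°) with T = 300 K and LE = −ΔG°/N_nH, and shows that which ligand looks more "efficient" flips when C° changes from 1 M to 1 µM.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def delta_g_standard(kd_molar: float, c_std_molar: float, temp_k: float = 300.0) -> float:
    """ΔG° = RT ln(K_D / C°) in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_molar / c_std_molar)


def ligand_efficiency(kd_molar: float, n_heavy: int, c_std_molar: float,
                      temp_k: float = 300.0) -> float:
    """LE = -ΔG° / N_nH in kcal/mol per non-hydrogen atom."""
    return -delta_g_standard(kd_molar, c_std_molar, temp_k) / n_heavy


# Hypothetical ligands: (K_D in mol/L, number of non-hydrogen atoms)
ligands = {"fragment": (1e-3, 10), "larger ligand": (1e-8, 30)}
for c_std in (1.0, 1e-6):
    les = {name: ligand_efficiency(kd, n, c_std) for name, (kd, n) in ligands.items()}
    best = max(les, key=les.get)
    summary = ", ".join(f"{k}: {v:.3f}" for k, v in les.items())
    print(f"C° = {c_std:g} M -> {summary}; higher LE: {best}")
```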
The factor of 1.37 in the expression for LE comes from the conversion of affinity (or potency) to ΔG° at a temperature of 300 K, as recommended in [35], although biochemical assays are typically run at human body temperature (310 K). My view is that it is pointless to include the factor of 1.37 given that this entails dropping the molar energy units and using a temperature other than that at which the assay was run. Dropping the factor of 1.37 would also bring LE into line with LLE (LipE).] Various analyses suggest that, on average, this value barely change over the course of an optimisation process [20b | 27 | 34a | 34b] - so it is important to consider maintenance of any figure during any early SAR studies. [I disagree with this recommendation. These analyses are completely meaningless because the variation of LE over the course of an optimization itself varies with the concentration unit in which affinity (or potency) is expressed (Table 1 of NoLE illustrates this for three ligands that differ in molecular size and potency). In [34a] the start and finish values of LE were averaged over the different optimizations without showing variance and it is therefore not accurate to state that the study supports the assertion that LE values "barely change over the course of an optimisation process".] Lipophilic Ligand Efficiency (activity minus lipophilicity typically pKi -log P, Figure 7b), which is widely recognised as the key principle in successful drug optimisation, comes into play both for hit prioritization and optimisation. [LLE is a simple mathematical expression and I don’t consider it accurate to describe it as a “principle” let alone “the key principle in successful drug optimisation”. LLE can be thought of as quantifying the energetic cost of transferring a ligand from octanol to its target binding site although this interpretation is only valid when the ligand is predominantly neutral at physiological pH and binds in its neutral form. 
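The factor of 1.37 is RT ln 10 in kcal/mol at 300 K, so it depends on the temperature chosen. A minimal sketch of that arithmetic, with an LE expressed directly in log units (pK per non-hydrogen atom) for comparison; the function names are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def kcal_per_log_unit(temp_k: float) -> float:
    """RT ln(10): free energy in kcal/mol corresponding to one log unit of K."""
    return R_KCAL * temp_k * math.log(10.0)


def le_log_units(pk: float, n_heavy: int) -> float:
    """LE without the conversion factor: pK per non-hydrogen atom (log units, like LLE)."""
    return pk / n_heavy


for t in (298.15, 300.0, 310.0):
    print(f"{t} K: {kcal_per_log_unit(t):.3f} kcal/mol per log unit")
print(le_log_units(8.0, 32))
```

At 310 K the factor is about 1.42 rather than 1.37, so the conventional factor does not correspond to the temperature at which many assays are run.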
LLE is just one of a number of ways to normalize potency with respect to lipophilicity and I don't think that anybody has actually demonstrated that (pIC₅₀ – log P) is any better (or worse) as a drug design principle than pIC₅₀ – 0.9 × log P. When drug discovery scientists report that they have used LLE it often means that they have plotted their project data in a similar manner to Figure 3 as opposed to staring at a table of LLE values for their compounds. As an alternative to LLE (LipE) for normalization of affinity (or potency) with respect to lipophilicity I suggest modelling the response and using the residuals to quantify the extent that individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the 'Alternatives to ligand efficiency for normalization of affinity' section of NoLE).] Improving this value reflects producing potent compounds without adding excessive lipophilicity. Taken together, it has been shown that for any given target, the drugs mostly lie towards the leading "nose" [?] where LE and LLE are both towards higher values [20b | 35]. [This is perhaps not as penetrating an insight as the Authors consider it to be, given that drugs are usually more potent than the leads and hits from which they have been derived.] However, setting aspirational targets for either metric is unwise, as analysis of outcomes indicates that the values are target dependant [20b]. [I consider target dependency to be a complete red herring in this context and a more important issue is that you can’t compensate for inadequate potency by reducing molecular size or lipophilicity.] Focusing on increasing LLE to the maximum range possible and prioritizing series with higher average values is the recommended strategy [27 | 36]. 
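Two points from the comment above can be illustrated with a few lines of code: the ranking of compounds can flip between pIC50 − log P and pIC50 − 0.9 × log P, and residuals from a fitted trend give an alternative normalization. This is a minimal sketch, assuming an ordinary least-squares straight line (the approach described in NoLE may differ in detail); all compounds and data are hypothetical.

```python
def lipophilicity_normalized_potency(pic50: float, log_p: float, slope: float = 1.0) -> float:
    """pIC50 - slope * log P; slope = 1.0 gives conventional LLE (LipE)."""
    return pic50 - slope * log_p


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for ys = a + b * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed minus fitted; positive values beat the trend in the data."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical compounds X and Y as (pIC50, log P): the ranking depends on the slope.
x_cpd, y_cpd = (6.0, 1.0), (8.9, 4.0)
for s in (1.0, 0.9):
    print(s, lipophilicity_normalized_potency(*x_cpd, s),
          lipophilicity_normalized_potency(*y_cpd, s))

# Hypothetical project data: residuals from the potency-lipophilicity trend.
log_p = [1.0, 2.0, 3.0, 4.0, 5.0]
pic50 = [5.0, 5.9, 7.2, 7.8, 8.6]
print([round(r, 2) for r in residuals(log_p, pic50)])
```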
[It is not clear what is meant by “increasing LLE to the maximum range possible” and I consider it very poor advice indeed to recommend “prioritizing series with higher average values” (my view is that you actually need to be comparing the compounds from different series that have a realistic chance of matching the desired lead profile). The Authors of Q2025 appear to be misrepresenting [36] given that the study does not actually recommend “prioritizing series with higher average values”. This blog post on [27] might be relevant.]
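Why a series average can mislead is easy to show with made-up numbers. In this minimal sketch the two series, their (pIC50, log P) values and the potency requirement of 7.5 are all hypothetical. Series A has the higher mean LLE, yet only series B contains a compound with a realistic chance of meeting the potency requirement.

```python
from statistics import mean

# Hypothetical series; each compound is (pIC50, log P).
series = {
    "A": [(6.0, 0.5), (6.3, 0.8), (6.5, 1.0)],
    "B": [(5.5, 2.5), (7.0, 3.0), (8.2, 3.2)],
}
MIN_PIC50 = 7.5  # hypothetical potency requirement taken from a lead profile


def lle(pic50: float, log_p: float) -> float:
    return pic50 - log_p


def summarize(all_series, min_pic50):
    """Mean LLE per series alongside the compounds that meet the potency requirement."""
    out = {}
    for name, cpds in all_series.items():
        out[name] = {
            "mean_lle": mean(lle(p, lp) for p, lp in cpds),
            "viable": [(p, lp) for p, lp in cpds if p >= min_pic50],
        }
    return out


for name, info in summarize(series, MIN_PIC50).items():
    print(name, round(info["mean_lle"], 2), info["viable"])
```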

One can summarize this section with a simple but critical best practice: potency and properties (physicochemical and ADMET) have to be optimized in parallel (Figure 8) [37] to get to quality leads and later drug candidates with higher chances of clinical success. Whilst seemingly trivial, this proposition is rendered challenging by an "addiction to potency" and a constant reminder of this critical concept remains useful for medicinal chemists [38]. [My view is that many medicinal chemists had already moved on from the addiction to potency when the molecular obesity article was published a decade and a half ago and I would question the article's relevance to contemporary H2L practice. The threshold values that define the GSK 4/400 rule actually come from an arbitrary scheme used to categorize the proprietary data analyzed in the G2008 study as opposed to being derived from objective analysis of the data. The study reproduces the promiscuity analysis from [13a] which I criticised earlier in this post for exaggerating the strength of the trend and using an excessively permissive threshold for ‘activity’.] With poor properties, even "good ligands" may not fully answer pharmacological questions [39a | 39b]. [These two articles focus on chemical probes and I don’t consider either article to have any relevance to H2L work. Chemical probes need to be highly selective (more so than drugs) and permeable although solubility requirements are likely to be less stringent when using chemical probes to study intracellular phenomena than in H2L work and you don't generally need to worry about achieving oral bioavailability.]

I agree that mapping SARs for structural series of interest is an important aspect of H2L work and activity cliffs (small modifications in structure resulting in large changes in activity) are of particular interest given the potential for beating trends and achieving greater selectivity. Instances of decreased lipophilicity resulting in increased potency (or at least minimal loss of potency) should also be of significant interest to H2L teams. When mapping SARs it is important that structural transformations should change a single pharmacophore feature at a time and one should always consider potential ‘collateral effects’, such as perturbed conformational preferences, that might confound the analysis. Some of the structural transformations shown in Figure 10 change more than one pharmacophore feature at a time which makes it impossible to determine which pharmacophore feature is required for activity.
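A simple way to surface the two kinds of SAR observation mentioned above is to tabulate single-transformation pairs and flag them. This is a minimal sketch: the thresholds (2 log units for a cliff, 0.3 log units for "minimal loss" of potency) and the pairs are hypothetical. The code cannot detect collateral effects such as perturbed conformational preferences, so flagged pairs still need chemical scrutiny.

```python
from dataclasses import dataclass
from typing import List

CLIFF_THRESHOLD = 2.0   # hypothetical: a >= 100-fold potency change counts as a cliff
MAX_POTENCY_LOSS = 0.3  # hypothetical: "minimal loss" means <= 0.3 log units


@dataclass
class MatchedPair:
    transformation: str  # the single structural change, e.g. "CH2 -> O"
    d_pic50: float       # pIC50(product) - pIC50(parent)
    d_logd: float        # logD(product) - logD(parent)


def flag_pair(pair: MatchedPair) -> List[str]:
    """Flag activity cliffs and pairs where potency held up as lipophilicity fell."""
    flags = []
    if abs(pair.d_pic50) >= CLIFF_THRESHOLD:
        flags.append("activity cliff")
    if pair.d_logd < 0 and pair.d_pic50 >= -MAX_POTENCY_LOSS:
        flags.append("potency kept while lipophilicity fell")
    return flags


pairs = [
    MatchedPair("H -> Cl", 2.3, 0.7),
    MatchedPair("CH2 -> O", -0.1, -1.1),
    MatchedPair("H -> Me", 0.2, 0.5),
]
for p in pairs:
    print(p.transformation, flag_pair(p))
```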

Figure 10. Conceptual example of iterative SAR [The meaning of the term “iterative SAR” is unclear] to determine the pharmacophore. As each change may affect binding interactions, conformation and ionization state; complementary structural modification [The meaning of "complementary structural modification" is unclear] will be needed to understand the change in potency and determine the pharmacophore
Is Nitrogen needed (e.g. HBA)? [In addition to eliminating the quinoline N hydrogen bond acceptor this structural transformation eliminates a potential pharmacophore feature (the amide carbonyl oxygen can function as a hydrogen bond acceptor) while creating a cationic centre which will incur a significant desolvation penalty.]
Is NH needed? [This structural transformation eliminates the amide NH but it also is unlikely to address the question of whether the NH is needed because the amide carbonyl has also been eliminated.]
Is carbonyl needed? [The elimination of the amide carbonyl oxygen (hydrogen bond acceptor) creates a cationic centre which will incur a desolvation penalty.]

As a last proposition, [49a | 49b] we suggest that the progress in computational physicochemical and ADMET property predictions represents an opportunity to accelerate the optimisation of molecules with a "predict-first" mindset [4 | 50]. [I certainly agree that models should be used if they are available. However, the citation of literature does appear to be gratuitous and it is unclear why the Authors believe that scientists working on H2L projects will benefit from knowing that a proprietary system for automated molecular design has been developed at GSK.] The first step is to generate sufficient data for a series to build confidence in [51] any models, which can then be exploited in the prioritization of compounds for synthesis that fit with aspirational profiles [My view is that it would be very unwise for H2L project teams to blindly use models without assessing how well the models predict project data although I consider the citation of [51] to be gratuitous. Typically, H2L project teams use measured data to move their projects forward and generating data purely for the purpose of model evaluation is likely to be a distraction. One piece of advice that I will offer to H2L project teams is that they attempt to characterise responses of ADME predictors, such as aqueous solubility and permeability, to lipophilicity (likely to involve measurements for less potent compounds).] This ensures higher physicochemical quality [I consider “ensures” to be an exaggeration and I would argue that “physicochemical quality” is not something that can even be defined meaningfully or objectively (let alone quantified).], asks more pertinent questions and might reduce the total number of molecules made to get to the lead (Figure 11).
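Assessing how well a model predicts project data need not be elaborate. A minimal sketch, assuming measured and predicted values are already paired by compound (the log S values below are hypothetical): it reports RMSE, Pearson r, and the RMSE of a null model that always predicts the project mean. A model that correlates well but is systematically offset, or that barely beats the null model, adds little.

```python
import math


def rmse(predicted, observed):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def assess(predicted, observed):
    """Compare a model against a null model that always predicts the mean measured value."""
    mean_obs = sum(observed) / len(observed)
    return {
        "rmse_model": rmse(predicted, observed),
        "rmse_null": rmse([mean_obs] * len(observed), observed),
        "r": pearson_r(predicted, observed),
    }


# Hypothetical log10 solubility (mol/L) for compounds from one series.
measured = [-4.1, -4.8, -5.6, -3.9, -5.0, -6.2]
predicted = [-3.5, -4.2, -4.9, -4.0, -4.1, -5.1]
print(assess(predicted, measured))
```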

The Authors offer advice on how to ensure that optimisation is progressing in a satisfactory manner and how to know when to stop working on the series.

A Lead is not the perfect drug, but it gives reason to believe that the chemical series might be able to deliver one. An essential part of H2L (and later lead optimisation) is to ensure that the optimisation is progressing so that further investment is justified. Some essential questions can help achieve this: Does your series show dynamic SAR [The Authors need to say exactly what they mean by “dynamic SAR” if this is indeed the essential question that they assert it to be.] and achievable desired potency? Is the preliminary ADMET data encouraging? [The Authors need to define “encouraging” if this is indeed an essential question.] Do you have evidence of in vivo effect (PK/PD) at appropriate exposures? [I would question the necessity of PK/PD studies before starting lead optimisation and there are potential ethical concerns about doing in vivo work using compounds that lack the potency required for meaningful PK/PD assessment.] Do the remaining challenges show dynamic SAR and confidence they can be optimized? [The term “remaining challenges” is vague and it is not clear how H2L scientists are supposed to assess “dynamic SAR” for remaining challenges that are not defined in terms of activity.] To answer this, it's critical to monitor the trajectory [As I pointed out previously in the post it is not generally feasible to objectively map optimization paths and I consider the use of “trajectory” to be inappropriate in this context, given that it usually applies to a well-defined path that is determined at launch (for example, a molecular dynamics trajectory).] of the optimisation: e.g. by monitoring relevant properties over time. [Typically, H2L teams assess how closely the best compounds match the lead target profile (LTP) as opposed to monitoring time dependencies of properties such as log D that have limited predictivity.] 
In the absence of progress, discontinuing further work on a scaffold or series may be justified, with reason to focus on other promising structures or recommend termination on a data-driven basis. [Generally, the decision to terminate projects and series will be made on the basis of failure to satisfy the LTP.]
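Checking compounds against an LTP, as described in the comments above, is essentially a bookkeeping exercise. This minimal sketch uses a hypothetical three-criterion LTP (the property names and thresholds are illustrative, not recommendations) and treats a missing measurement as an unmet criterion.

```python
from typing import Callable, Dict, List

# Hypothetical lead target profile (LTP): property name -> acceptance test.
LTP: Dict[str, Callable[[float], bool]] = {
    "pIC50": lambda v: v >= 7.0,
    "logS": lambda v: v >= -4.5,              # log10 aqueous solubility, mol/L
    "papp_1e-6_cm_per_s": lambda v: v >= 5.0,  # passive permeability
}


def unmet_criteria(compound: Dict[str, float]) -> List[str]:
    """LTP criteria that a compound fails; missing measurements count as unmet."""
    return [prop for prop, ok in LTP.items() if prop not in compound or not ok(compound[prop])]


def closest_to_ltp(compounds: Dict[str, Dict[str, float]]) -> str:
    """Name of the compound that fails the fewest LTP criteria."""
    return min(compounds, key=lambda name: len(unmet_criteria(compounds[name])))


compounds = {
    "cpd-1": {"pIC50": 7.4, "logS": -5.2, "papp_1e-6_cm_per_s": 12.0},
    "cpd-2": {"pIC50": 6.8, "logS": -4.1},
}
for name, data in compounds.items():
    print(name, unmet_criteria(data))
print("closest to LTP:", closest_to_ltp(compounds))
```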

It's been a long post and I'll say a big thank you for staying with me until the end. I wrote this post primarily for early-career scientists as well as for drug discovery scientists in academia and students (although I hope the feedback will also be helpful for the EFMC). One piece of advice that I will offer to all scientists regardless of the stage of their careers is to not switch off your critical thinking skills just because a study is presented as defining best practices or has been highly-cited. In particular, I urge all scientists to be extremely wary of studies in which the conclusions don't follow from the data and I'll share a recent blog post that illustrates the problem. All that said, however, confused thinking amongst drug discovery scientists is not high on the list of the problems facing many of the world's inhabitants right now and my wish for 2026 is for a kinder, gentler, fairer and more peaceful world.

Sunday, 2 August 2020

Why fragments?

Paramin panorama

Crystallographic fragment screens have been run recently against the main protease (at Diamond) and the Nsp3 macrodomain (at UCSF and Diamond) of SARS-CoV-2 and I thought that it might be of interest to take a closer look at why we screen fragments. Fragment-based lead discovery (FBLD) actually has origins in both crystallography [V1992 | A1996] and computational chemistry [M1991 | B1992 | E1994]. Measurement of affinity is important in fragment-to-lead work because it allows fragment-based structure-activity relationships to be established prior to structural elaboration. Affinity measurement is typically challenging when fragment binding has been detected using crystallography, although affinity can be estimated by observation of the response of occupancy to concentration (the ∆G° value of −3.1 kcal/mol reported for binding of pyrazole to protein kinase B was derived in this manner).
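Estimating affinity from the response of occupancy to concentration can be sketched with a single-site binding model, θ = c/(c + K_D). This is a minimal sketch, assuming that free ligand concentration equals the soaking concentration, that T = 298.15 K and C° = 1 M, and using noise-free occupancies generated from a hypothetical K_D of 5 mM. The linearized fit is a simplification; with noisy refined occupancies a nonlinear fit would usually be preferable.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def occupancy(conc_molar: float, kd_molar: float) -> float:
    """Single-site binding: fractional occupancy theta = c / (c + K_D)."""
    return conc_molar / (conc_molar + kd_molar)


def fit_kd(concs, occs):
    """Least squares through the origin for (1/theta - 1) = K_D * (1/c)."""
    xs = [1.0 / c for c in concs]
    ys = [1.0 / t - 1.0 for t in occs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def dg_standard(kd_molar: float, temp_k: float = 298.15, c_std_molar: float = 1.0) -> float:
    """ΔG° = RT ln(K_D / C°) in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_molar / c_std_molar)


concs = [1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2]       # hypothetical soak concentrations, M
occs = [occupancy(c, 5e-3) for c in concs]         # hypothetical refined occupancies
kd = fit_kd(concs, occs)
print(f"K_D = {kd * 1e3:.2f} mM, ΔG° = {dg_standard(kd):.2f} kcal/mol")
```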

Although fragment-based approaches to lead discovery are widely used, it is less clear why fragment-based lead discovery works as well as it appears to. While it has been stated that “fragment hits form high-quality interactions with the target”, the concept of interaction quality is not sufficiently well-defined to be useful in design. I ran a poll which asked about the strongest rationale for screening fragments. The 65 votes were distributed as follows: ‘high ligand efficiency’ (23.1%), ‘enthalpy-driven binding’ (16.9%), ‘low molecular complexity’ (26.2%) and ‘God loves fragments’ (33.8%). I did not vote.

The belief that fragments are especially ligand-efficient has many adherents in the drug discovery field and it has been asserted that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”. The fundamental problem with ligand efficiency (LE), as conventionally calculated, is that perception of efficiency varies with the arbitrary concentration unit in which affinity is expressed (have you ever wondered why Kd, Ki or IC50 has to be expressed in mole/litre for calculation of LE?). This would appear to be a rather undesirable characteristic for a design metric and LE evangelists might consider trying to explain why it’s not a problem rather than dismissing it as a “limitation” of the metric or trying to shift the burden of proof onto the skeptics to show that the evangelists’ choice of concentration unit for calculation of LE is not useful.

The problems associated with the arbitrary nature of the concentration unit used to express affinity were first identified in 2009 and further discussed in 2014 and 2019. Specifically, it was noted that LE has a nontrivial dependency on the concentration, C°, used to define the standard state. If you want to do solution thermodynamics with concentrations defined then you do need to specify a standard concentration. However, it is important to remember that the choice of standard concentration is necessarily arbitrary if the thermodynamic analysis is to be valid. If your conclusions change when you use a different definition of the standard state then you’ll no longer be doing thermodynamics and, as Pauli might have observed, you’ll not even be wrong. You probably don't know it, but when you use the LE metric, you’re making the sweeping assumption that all values of Kd, Ki and IC50 tend to a value of 1 M in the limit of zero molecular size. Recalling the conventional criticism of homeopathy, is there really a difference between a solute that is infinitely small and a solute that is infinitely dilute?
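The zero-size limit follows from rearranging the definition of LE. A short derivation, treating Kd, Ki and IC50 alike as K_D for simplicity, with LE = −ΔG°/N_nH and ΔG° = RT ln(K_D/C°):

```latex
\mathrm{LE} = -\frac{\Delta G^\circ}{N_{\mathrm{nH}}}, \qquad
\Delta G^\circ = RT \ln\!\left(\frac{K_D}{C^\circ}\right)
\;\Rightarrow\;
K_D = C^\circ \exp\!\left(-\frac{N_{\mathrm{nH}}\,\mathrm{LE}}{RT}\right)
\;\xrightarrow{\;N_{\mathrm{nH}} \to 0\;}\; C^\circ
```

Every line of constant LE therefore passes through K_D = C° at zero molecular size, which with the conventional choice means 1 M.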

I think that’s enough flogging of inanimate equines for one blog post so let’s take a look at enthalpy-driven binding. My view of thermodynamic signature characterization in drug discovery is that it’s, in essence, a solution that’s desperately seeking a problem. In particular, there does not appear to be any physical basis for claims that the thermodynamic signature is a measure of interaction quality. In case you’re thinking that I’m an unrepentant Luddite, I will concede that thermodynamic signatures could prove useful for validating physics-based models of molecular recognition and, in specific cases, they may point to differences in binding mode within congeneric series. I should also stress that the modern isothermal calorimeter is an engineering marvel and I'd always want this option for label-free affinity measurement in any project.

It is common to see statements in the thermodynamic signature literature to the effect that binding is ‘enthalpy-driven’ or ‘entropy-driven’ although it was noted in 2009 (coincidentally, in the same article that highlighted the nontrivial dependence of LE on C°) that these terms are not particularly meaningful. The problems start when you make comparisons between the numerical values of ∆H (which is independent of C°) and T∆S° (which depends on C°). If I’d presented such a comparison in physics class at high school (I was taught by the Holy Ghost Fathers in Port of Spain), I would have been caned with a ferocity reserved for those who’d dozed off in catechism class. I’ll point you toward an article which asserts that, “when compared with many traditional druglike compounds, fragments bind more enthalpically to their protein targets”. I have a number of issues with this article although this is not the place for a comprehensive review (although I’ll probably pick it up in ‘The Nature of Lipophilic Efficiency’ when that gets written).
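Why comparing ∆H with T∆S° is problematic can be shown numerically. A minimal sketch for a hypothetical ITC result (K_D = 1 µM, ∆H = −5.0 kcal/mol, T = 298.15 K): ∆H stays fixed, but ∆G° = RT ln(K_D/C°) and hence T∆S° = ∆H − ∆G° shift with C°, so even the sign of T∆S° depends on the choice of standard concentration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def binding_signature(kd_molar: float, dh_kcal: float, c_std_molar: float,
                      temp_k: float = 298.15):
    """Return (ΔG°, TΔS°) for binding in kcal/mol.

    ΔG° = RT ln(K_D / C°) depends on C°; ΔH does not, so TΔS° = ΔH - ΔG°
    inherits all of the C° dependence.
    """
    dg = R_KCAL * temp_k * math.log(kd_molar / c_std_molar)
    return dg, dh_kcal - dg


# Hypothetical ITC result: K_D = 1 uM, ΔH = -5.0 kcal/mol
for c_std in (1.0, 1e-3, 1e-6):
    dg, tds = binding_signature(1e-6, -5.0, c_std)
    print(f"C° = {c_std:g} M: ΔH = -5.00, ΔG° = {dg:.2f}, TΔS° = {tds:.2f} kcal/mol")
```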

While I don’t believe that the authors have actually demonstrated that fragments bind more enthalpically than ligands of greater molecular size, I wouldn’t be surprised to discover that gains in affinity over the course of a fragment-to-lead (F2L) campaign had come more from entropy than enthalpy. First, the lost translational entropy (the component of ∆S° that endows it with its dependence on C°) is shared over a greater number of intermolecular contacts for structurally-elaborated compounds and this article is relevant to the discussion. Second, I’d expect the entropy of any water molecule to increase when it is moved to bulk solvent from contact with the molecular surface of the ligand or target (regardless of the polarity of the molecular surface at the point of contact). Nevertheless, this is something that you can test easily by examining the response of (∆H + T∆S°) to ∆G° (best not to aggregate data for different targets and/or temperatures when analyzing isothermal titration calorimetry data in this manner). But even if F2L affinity gains were shown generally to come more from entropy than enthalpy, would that be a strong rationale for screening fragments?
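The test suggested above works because ∆G° = ∆H − T∆S°: writing X = ∆H + T∆S° gives ∆H = (∆G° + X)/2, so if s is the slope of X against ∆G°, a fraction (1 + s)/2 of any change in ∆G° is carried by ∆H and (1 − s)/2 by −T∆S°. Changing C° adds a constant to ∆G° and subtracts the same constant from X, so s does not depend on C°. A minimal sketch with a hypothetical congeneric series (one target, one temperature):

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def enthalpic_share(dg, dh):
    """Slope s of (ΔH + TΔS°) against ΔG°, and the enthalpic share (1 + s)/2."""
    tds = [h - g for g, h in zip(dg, dh)]          # TΔS° = ΔH - ΔG°
    x_vals = [h + t for h, t in zip(dh, tds)]      # X = ΔH + TΔS°
    s = ols_slope(dg, x_vals)
    return s, (1.0 + s) / 2.0


# Hypothetical ITC data for one target at one temperature (kcal/mol).
dg = [-6.0, -7.0, -8.0, -9.0]
dh = [-5.8, -6.6, -7.4, -8.2]
print(enthalpic_share(dg, dh))
```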

This gets us onto molecular complexity and this article by Mike Hann and GSK colleagues should be considered essential reading for anybody thinking about selecting compounds for screening. The Hann model is a conceptual framework for molecular complexity but it doesn’t provide much practical guidance as to how to measure complexity (this is not a criticism since the thought process should be more about frameworks and less about metrics). I don’t believe that it will prove possible to quantify molecular complexity in an objective manner that is useful for designing compound libraries (I will be delighted to be proven wrong on this point). The approach to handling molecular complexity that I’ve used in screening library design is to restrict the extent of substitution (and other substructural features that can be considered to be associated with molecular complexity) and this is closer to ‘needle screening’ as described by Roche scientists in 2000 than to the Hann model.

Had I voted in the poll, ‘low molecular complexity’ would have got my vote. Here’s what I said in NoLE (it’s got an entire section on fragment-based design and a practical suggestion for redefining ligand efficiency so that perception does not change with C°):

"I would argue that the rationale for screening fragments against targets of interest is actually based on two conjectures. First, chemical space can be covered most effectively by fragments because compounds of low molecular complexity [18, 21, 22] allow TIP [target interaction potential] to be explored [70,71,72,73,74] more efficiently and accurately. Second, a fragment that has been observed to bind to a target may be a better starting point for design than a higher affinity ligand whose greater molecular complexity prevents it from presenting molecular recognition elements to the target in an optimal manner."

To be fair, those who advocate the use of LE and thermodynamic signatures in fragment-based design do not deny the importance of molecular complexity. Let’s assume for the sake of argument that interaction quality can actually be defined and is quantified by the LE value and/or the thermodynamic signature for binding of compound to target. While these are massive assumptions, LE values and thermodynamic signatures are still effects rather than causes.

The last option in the poll was ‘God loves fragments’ and more respondents (33.8%) voted for this than any of the first three options. I would interpret a vote for ‘God loves fragments’ in three ways. First, the respondent doesn’t consider any one of the first three options to be a stronger rationale for screening fragments than the other two. Second, the respondent doesn’t consider any of the first three options to be a valid rationale for screening fragments. Third, the respondent considers fragment-based approaches to have been over-sold.

This is a good place to wrap up. While I remain an enthusiast for fragment-based approaches to lead discovery, I do also believe that they have been somewhat oversold. The sensitivity of LE evangelists to criticism of their metric may stem from the use of LE to sell fragment-based methods to venture capitalists and, internally, to skeptical management. A shared (and serious) deficiency in the conventional ways in which LE and thermodynamic signature are quantified is that perception changes when the arbitrary concentration, C°, that defines the standard state is changed. While there are ways in which this deficiency can be addressed for analysis, it is important that the deficiency be acknowledged if we are to move forward. Drug design is difficult and if we, as drug designers, embrace shaky science and flawed data analysis then those who fund our activities may conclude that the difficulties that we face are of our own making.

Friday, 30 November 2018

Ligand efficiency and fragment-to-lead optimizations

The third annual survey (F2L2017) of fragment-to-lead (F2L) optimizations was published last week. Given that it was the second survey (F2L2016) in this series that prompted me to write 'The Nature of Ligand Efficiency' (NoLE), I thought that some comments would be in order. F2L2017 presents analysis of data that had been aggregated from all three surveys and I'll be focusing on the aspects of this analysis that relate to ligand efficiency (LE).

As noted in NoLE, perception of efficiency changes when affinity is expressed in different concentration units and I have argued that this is an undesirable feature for a quantity that is widely touted as useful for design. At the very least, it does place a burden of proof on those who advocate the use of LE in design to either show that the change in perception of efficiency with concentration unit is not a problem or to justify their choice of the 1 M concentration unit. One difficulty that LE advocates face is that the nontrivial dependency of LE on the concentration unit only came to light a few years after LE was introduced as "a useful metric for lead selection" and, even now, some LE advocates appear to be in a state of denial. Put more bluntly, you weren't even aware that you were choosing the 1 M concentration unit when you started telling medicinal chemists that they should be using LE to do their jobs but you still want us to believe that you made the correct choice?

I'm assuming that the authors of F2L2017 would all claim familiarity with the fundamentals of physical chemistry and biophysics while some of the authors may even consider themselves to be experts in these areas. I'll put the following question to each of the authors of F2L2017: what would your reaction be to analysis showing that the space group for a crystal structure changed if the unit cell parameters were expressed using different units? I can also put things a bit more coarsely by noting that to examine the effect on perception of changing a unit is, when applicable, a most efficacious bullshit detector.

The analysis in F2L2017 that I'll focus on is the comparison between fragment hits and leads. As I showed in NoLE, it is meaningless to compare LE values because LE has a nontrivial dependency on the concentration unit used to express affinity. LE advocates can of course declare themselves to be Experts (or even Thought Leaders) and invoke morality in support of their choice of the 1 M concentration unit. However, this is a risky tactic because physical science can't accommodate 'privileged' units and an insistence that quantities have to be expressed in specific units might be taken as evidence that one is not actually an Expert (at least not in physical science).

So let's take a look at what F2L2017 has to say about LE in the context of F2L optimizations.

"The distributions for fragment and lead LE have also remained reasonably constant. On average there is no significant change in LE between fragment and lead (ΔLE = 0.004, p ≈ 0.8). Figure 5A shows the distribution of ΔLE, which is approximately centered around zero, although interestingly there are more examples where LE increases from fragment to lead (40) than where a decrease is seen (25). Some caution is warranted when interpreting these data, as our minimum criterion for 100-fold potency improvement may have introduced some selection bias. Nevertheless, there is no clear evidence in this data set that LE changes systematically during fragment optimization. Although the average change in LE from fragment to lead is small, Figure 5B shows that the correlation between fragment and lead LE is modest (R2 = 0.22), with a mean absolute difference between fragment and lead LE of 0.08."

This might be a good point at which to remind the authors of F2L2017 about some of the more extravagant claims that have been made for LE. It has been asserted that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”. It has also been claimed that "ligand efficiency validated fragment-based design". However, the more important point is that it is completely meaningless to compare values of LE of hits and leads because you will come to different conclusions if you express affinity using a different concentration unit (see Table 2 in NoLE). It is also worth noting that expressing affinity in units of 1 M introduces selection bias just as does the requirement for 100-fold potency improvement.

Had I been reviewing F2L2017, I'd have suggested that the authors might think a bit more carefully about exactly why they are analyzing differences between LE values for fragments and leads. A perspective on fragment library design (reviewed in this post) correctly stated that a general objective of optimization projects is “ensuring that any additional molecular weight and lipophilicity also produces an acceptable increase in affinity". If you're thinking along these lines then scaling the F2L potency increase by the corresponding increase in molecular size makes a lot more sense than comparing LE for the fragments and leads. This quantifies how efficiently (in terms of increased molecular size) the potency gains for the F2L project have been achieved. This is not a new idea and I'll direct readers toward a 2006 study in which it was noted that a tenfold increase in affinity corresponded to a mean increase in molecular weight of 64 Da (standard deviation = 18 Da) for 73 compound pairs from FBLD projects. This is how group efficiency (GE) works and I draw the attention of the two F2L2017 authors from Astex to a perceptive statement made by their colleagues that GE is “a more sensitive metric to define the quality of an added group than a comparison of the LE of the parent and newly formed compounds”.
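A size-scaled measure of the kind described above can be sketched as the molecular weight added per log unit (tenfold) of affinity gained, which can then be set against the 64 Da mean reported in the 2006 study. The fragment and lead values below are hypothetical and the function name is illustrative; a smaller number means that potency was bought with less added mass.

```python
import math

def da_per_log_unit(kd_hit, mw_hit, kd_lead, mw_lead):
    """Molecular weight added (Da) per tenfold increase in affinity.

    kd_hit / kd_lead is dimensionless, so the result does not depend on the
    concentration unit, provided both Kd values are expressed in the same unit.
    """
    log_gain = math.log10(kd_hit / kd_lead)
    if log_gain <= 0:
        raise ValueError("lead must bind more tightly than the hit")
    return (mw_lead - mw_hit) / log_gain

REPORTED_MEAN_DA = 64.0  # mean for 73 FBLD compound pairs in the 2006 study cited above

# Hypothetical optimization: 100 µM, 180 Da fragment -> 10 nM, 420 Da lead (Kd in molar)
cost = da_per_log_unit(1e-4, 180.0, 1e-8, 420.0)
print(f"{cost:.0f} Da per tenfold gain (reported mean {REPORTED_MEAN_DA:.0f} Da)")

# The same pair with Kd in µM gives the same answer
cost_um = da_per_log_unit(100.0, 180.0, 0.01, 420.0)
```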

The distinction between a difference in LE and a difference in affinity that has been scaled by a difference in molecular size becomes a whole lot clearer if you examine the relevant equations. Equation (1) defines the F2L LE difference, and the first thing you'll notice is that it is algebraically more complex than equation (2). This is relevant because LE advocates often tout the simplicity of the LE metric. However, the more significant difference between the two is that the concentration that defines the standard state is present in equation (1) but absent from equation (2). This means that you get the same answer when you scale the affinity difference by the corresponding molecular size difference, regardless of the units in which you express affinity.
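As a sketch of what the two expressions look like, assume the usual definitions ΔG° = RT ln(Kd/C°) and LE = −ΔG°/N, with subscripts F and L for fragment and lead and N the number of non-hydrogen atoms (the notation of the original equations may differ). Splitting the logarithms in the LE difference leaves a term in ln C° that vanishes only when N_F = N_L, whereas in the size-scaled affinity difference C° cancels exactly.

```latex
% LE difference between lead and fragment
\Delta\mathrm{LE} = \mathrm{LE}_L - \mathrm{LE}_F
  = -RT\left[\frac{\ln(K_L/C^\circ)}{N_L} - \frac{\ln(K_F/C^\circ)}{N_F}\right]
  = -RT\left[\frac{\ln K_L}{N_L} - \frac{\ln K_F}{N_F}\right]
    + RT\ln C^\circ\left(\frac{1}{N_L} - \frac{1}{N_F}\right)

% Affinity difference scaled by the molecular size difference
-\frac{\Delta G^\circ_L - \Delta G^\circ_F}{N_L - N_F}
  = -\frac{RT\left[\ln(K_L/C^\circ) - \ln(K_F/C^\circ)\right]}{N_L - N_F}
  = -\frac{RT\ln(K_L/K_F)}{N_L - N_F}
```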

So let's see how things look if you're prepared to think beyond LE when assessing F2L optimizations. Here's a figure from NoLE in which I've plotted the change in affinity against the change in number of non-hydrogen atoms for the F2L optimizations surveyed in F2L2016. The molecular size efficiency for each optimization can be calculated by dividing the change in affinity by the change in number of non-hydrogen atoms. I've drawn lines corresponding to the minimum and maximum values of molecular size efficiency and have also shown the quartiles.
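The per-optimization efficiency and the summary lines described above can be sketched in Python as follows. Affinity changes are expressed as ΔpK (log units), and the list of optimizations is invented for illustration; it is not the F2L2016 data set. The quartile cut points come from `statistics.quantiles` (Python 3.8+). Multiplying a ΔpK-per-atom value by ln(10)·RT (about 1.36 kcal/mol at 298 K) converts it to kcal/mol per atom.

```python
import math
import statistics

def size_efficiency(kd_start, n_start, kd_end, n_end):
    """Affinity gain (log units) per added non-hydrogen atom; Kd values may share any unit."""
    return math.log10(kd_start / kd_end) / (n_end - n_start)

# Hypothetical F2L optimizations: (Kd fragment, N fragment, Kd lead, N lead), Kd in molar
OPTIMIZATIONS = [
    (1e-3, 10, 1e-7, 25),
    (5e-4, 13, 2e-8, 30),
    (2e-4, 11, 1e-6, 20),
    (1e-4, 14, 5e-9, 33),
    (8e-4, 9, 3e-7, 22),
]

effs = [size_efficiency(*opt) for opt in OPTIMIZATIONS]
q1, median, q3 = statistics.quantiles(effs, n=4)  # three cut points for quartiles
print(f"min {min(effs):.3f}, Q1 {q1:.3f}, median {median:.3f}, Q3 {q3:.3f}, max {max(effs):.3f}")

# Re-expressing every Kd in µM leaves each efficiency unchanged
effs_um = [size_efficiency(a * 1e6, b, c * 1e6, d) for a, b, c, d in OPTIMIZATIONS]
assert all(math.isclose(x, y) for x, y in zip(effs, effs_um))
```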

So now it's time to wrap things up. A physical quantity that is expressed in a different unit is still the same physical quantity and I presume that all the authors of F2L2017 would have been aware of this while they were still undergraduates. LE was described as thermodynamically indefensible in comments on Derek's post on NoLE and choosing to defend an indefensible position usually ends in tears (just as it did for the French at Dien Bien Phu in 1954). The dilemma facing those who seek to lead opinion in FBDD is that to embrace the view that the 1 M concentration unit is somehow privileged requires that they abandon fundamental physicochemical principles that they would have learned as undergraduates.