Leads have line of sight to a development candidate and bring an understanding of what priorities Lead Optimisation should address.
The screening phase is followed by the hit-to-lead phase and it can be helpful to draw an analogy between drug discovery and what is called football outside the USA. It’s not generally possible to design a drug from screening output alone and to attempt to do so would be the equivalent of taking a shot at goal from the centre spot. Just as the midfielders try to move the ball closer to the opposition goal, the hit-to-lead team use the screening hits as starting points for the design of higher-affinity compounds. The main objective in the hit-to-lead phase is to generate information that can be used for design, and mapping structure-activity relationships for the more interesting hits is a common activity in hit-to-lead work.
The wide acceptance of Ro5 provided other researchers with an incentive to publish analyses of their own data and those who have followed the drug discovery literature over the last decade or so will have become aware of a publication genre that can be described as ‘retrospective data analysis of large proprietary data sets’ or, more succinctly, as ‘Ro5 envy’.
Drug designers should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant to the specific drug design projects on which they are working.
Figure 1. Illustration of the multi-objective characterisation necessary in the journey from a hit to a drug. All these necessary characteristics, described by illustrative principal components, are influenced by the physicochemical properties of the molecules.
Potency optimisation alone is not a viable strategy towards the discovery of efficacious and safe drugs, or even high-quality leads. Concurrent optimisation of the physicochemical properties of a molecule is the most important facet of drug discovery, as these properties influence its behaviours, disposition and efficacy [12a | 12b]. [While I certainly agree that there is a lot more to drug design than maximisation of potency I would argue that controlling exposure is a more important objective than optimization of physicochemical properties. I don't consider either reference to be evidence that "concurrent optimisation of the physicochemical properties of a molecule is the most important facet of drug discovery" and it is not accurate to describe metabolic stability, active efflux and affinity for anti-targets as "physicochemical properties". I think the Authors need to say more about which physicochemical properties they recommend to be optimized and be clearer about exactly what constitutes optimization. Lipophilicity alone is not usefully predictive of properties such as bioavailability, distribution and clearance that determine the in vivo behaviours of drugs.] Together these outcomes define the quality of the molecule, indicative of its chances of success in the clinic, as evidenced in numerous studies [13a | 13b]. [Neither of these articles appears to provide convincing evidence of a causal relationship between “the quality of a molecule” and probability of success in the clinic. Much of the 'analysis' in [13a] consists of plots of median values without any indication of the spreads in the corresponding distributions. As explained in KM2013, presenting data in this manner exaggerates trends and I consider it unwise to base decisions on data that have been presented in this manner.
Quite aside from the issue of hidden variation I do not consider the relationship between promiscuity and median cLogP reported (Figure 3a) in [13a] to be indicative of probability of success in the clinic, given that the criterion for 'activity' ( > 30% inhibition at 10 µM) is far too permissive to be physiologically relevant (this is a common issue in the promiscuity literature).]
While the optimal lipophilicity range has been suggested as a log D7.4 between 1 and 3, [15] this is highly dependent on the chemical series. [The focus of the analysis was permeability and the range was actually defined in terms of AZlogD (calculated using proprietary in-house software) as opposed to log D measured at pH 7.4. The correlation between the logarithm of the A to B permeability and AZlogD is actually very weak (r2 = 0.16), which would imply a high degree of uncertainty in the threshold values used to specify the optimal lipophilicity range. While I remain sceptical about the feasibility of meaningfully defining optimal property ranges, the assertion that the proposed range in AZlogD of 1 to 3 “is highly dependent on the chemical series” is pure speculation and is not based on data.] Best practice would be to generate data for a diverse set of compounds in a series, if measuring it for all analogues is not possible, and determine the lipophilicity range that leads to the most balanced properties and potency [3 | 16]. [It is not clear what the Authors mean by “most balanced properties and potency” nor is it clear how one is actually supposed to use lipophilicity measurements to objectively “determine the lipophilicity range that leads to the most balanced properties and potency”. My view is that to demonstrate "balanced properties and potency" would require measurements of properties such as aqueous solubility and permeability that are more predictive than lipophilicity of exposure in vivo. I do not consider either [3] or [16] to support the assertions being made by the Authors.] Lipophilicity and pKa prediction models can then guide further designs and synthesis of analogues along the optimisation pathway (Figure 3 [17]), but measurements are advised in contemporary practice, particularly by chromatographic methods such as Chrom log D7.4 [18].
[In general, it is very difficult to convincingly demonstrate that one measure of lipophilicity is superior to another. Chromatographic measurement of log D is faster than the shake flask method used traditionally but it is unclear to which solvent system the measurement corresponds. Furthermore, the high surface-area-to-volume ratio of the stationary phase means that an ionized species can interact to a significant extent with the non-polar stationary phase while keeping the ionized group in contact with the polar mobile phase, and one should anticipate that the contribution of ionization to log D values might be lower in magnitude than for a shake flask measurement.]
As noted earlier in the post, I consider plotting potency against lipophilicity (as is done in Figure 3, which also serves as the graphical abstract) with reference lines corresponding to different LLE (LipE) values (see R2009, which really should have been cited) to be a good way for H2L project teams to visualize potency measurements for their project compounds. That said, I consider the view of the discovery process implied by Figure 3 to be neither accurate nor of any practical value for scientists working on H2L projects. It is relatively easy to define optimization of potency, and measurements in an in vitro assay are typically relevant to target engagement in vivo (uncertainty in the concentration of the drug in the target compartment, and of the species with which it competes, is likely to be the bigger issue when trying to understand why in vitro potency fails to translate to beneficial effects in vivo).
However, there is quite a bit more to optimization of properties such as permeability, aqueous solubility, metabolic stability and pharmacological promiscuity that are believed to be predictive of ADME and toxicity, and I consider a view that determining "the lipophilicity range that leads to the most balanced properties and potency" constitutes optimization to be hopelessly naive. The main challenge in H2L work (and in lead optimization) is to identify compounds for which potency and properties related to ADME and toxicity are all acceptable.
Figure 3. There are numerous routes to climb a mountain, as there are to discover a drug, but a measured approach to lipophilicity will guide an optimal path, [The Authors need to articulate what they mean by “a measured approach to lipophilicity” (which does come across as arm-waving) and provide evidence to support their claim that it “will guide an optimal path”.] where the outcome is usually driven by a balance of activity and lipophilicity [This appears to be a statement of belief and the Authors do need to provide evidence to support their claim. The Authors also need to say more about how the “balance of activity and lipophilicity” can be objectively assessed.] (The parallel lines represent LLE, i.e. pIC50 - log P). [This way of visualizing data was introduced in the R2009 study which, in my view, should have been cited.]
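Since LLE features so prominently in Figure 3 and in the discussion that follows, a minimal sketch may help readers see exactly what is being calculated (the compound names and values below are hypothetical, purely for illustration):

```python
# LLE (LipE) is simply potency minus lipophilicity: LLE = pIC50 - logP.
# Compound names and values below are hypothetical.
def lle(pic50, logp):
    """Lipophilic ligand efficiency."""
    return pic50 - logp

compounds = {
    "hit-1":    {"pIC50": 6.0, "logP": 3.5},
    "analog-2": {"pIC50": 7.2, "logP": 4.8},
    "analog-3": {"pIC50": 7.0, "logP": 2.9},
}

for name, c in compounds.items():
    print(name, round(lle(c["pIC50"], c["logP"]), 1))

# In a plot of pIC50 against logP, the parallel reference lines of
# Figure 3 are simply pIC50 = logP + LLE for a series of LLE values,
# so compounds on the same diagonal share an LLE value.
```

Note that the hypothetical analog-3 scores higher on LLE than the more potent analog-2, which is exactly the kind of observation (potency retained without adding lipophilicity) that this style of plot is intended to surface.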
Thus the Distribution Coefficient (log D at a given pH) is a highly influential physical property governing ADMET profiles [20a | 20b | 20c] such as on- and off-target potency, solubility, permeability, metabolism and plasma protein binding (Figure 4) [14b]. [I recommend that the term ‘ADMET’ not be used in drug discovery because ADME (Absorption, Distribution, Metabolism, and Excretion) and T (Toxicity) are completely different issues that need to be addressed differently in design. I would argue that the ADME profile of a drug is actually defined by its in vivo characteristics such as fraction absorbed (which may vary with dose and formulation), volume of distribution and clearance (the Authors appear to be confusing ADME with in vitro predictors of ADME) and I would also argue that toxicity is an in vivo phenomenon. In order to support the claim that log D “is a highly influential physical property governing ADMET profiles” it would be necessary to show that log D is usefully predictive of what happens to drugs in vivo. My view is that the cited literature does not support the claim that log D “is a highly influential physical property governing ADMET profiles” given that [20a] does not even mention log D and neither [20b] nor [20c] provides any evidence that log D is usefully predictive of in vivo behaviour of drugs.]
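For reference (and this is not an endorsement of log D as a predictor of in vivo behaviour), the distribution coefficient for a monoprotic compound can be sketched from log P and pKa under the common assumption that only the neutral species partitions into octanol. The functions and values below are purely illustrative:

```python
import math

def logd_base(logp, pka, ph=7.4):
    """log D at a given pH for a monoprotic base, assuming that only
    the neutral species partitions into the octanol phase."""
    return logp - math.log10(1 + 10 ** (pka - ph))

def logd_acid(logp, pka, ph=7.4):
    """log D at a given pH for a monoprotic acid, under the same assumption."""
    return logp - math.log10(1 + 10 ** (ph - pka))

# Hypothetical base (logP 3.0, pKa 9.0): mostly ionized at pH 7.4,
# so log D is well below log P.
print(round(logd_base(3.0, 9.0), 2))  # -> 1.39
```

The gap between log P and log D grows as the compound becomes more extensively ionized at the pH of interest, which is one reason log P and log D should not be used interchangeably.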
Figure 4. The impact of increasing lipophilicity on various developability outcomes [14b] [It is unclear as to whether lipophilicity is defined for this graphic in terms of log P or log D. It would be necessary to show more than just the ‘sense’ of trends for the term “impact” to be appropriate in this context. I do not consider the use of the term “developability outcomes” to be accurate.]
Rational [this use of "rational" is tautological] reasons for poor solubility were succinctly described by Bergström, who coined "Brick Dust and Greaseballs" as two limiting phenomena in drug discovery [22] which are in line with the empirical findings that led to the General Solubility Equation [23] (Figure 5). [I don’t consider the General Solubility Equation to have any relevance to H2L work because it has not been shown to be usefully predictive of aqueous solubility for compounds of interest to medicinal chemists and the inclusion of Figure 5, which merely shows how predicted solubility values map on to an arbitrary categorisation scheme, appears to be gratuitous.] Succinctly, three factors influence solubility: lipophilicity, solid state interactions and ionisation. [It is solvation energy as opposed to lipophilicity that influences solubility and wet octanol is a poor model for the gas phase.] Determining which are the strongest drivers of low solubility will guide the optimisation (Figure 6). Using the analysis in Figure 5 the Solubility Forecast Index emerged, using the principle that an aromatic ring is detrimental to solubility, roughly equivalent to an extra log unit of lipophilicity for each aromatic ring (Thus SFI = clog D7.4 + #Ar) [24]. [I consider the use of the term “principle” in this context to be inaccurate given that the basis for SFI is subjective interpretation of a graphic generated from proprietary aqueous solubility data and I direct readers to the criticism of SFI in KM2023.] Minimising aromatic ring count is an important and statistically significant metric to consider [25] [The importance of minimizing aromatic ring count is debatable and it is meaningless to describe metrics as “statistically significant”.]
- consistent with the "escape from flatland" concept [26] that focusses on increasing the sp³ (versus sp²) ratio in molecules, [The focus in the “escape from flatland” study is actually on the fraction of carbon atoms that are sp3 (Fsp3) and not on “the sp³ (versus sp²) ratio”.] even though no significant trends are apparent in detailed analyses of sp³ fractions [27]. [The “analyses of sp³ fractions” in [27] consist of comparisons of drug–target medians for the periods 1939-1989, 1990-2009 and 2010-2020 and all appear to be statistically significant (although I don't consider these analyses to have any relevance to H2L work).]
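For readers who have not encountered them, the two equations quoted above are easily stated (stating them is not an endorsement, and the compound values below are hypothetical):

```python
def gse_log_s(logp, mp_celsius):
    """General Solubility Equation (Yalkowsky):
    logS = 0.5 - 0.01*(MP - 25) - logP, with MP in deg C and
    S the intrinsic aqueous solubility in mol/L."""
    return 0.5 - 0.01 * (mp_celsius - 25.0) - logp

def sfi(clogd74, n_aromatic_rings):
    """Solubility Forecast Index: SFI = cLogD7.4 + #Ar."""
    return clogd74 + n_aromatic_rings

# Hypothetical compound: logP 3.0, melting point 150 C,
# cLogD7.4 of 2.0 and three aromatic rings.
print(gse_log_s(3.0, 150.0))  # -> -3.75
print(sfi(2.0, 3))            # -> 5.0
```

The GSE requires a measured (or predicted) melting point, which is one reason it is of limited use for compounds that have not yet been synthesized, while SFI simply adds the aromatic ring count to calculated logD7.4 as a penalty term.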
An important factor in hit selection is to prioritise compounds with higher ligand efficiency. Ligand efficiency, defined as activity [LE is actually defined in terms of Gibbs free energy of binding and not activity.] per heavy atom (LE = 1.37 × pKi/Heavy Atom Count, Figure 7a), is commonly considered in discovery programmes as a quality metric [33]. [LE (Equation 3) is actually defined as the Gibbs free energy of binding, ΔG° (Equation 2), divided by the number of non-hydrogen atoms, NnH (this is identical to heavy atom count although I consider the term to be less confusing), but the quantity is physically (and thermodynamically) meaningless because perception of efficiency varies with the arbitrary concentration, C°, that defines the standard state (see Table 1 in NoLE). Using a standard concentration enables us to calculate changes in free energy that result from changes in composition and, while the convention of using C° = 1 M when reporting ΔG° values is certainly useful, it would be no less (or more) correct to report ΔG° values for C° = 1 µM. Put another way, the widely held belief that 1 M is a 'privileged' standard concentration is thermodynamic nonsense (Equation 2 shows you how to interconvert ΔG° values between different standard concentrations). Given the serious deficiencies of LE as a drug design metric, I suggest modelling the response and using the residuals to quantify the extent to which individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the 'Alternatives to ligand efficiency for normalization of affinity' section of NoLE). There are two errors in the expression that the Authors have used for LE (the molar energy units are missing and the expression is written in terms of Ki rather than KD).
The factor of 1.37 in the expression for LE comes from the conversion of affinity (or potency) to ΔG° at a temperature of 300 K, as recommended in [35], although biochemical assays are typically run at human body temperature (310 K). My view is that it is pointless to include the factor of 1.37 given that this entails dropping the molar energy units and using a temperature other than that at which the assay was run. Dropping the factor of 1.37 would also bring LE into line with LLE (LipE).] Various analyses suggest that, on average, this value barely changes over the course of an optimisation process [20b | 27 | 34a | 34b] - so it is important to consider maintenance of any figure during any early SAR studies. [I disagree with this recommendation. These analyses are completely meaningless because the variation of LE over the course of an optimization itself varies with the concentration unit in which affinity (or potency) is expressed (Table 1 of NoLE illustrates this for three ligands that differ in molecular size and potency). In [34a] the start and finish values of LE were averaged over the different optimizations without showing variance and it is therefore not accurate to state that the study supports the assertion that LE values "barely change over the course of an optimisation process".] Lipophilic Ligand Efficiency (activity minus lipophilicity, typically pKi - log P, Figure 7b), which is widely recognised as the key principle in successful drug optimisation, comes into play both for hit prioritization and optimisation. [LLE is a simple mathematical expression and I don’t consider it accurate to describe it as a “principle” let alone “the key principle in successful drug optimisation”. LLE can be thought of as quantifying the energetic cost of transferring a ligand from octanol to its target binding site although this interpretation is only valid when the ligand is predominantly neutral at physiological pH and binds in its neutral form.
LLE is just one of a number of ways to normalize potency with respect to lipophilicity and I don't think that anybody has actually demonstrated that (pIC50 – log P) is any better (or worse) as a drug design principle than pIC50 – 0.9 × log P. When drug discovery scientists report that they have used LLE it often means that they have plotted their project data in a similar manner to Figure 3 as opposed to staring at a table of LLE values for their compounds. As an alternative to LLE (LipE) for normalization of affinity (or potency) with respect to lipophilicity I suggest modelling the response and using the residuals to quantify the extent to which individual potency measurements beat (or are beaten by) the trend in the data (the approach is outlined in the 'Alternatives to ligand efficiency for normalization of affinity' section of NoLE).] Improving this value reflects producing potent compounds without adding excessive lipophilicity. Taken together, it has been shown that for any given target, the drugs mostly lie towards the leading "nose" [?] where LE and LLE are both towards higher values [20b | 35]. [This is perhaps not the penetrating insight that the Authors consider it to be, given that drugs are usually more potent than the leads and hits from which they have been derived.] However, setting aspirational targets for either metric is unwise, as analysis of outcomes indicates that the values are target dependent [20b]. [I consider target dependency to be a complete red herring in this context and a more important issue is that you can’t compensate for inadequate potency by reducing molecular size or lipophilicity.] Focusing on increasing LLE to the maximum range possible and prioritizing series with higher average values is the recommended strategy [27 | 36].
[It is not clear what is meant by “increasing LLE to the maximum range possible” and I consider it very poor advice indeed to recommend “prioritizing series with higher average values” (my view is that you actually need to be comparing the compounds from different series that have a realistic chance of matching the desired lead profile). The Authors of Q2025 appear to be misrepresenting [36] given that the study does not actually recommend “prioritizing series with higher average values”. This blog post on [27] might be relevant.]
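To make the standard-state point, and the residual-based alternative to LE and LLE that I suggest, a little more concrete, here is a small numerical sketch (the ligands, potencies and logP values are all hypothetical):

```python
import math
import statistics

RT = 0.0019872 * 298.15  # kcal/mol at 25 C; use the assay temperature in practice

def le(kd_molar, n_heavy, c_std=1.0):
    """'Ligand efficiency' computed as -dG0/NnH where dG0 = RT ln(Kd/C0).
    Note that the ranking of compounds depends on the arbitrary
    standard concentration c_std (in mol/L)."""
    dg0 = RT * math.log(kd_molar / c_std)
    return -dg0 / n_heavy

# Hypothetical ligands: A (Kd = 1 uM, 20 heavy atoms) and
# B (Kd = 1 nM, 40 heavy atoms). With C0 = 1 M ligand A looks more
# 'efficient'; with C0 = 1 uM the ranking is reversed.
assert le(1e-6, 20, c_std=1.0) > le(1e-9, 40, c_std=1.0)
assert le(1e-6, 20, c_std=1e-6) < le(1e-9, 40, c_std=1e-6)

# Residual-based normalization: fit pIC50 against logP for the project
# data and judge compounds by their residuals, rather than imposing the
# unit slope that LLE assumes.
logp = [1.0, 2.0, 3.0, 4.0]    # hypothetical project data
pic50 = [5.2, 6.1, 6.5, 7.6]
mx, my = statistics.mean(logp), statistics.mean(pic50)
slope = (sum((x - mx) * (y - my) for x, y in zip(logp, pic50))
         / sum((x - mx) ** 2 for x in logp))
intercept = my - slope * mx
residuals = [y - (intercept + slope * x) for x, y in zip(logp, pic50)]
# A positive residual indicates a compound that beats the
# potency-lipophilicity trend for this data set.
```

The point of the fit is that the slope is estimated from the project data rather than assumed to be unity (as LLE effectively does) or any other fixed value.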
One can summarize this section with a simple but critical best practice: potency and properties (physicochemical and ADMET) have to be optimized in parallel (Figure 8) [37] to get to quality leads and later drug candidates with higher chances of clinical success. Whilst seemingly trivial, this proposition is rendered challenging by an "addiction to potency" and a constant reminder of this critical concept remains useful for medicinal chemists [38]. [My view is that many medicinal chemists had moved on from the addiction to potency when the molecular obesity article was published a decade and a half ago and I would question the article's relevance to contemporary H2L practice. The threshold values that define the GSK 4/400 rule actually come from an arbitrary scheme used to categorize the proprietary data analyzed in the G2008 study as opposed to being derived from objective analysis of the data. The study reproduces the promiscuity analysis from [13a] which I criticised earlier in this post for exaggerating the strength of the trend and using an excessively permissive threshold for ‘activity’.] With poor properties, even "good ligands" may not fully answer pharmacological questions [39a | 39b]. [These two articles focus on chemical probes and I don’t consider either article to have any relevance to H2L work. Chemical probes need to be highly selective (more so than drugs) and permeable although solubility requirements are likely to be less stringent when using chemical probes to study intracellular phenomena than in H2L work and one generally does not need to worry about achieving oral bioavailability.]
I agree that mapping SARs for structural series of interest is an important aspect of H2L work and activity cliffs (small modifications in structure resulting in large changes in activity) are of particular interest given the potential for beating trends and achieving greater selectivity. Instances of decreased lipophilicity resulting in increased potency (or at least minimal loss of potency) should also be of significant interest to H2L teams. When mapping SARs it is important that structural transformations change a single pharmacophore feature at a time and one should always consider potential ‘collateral effects’, such as perturbed conformational preferences, that might confound the analysis. Some of the structural transformations shown in Figure 10 change more than one pharmacophore feature at a time, which makes it impossible to determine which pharmacophore feature is required for activity.
Figure 10. Conceptual example of iterative SAR [the meaning of the term “iterative SAR” is unclear] to determine the pharmacophore. As each change may affect binding interactions, conformation and ionization state, complementary structural modifications will be needed to understand the change in potency and determine the pharmacophore.
Is Nitrogen needed (e.g. HBA)? [In addition to eliminating the quinoline N hydrogen bond acceptor this structural transformation eliminates a potential pharmacophore feature (the amide carbonyl oxygen can function as a hydrogen bond acceptor) while creating a cationic centre which will incur a significant desolvation penalty.]
Is NH needed? [This structural transformation eliminates the amide NH but it is also unlikely to address the question of whether the NH is needed because the amide carbonyl has also been eliminated.]
Is carbonyl needed? [The elimination of the amide carbonyl oxygen (hydrogen bond acceptor) creates a cationic centre which will incur a desolvation penalty.]
As a last proposition, we suggest that the progress in computational physicochemical and ADMET property predictions [49a | 49b] represents an opportunity to accelerate the optimisation of molecules with a "predict-first" mindset [4 | 50]. [I certainly agree that models should be used if they are available. However, the citation of literature does appear to be gratuitous and it is unclear why the Authors believe that scientists working on H2L projects will benefit from knowing that a proprietary system for automated molecular design has been developed at GSK.] The first step is to generate sufficient data for a series to build confidence in any models [51], which can then be exploited in the prioritization of compounds for synthesis that fit with aspirational profiles. [My view is that it would be very unwise for H2L project teams to blindly use models without assessing how well the models predict project data and the citation of [51] appears to be gratuitous. Typically, H2L project teams use measured data to move their projects forward and generating data purely for the purpose of model evaluation is likely to be a distraction. One piece of advice that I will offer to H2L project teams is that they attempt to characterise responses of ADME predictors, such as aqueous solubility and permeability, to lipophilicity (likely to involve measurements for less potent compounds).] This ensures higher physicochemical quality [I consider “ensures” to be an exaggeration and I would argue that “physicochemical quality” is not something that can even be defined meaningfully or objectively (let alone quantified).], asks more pertinent questions and might reduce the total number of molecules made to get to the lead (Figure 11).
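A minimal sketch of the kind of model assessment I have in mind, comparing predictions with measured project data before trusting the model, might look like this (all values are hypothetical):

```python
import math

def evaluate_model(measured, predicted):
    """Compare model predictions with measured project data, returning
    root-mean-square error and mean signed bias."""
    errors = [p - m for m, p in zip(measured, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    bias = sum(errors) / len(errors)
    return rmse, bias

# Hypothetical measured versus predicted logD values for a handful of
# project compounds.
measured = [1.2, 2.5, 3.1, 0.8]
predicted = [1.5, 2.9, 3.0, 1.4]
rmse, bias = evaluate_model(measured, predicted)
# A systematic positive bias would suggest that the model overpredicts
# logD for this series, which matters more to the project team than
# headline accuracy figures quoted for some unrelated benchmark set.
```

Evaluating the model on compounds from the project's own series, rather than relying on global validation statistics, is the point here.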
It's been a long post and I'll say a big thank you for staying with me until the end. I wrote this post primarily for younger scientists and one piece of advice that I will offer to them is to not switch off their critical thinking skills just because a study has been presented as defining best practices or is highly-cited. The world right now is not a nice place for many of its inhabitants and my wish for 2026 is a kinder, gentler and fairer world.

