Tuesday, 30 July 2024

A Nobel for property-based drug design?

[This post was updated on 04-Aug-2024. I thank Tim Ritchie (see RM2009 | RM2014) for bringing YG2003 (Prediction of Aqueous Solubility of Organic Compounds by Topological Descriptors) to my attention.]

"The problems of ADME are precisely those that determine success or failure of a drug in vivo. In vitro data can give a clearer picture of the receptor characteristics, but knowledge and control of ADME are also vital. A common trap in binding studies is that binding generally increases with lipophilicity, so that one may obtain extremely potent binding that is totally unattainable in vivo."

SH Unger (1987) Computer-Aided Drug Design in the Year 2000. 
Drug Information Journal 21:267-275 DOI
******************************************

In this post I’ll be reviewing an Editorial (Property-Based Drug Design Merits a Nobel Prize) that was recently published in J Med Chem. For me, the Editorial raises questions about the critical thinking skills of its authors and about the judgement of the J Med Chem Editors (I’m guessing that some of the courteous and cultured members of the Nobel Prize committee might regard it to be somewhat pushy, and possibly even uncouth, for journals to be publishing nominations for Nobel Prizes as editorials). My advice to anybody nominating individuals for a Nobel Prize is to be aware of an observation, usually attributed to Jocelyn Bell Burnell, that it’s better that people ask why you didn’t win a Nobel Prize than why you did. Where applicable, I've used the same reference numbers that were used in the Editorial and I’ll start by reproducing the Nobel Prize proposal (as is usual in posts at Molecular Design, I’ve inserted some comments, italicized in red and enclosed in square brackets, into the quoted text):
We propose that a Nobel Prize in Physiology or Medicine should be awarded for property-based drug design, with Christopher A. Lipinski, Paul D. Leeson, and Frank Lovering as the proposed recipients for their development of “important principles for drug design” [I would describe what the proposed Nobel laureates have introduced as a rule, a metric and a molecular descriptor rather than principles.], principles that have contributed to the development of numerous approved drugs. [The authors do need to provide convincing evidence to support what appear to be some wildly extravagant claims. Specifically, the authors need to demonstrate that the rule, metric and molecular descriptor (which they describe as “principles”) were actually critical to the decision-making in projects that led to the development of numerous drugs.] While drug design previously focused primarily on optimizing potency, they introduced a more holistic approach based on the consideration of how fundamental molecular and physicochemical properties affect pharmaceutical, pharmacodynamic, pharmacokinetic, and safety properties. [My view is that none of the proposed Nobel laureates even demonstrated a single convincing link between molecular and physicochemical properties, and pharmaceutical, pharmacodynamic, pharmacokinetic, and safety properties.] The development of the Rof5 by Christopher A. Lipinski in 1997 introduced a new principle for how molecular and physicochemical properties affect oral bioavailability. The development of LipE by Paul D. Leeson in 2007 introduced a new principle for how physicochemical properties impact potency, selectivity, and safety. Finally, the development of Fsp3 by Frank Lovering in 2009 introduced a new principle for how molecular shape affects pharmaceutical properties and developability.

Before examining the contributions of the three nominated individuals it's worth saying something about the objectives of drug design. First, a drug needs to be highly active against its target(s). Second, activity against anti-targets should be very low (ideally too low to even be measured). Third, as I note in 34, the exposure (concentration at the site of action) of the drug needs to be controllable (one challenge in drug design is that intracellular drug concentration can’t generally be measured in vivo and I recommend that all drug discovery scientists read SR2019). I see controlling exposure as the primary focus of property-based design and one fundamental challenge is that structural modifications that lead to increased engagement potential for the therapeutic target(s) frequently result in reduced controllability of exposure as well as increased engagement potential for anti-targets. I’ve tried to capture these points in the graphic shown below.


It's generally accepted that excessive lipophilicity and molecular size are risk factors in drug design and the “compound quality” (CQ) literature abounds with fire-and-brimstone sermons on the evils of "molecular obesity" (see H2011). Nevertheless, the relationships between these descriptors and properties such as binding affinity for anti-targets, permeability, aqueous solubility and metabolic lability are generally not quite as strong as is commonly believed (or claimed). When using trends in data to inform design it’s really important to know how strong the trends are because this tells you how much weight to give to the trends when making decisions. It’s not unknown in CQ studies for trends in data to be made to appear to be stronger than they actually are which endows the CQ field with what I’ll politely call a “whiff of the pasture” (the term “correlation inflation” has been used; see KM2013). Transformation of continuous data (IC50 values) to categorical data (high | medium | low) prior to analysis should trigger a deafening cacophony of alarm bells as should any averaging of groups of continuous data values without showing the spread in the data values. Some examples of studies in which I consider the strengths of trends to have been exaggerated include 29, 35, HMO2016 and HY2010.

I think that one thing that everybody who actually works (or has worked) on drug discovery projects agrees on is that drug discovery is really difficult. My view is that, by focusing on Rof5, LipE and Fsp3, the Editorial actually trivializes the challenges faced by drug discovery scientists. Most drug design (as opposed to ligand design) takes place during lead optimization and lead optimization teams are typically addressing specific problems (for example, structural changes that result in increased potency also result in reduced aqueous solubility).  Lead optimization teams typically work with a lot of measured data (a significant component of drug design is efficient generation of data to enable decision-making) and a weak correlation between logP and aqueous solubility reported in the literature would be of no practical relevance when the lead optimization team is using aqueous solubility measurements for compounds in the structural series that they’re optimizing. It is common (see M2001 | G2008) for the simplicity of rules, guidelines and metrics to be touted and we noted in KM2013 that:   

Given that drug discovery would appear to be anything but simple, the simplicity of a drug-likeness model could actually be taken as evidence for its irrelevance to drug discovery.

Guidelines for successful drug discovery are often presented in terms of something good (or bad) being more likely to happen when the value of a calculated property such as Fsp3 exceeds a threshold. When using guidelines like these be aware that it’s actually very difficult to set these threshold values objectively and that the guidelines would have been stated in an identical manner had different threshold values been chosen to specify them. One difficulty with using guidelines like these is that the creators of the guidelines don’t usually say what they mean by “more likely” (millions of people book flights knowing that one is “more likely” to die in a plane crash if one takes a flight than if one doesn’t take a flight). A number of published guidelines (some of which have been referenced in the Editorial) claim that compounds that comply with the guidelines are more likely to be developable. However, giving weight to these claims would require that developability be defined in an objective manner that enables compounds with arbitrary molecular structures and differing biological activity to be meaningfully compared.   

I’ll examine the contributions of the three proposed laureates for the Nobel Prize in Physiology or Medicine following the order in the Editorial. Let's start with the first:
  
The development of the Rof5 by Christopher A. Lipinski in 1997 introduced a new principle for how molecular and physicochemical properties affect oral bioavailability. [As a reviewer of the manuscript I would have pressed the authors to explicitly state the new principle that their first nominee for the Nobel Prize for Physiology or Medicine had introduced in 1997.]

My view is that the publication of the Rof5 (22) has certainly proven to be highly influential in that it made many drug discovery scientists aware of the need to take account of physicochemical properties, in particular lipophilicity, in drug design. What is less well-known, but possibly more important in my view, is that publication of the Rof5 sent a clear message to Pharma/Biotech management that high-throughput screening wasn’t going to be the panacea that many believed that it would be. However, I don't see the Rof5 as quite the epiphany that the authors of the Editorial would have us believe it to be. The quote with which I started this post was taken from an article that had been published ten years before 22 and the inverse nature of the relationship between aqueous solubility and lipophilicity was being discussed in the scientific literature (see YV1980) more than forty years ago. The NC1996 study is also worthy of mention because it was published more than a year before 22 and it makes the important point that optimal logP values are likely to vary with chemotype ("each congeneric series for a drug backbone usually demonstrates its own optimal log P").       

Questions can be raised about the data analysis presented in support of the Rof5 and readers may find it helpful to take a look at the S2019 study as well as my comments on the Rof5 in HBD3 and in this post. I would argue that the Rof5 does not have any practical value as a drug design tool and I would challenge the assertion made in the Editorial that the publication of 22 demonstrated how “molecular and physicochemical properties affect oral bioavailability”. One aspect of the analysis presented (22) in support of the Rof5 that isn't always fully appreciated is that the compounds for which the descriptors are calculated were all treated as having equivalent oral bioavailability (compounds were selected for the analysis on the basis of having been taken into phase 2 clinical trials at some point before the Rof5 had been published in 1997). This is one reason that it’s not credible to assert that the analysis demonstrates that these molecular and physicochemical properties are linked to bioavailability (it must be stressed that, like many, I do actually believe that excessive lipophilicity and molecular size are risk factors in drug design). I make the following point in a blog post (I’ve modified the original text very slightly for consistency with the Editorial):

The Rof5 is stated in terms of likelihood of poor absorption or permeation although no measured oral absorption or permeability data are given in 22 and the Rof5 should therefore be regarded as a statement of belief. I realise that to make such an assertion runs the risk of an appointment with the auto-da-fé and I stress that had the Rof5 been stated in terms of physicochemical and molecular property distributions I would not have made the assertion.

To see what I was getting at let’s take a look at how the Rof5 was stated in 22 (“The ‘rule of 5’ states that: poor absorption or permeation are more likely when…”). However, the analysis presented in support of the Rof5 was of the distribution of compounds in chemical space defined by molecular weight, logP and numbers of hydrogen bond donors and acceptors with no account being taken of variation in either absorption or permeation for the compounds. Analysis like this can be informative but you need to demonstrate that the chemical space is actually relevant to the phenomena of interest. One way that you can demonstrate that a chemical space is relevant is to build predictive models for the phenomena of interest using only the dimensions of the chemical space as descriptors. Alternatively you might observe meaningful differences between the distributions in the chemical space for compounds that have respectively passed and failed at a particular stage in clinical development.

So that’s all that I’ll be saying about Rof5 and it’s time to take a look at the contributions of the second proposed Nobel Laureate:

The development of LipE by Paul D. Leeson in 2007 introduced a new principle for how physicochemical properties impact potency, selectivity, and safety. [As a reviewer of the manuscript I would have pressed the authors to explicitly state the new principle that their second nominee for the Nobel Prize for Physiology or Medicine had introduced in 2007.]

I'll start by saying that LipE is a simple mathematical formula and I suggest that one shouldn't be confusing simple mathematical formulae with principles when nominating people for Nobel Prizes. There are, however, other errors and these are not the kind of errors that you can afford to make when nominating people for Nobel Prizes. First, the term used in 29 is actually “ligand-lipophilicity efficiency” (LLE) although this appears to have mutated to “lipophilic ligand efficiency” (also LLE) by 2014 (see H2014). The term “LipE” was actually introduced by Pfizer scientists (see R2009) and it is significant that the more recent J2018 article defines LipE in terms of logD rather than logP (doing so means that you can make compounds more efficient simply by increasing extent of ionization and, as a drug design tactic, this is likely to end about as well as things did for the Sixth Army at Stalingrad).
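The point about logD can be illustrated with a minimal sketch. It assumes a monoprotic base for which only the neutral form partitions into octanol, so that logD = logP - log10(1 + 10^(pKa - pH)); the pIC50, logP and pKa values are hypothetical and the function names are introduced here purely for illustration.

```python
import math


def log_d_monoprotic_base(log_p, pka, ph=7.4):
    """logD for a monoprotic base, assuming only the neutral form partitions."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


def lipe(pic50, lipophilicity):
    """Lipophilic efficiency: potency offset by a lipophilicity measure."""
    return pic50 - lipophilicity


# Hypothetical series: identical potency and logP, increasing basicity.
PIC50 = 7.0
LOG_P = 3.0

print(f"LipE from logP: {lipe(PIC50, LOG_P):.2f}")
for pka in (5.4, 7.4, 9.4):
    log_d = log_d_monoprotic_base(LOG_P, pka)
    print(f"pKa {pka:.1f}: logD = {log_d:.2f}, LipE from logD = {lipe(PIC50, log_d):.2f}")
```

Under these assumptions the logD-based value rises with pKa even though neither potency nor logP has changed, which is the sense in which a compound can be made to look more efficient simply by increasing the extent of ionization.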

The second (and more serious from the perspective of a Nobel nomination) error is that the metric had already been discussed, although not named, in the literature when 29 was published (I’m guessing that a suggestion that naming a metric merits a Nobel Prize for Physiology or Medicine might cause some members of the Nobel Prize committee to choke on their surströmming).  The L2006 book chapter, published fifteen months before 29, states:

Thus, to achieve compounds with a not too high log P while still retaining potency, the difference between the log potency and the log D can be utilised.

From the A2007 perspective which was published three months before 29

Lipophilicity is thought to be a driving force for binding to anti-targets such as the hERG ion channel and cytochrome p450 enzymes and potency can be scaled by lipophilicity by subtracting measured or calculated 1-octanol water partition coefficients from pIC50.

It might be helpful to say something about efficiency metrics since LipE (or LLE if you prefer) is an example of an efficiency metric. The idea behind efficiency metrics is to “normalize” a compound’s activity (typically quantified by potency or affinity) by the value of a risk factor such as lipophilicity or molecular size (for the masochists among you there’s an entire section in 34 on normalization of binding affinity). Ligand efficiency (LE) was introduced in 2004 (see H2004) and is generally regarded as the original efficiency metric although its creators do acknowledge the influence of the K1999 study. I’ve argued at length in 34 (Table 1 and Figure 1 in the article capture the essence of the argument) that LE is physically meaningless because perception of efficiency changes if you use a different concentration to define the standard state (by convention ΔGbinding values correspond to an arbitrary 1 M standard concentration) and there is no way to objectively select any particular value of the standard concentration for calculation of LE. The problem doesn’t go away if you try to define ligand efficiency in terms of logarithmically expressed values of IC50, Ki or Kd instead of ΔGbinding because these quantities still have to be divided by an arbitrary concentration value in order to be expressed as logarithms (see M2011). My view is that LE shouldn't even be described as a metric and I sometimes appropriate a quote ("it's not even wrong") that is usually attributed to Pauli because those who advocate the use of LE in drug design are unable (or unwilling) to say what it measures.
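The dependence of LE on the standard concentration can be seen in a short sketch. The three compounds below are hypothetical (they are not taken from Table 1 of 34), and LE is expressed here in its dimensionless form as -log10(Kd/C°) per heavy atom rather than in energy units; the function name and the 10 mM alternative standard concentration are assumptions chosen for illustration.

```python
import math


def ligand_efficiency(kd_molar, n_heavy_atoms, standard_conc_molar=1.0):
    """-log10(Kd / C°) per heavy atom; C° is the (arbitrary) standard concentration."""
    return -math.log10(kd_molar / standard_conc_molar) / n_heavy_atoms


# Hypothetical compounds: (Kd in mol/L, number of heavy atoms).
compounds = {
    "A": (1e-3, 10),
    "B": (1e-6, 20),
    "C": (1e-9, 30),
}

for standard_conc in (1.0, 1e-2):
    values = {
        name: ligand_efficiency(kd, n_heavy, standard_conc)
        for name, (kd, n_heavy) in compounds.items()
    }
    summary = ", ".join(f"{name}: {value:.3f}" for name, value in values.items())
    print(f"C° = {standard_conc:g} M -> {summary}")
```

With the conventional 1 M standard concentration the three compounds appear equally efficient, whereas with a 10 mM standard concentration the largest compound appears the most efficient and the smallest the least. Nothing about the compounds has changed, only the arbitrary choice of C°.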

The meaninglessness of LE stems from it being defined by scaling ΔGbinding by the design risk factor (molecular size). In contrast, LipE is defined by offsetting pIC50 by the risk factor (logP) and can be interpreted (see 34) as the energetic cost of moving the ligand from octanol to its target binding site (this interpretation is only valid when the ligand binds in its neutral form and is predominantly neutral in the aqueous phase).  When considering lipophilicity in property-based design it is important to be aware that octanol is an arbitrary choice of solvent for measurement of partition coefficients and that the logP (or logD) calculated for a compound may differ significantly depending on the algorithm used for the calculations. That said, the hydrogen bond donors/acceptors and ionizable groups tend to be relatively conserved within structural series which means that the details of exactly how lipophilicity is quantified are likely to be less critical in lead optimization than for structurally-diverse sets of compounds.
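The contrast between offsetting and scaling can also be sketched numerically. In the example below the IC50 and logP values are hypothetical, and the `conc_unit_molar` parameter is an assumption introduced to stand in for the concentration unit used when converting IC50 to pIC50.

```python
import math


def lipe(ic50_molar, log_p, conc_unit_molar=1.0):
    """LipE = pIC50 - logP, with pIC50 = -log10(IC50 / concentration unit)."""
    pic50 = -math.log10(ic50_molar / conc_unit_molar)
    return pic50 - log_p


# Hypothetical pair of compounds: (IC50 in mol/L, logP).
compound_1 = (1e-7, 4.0)
compound_2 = (1e-6, 2.0)

for unit in (1.0, 1e-6):
    value_1 = lipe(*compound_1, conc_unit_molar=unit)
    value_2 = lipe(*compound_2, conc_unit_molar=unit)
    print(f"unit = {unit:g} M: LipE(1) = {value_1:.2f}, "
          f"LipE(2) = {value_2:.2f}, difference = {value_2 - value_1:.2f}")
```

Changing the concentration unit shifts every LipE value by the same constant, so differences between compounds (and therefore their ranking) are preserved. This is not the case when affinity is divided by a risk factor, as it is for LE.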

When we use LipE we’re actually assuming that logP (or logD) is predictive of properties such as aqueous solubility, affinity for anti-targets and metabolic lability. That is why it’s not accurate to state that the introduction of LipE showed how “physicochemical properties impact potency, selectivity, and safety”. In some published studies the focus is less on the LipE metric and more on what might be called the "lipophilic efficiency concept" (aim for top left corner of a plot of potency against lipophilicity). It is common to add reference lines of constant LipE to plots of potency against lipophilicity in this type of analysis and if you're doing this you really should be citing R2009 rather than 29.

I'll finish the commentary on LipE (or LLE if you prefer) with this statement made in the Editorial:

Emerging from an analysis of approved drugs, this rubric predicts a compound is more likely to be clinically developable when LipE > 5. [I don’t know what the authors of the Editorial mean by “rubric” (I'm not even sure that they do) but as a reviewer of the manuscript I would have pressed them to justify their claim. Specifically I would have been looking for a literature reference (for me, the choice of the word “emerging” does rather conjure up an image of hot gases and stoned priestesses at Delphi) and a coherent explanation for why a value of 5 yields a better rubric than values of 4 or 6.]

That’s all that I’ll be saying about LipE (or LLE if you prefer) and it’s time to take a look at the contributions of the third nominee for the Nobel Prize in Physiology or Medicine:

Finally, the development of Fsp3 by Frank Lovering in 2009 introduced a new principle for how molecular shape affects pharmaceutical properties and developability. [As a reviewer of the manuscript I would have pressed the authors to explicitly state the new principle that their third nominee for the Nobel Prize for Physiology or Medicine had introduced in 2009. My view is that Fsp3 is a thoroughly unconvincing descriptor of molecular shape and I suggest readers consider the suggestion that cyclohexane (Fsp3 = 1) would have a better shape match with benzene (Fsp3 = 0) than with either methane (Fsp3 = 1) or adamantane (Fsp3 = 1).]

[04-Aug-2024 update: The Fsp3 descriptor had actually been used as i_ali in the YG2003 study (Prediction of Aqueous Solubility of Organic Compounds by Topological Descriptors) six years before the publication of 35:

The aliphatic indicator of a molecule (i_ali) is equal to the number of sp3 carbons divided by the total number of carbon atoms in the molecule.

The YG2003 study discussed prediction of aqueous solubility using i_ali (renamed as Fsp3 in 35) in conjunction with other topological descriptors. In contrast with the claims made in 35 for Fsp3 the YG2003 study made no suggestion that i_ali was a highly effective predictor of aqueous solvation when used by itself.]   
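The definition quoted above (number of sp3 carbons divided by the total number of carbon atoms) is easy to apply by hand. The sketch below does so for the four hydrocarbons mentioned earlier; the carbon counts are entered manually rather than perceived from molecular structures, and the function name is introduced here for illustration.

```python
def fsp3(n_sp3_carbons, n_carbons):
    """Fraction of carbon atoms that are sp3 hybridized (i_ali in YG2003)."""
    if n_carbons == 0:
        raise ValueError("Fsp3 is undefined for a structure with no carbon atoms")
    return n_sp3_carbons / n_carbons


# (number of sp3 carbons, total number of carbons), counted by hand.
hydrocarbons = {
    "methane": (1, 1),
    "cyclohexane": (6, 6),
    "adamantane": (10, 10),
    "benzene": (0, 6),
}

for name, (n_sp3, n_c) in hydrocarbons.items():
    print(f"{name}: Fsp3 = {fsp3(n_sp3, n_c):.2f}")
```

Methane, cyclohexane and adamantane all share the same Fsp3 value despite having very different shapes, which is why a single value of this descriptor says little about molecular shape.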

Before discussing the contributions of the third nominee for the Nobel Prize for Physiology or Medicine I should stress that I certainly consider gratuitous use of aromatic rings to be a very bad thing in drug design (it was the data analysis in 35 that was criticized in KM2013 but not the eminently sensible suggestion that drug designers should look beyond what the authors referred to as ‘Flatland’). Having sp3 carbon atoms in a scaffold provides drug designers with a wider range of options for placement of substituents than would be the case for a fully aromatic scaffold and we stated in KM2013 that:   

One limitation of aromatic rings as components of drug molecules is that some regions above and below the plane defined by the atomic nuclear positions are not directly accessible to substituents. Molecular recognition considerations suggest a focus on achieving axial substitution in saturated rings with minimal steric footprint, for example by exploiting the anomeric effect or by substituting N-acylated cyclic amines at C2. 

My view is that deleterious effects of aromatic rings on aqueous solubility would be more plausibly explained by molecular interactions stabilizing the solid state than in terms of molecular shape (this point is discussed in more detail in HBD3). I also see saturated ring systems such as bicyclo[1.1.1]pentane and cubane as potentially more resistant to metabolism than benzene. 

There’s one point that I need to make before discussing 35 from the data analysis perspective which is that molecular structures with basic nitrogen atoms tend to have higher Fsp3 values than molecular structures that lack basic nitrogen atoms (see L2013). This means that you can’t tell whether the benefits of higher Fsp3 values are actually caused by the higher Fsp3 values or by the presence of basic nitrogen atoms.

The Editorial states:

Stemming from an analysis of discovery compounds, investigational drugs, and approved drugs, Fsp3 predicts a discovery compound is more likely to become a drug when Fsp3 > 0.40. 

It’s not clear (at least to me) where the figure of 0.40 comes from and I would argue that compound X (IC50 against therapeutic target = 50 μM; Fsp3 = 0.80) would actually be less likely to become a drug than compound Y (IC50 against therapeutic target = 10 nM; Fsp3 = 0.20). I’m assuming that what the Editorial refers to as “analysis of discovery compounds, investigational drugs, and approved drugs” is what is shown by Figure 3 in 35. Presenting data in this manner hides the variation in Fsp3 for the compounds at each stage of development and makes the trends look much stronger than they actually are (this is verboten according to current J Med Chem author guidelines). I would challenge the suggestion that what is shown in Figure 3 in 35 can be used to calculate the probability that an arbitrary compound will become a drug (my view is that it’s not feasible to even define the probability that a compound will become a drug in a meaningful manner). Analyses of success in clinical development are generally more convincing when comparisons are made between compounds that pass or fail in individual phases of clinical development than between compounds in different phases of clinical development.

The Editorial continues:        

This observation was ascribed to increased Fsp3 leading to increased aqueous solubility, a critical physiochemical property for successful drug discovery.

I’m assuming that what the Editorial refers to as “increased Fsp3 leading to increased aqueous solubility” is the trend shown by Figure 5 of 35 (this featured prominently in the KM2013 correlation inflation article) which claims to show the relationship between Fsp3 and log S (aqueous solubility expressed as a logarithm). This claim is not accurate because the log S values have been binned and the relationship is actually between centre point of bin and mean log S value for bin. The authors of 35 used public domain aqueous solubility data for their analysis and we showed (KM2013; see Figure 5) that the Pearson correlation coefficient for the relationship between log S and Fsp3 is only 0.25 (the corresponding value for the binned data is 0.97). I consider the suggestion that such a weak correlation could have any relevance whatsoever to the likelihood of success in clinical trials to be wild and uninformed conjecture.
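The effect of binning on apparent correlation can be reproduced with synthetic data. The sketch below does not use the solubility data analysed in 35 or KM2013 and makes no attempt to reproduce the values of 0.25 and 0.97; the noise level, the number of points, the number of bins and the random seed are all arbitrary assumptions.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
N_POINTS = 2000
N_BINS = 5

# Synthetic descriptor in [0, 1) and a response with a weak linear dependence on it.
xs = [rng.random() for _ in range(N_POINTS)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]
raw_r = pearson(xs, ys)

# Group the responses into equal-width descriptor bins.
bins = [[] for _ in range(N_BINS)]
for x, y in zip(xs, ys):
    index = min(int(x * N_BINS), N_BINS - 1)
    bins[index].append(y)

# Correlate bin centre with mean response, discarding the spread within bins.
centres = [(i + 0.5) / N_BINS for i in range(N_BINS)]
bin_means = [statistics.fmean(values) for values in bins]
bin_sds = [statistics.stdev(values) for values in bins]
binned_r = pearson(centres, bin_means)

print(f"r for individual points: {raw_r:.2f} (r squared = {raw_r ** 2:.2f})")
print(f"r for bin means:         {binned_r:.2f}")
print("standard deviation within each bin:", [round(sd, 2) for sd in bin_sds])
```

Averaging within bins removes almost all of the scatter, so the bin means line up far more tightly than the individual points do, while the within-bin standard deviations show how much variation the averaged plot conceals. For scale, a correlation coefficient of 0.25 corresponds to about 6% of the variance in the response being explained by the descriptor.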

I'll finish my commentary on Fsp3 by reproducing this claim made in the Editorial:

Much like the Rof5 and LipE, Fsp3 has proven to be enduringly useful for the design of compounds with improved chances of clinical success. (37) [My view is there is insufficient evidence to justify this claim and I'm perplexed by the citation of 37. In any case, members of the Nobel committee are likely to focus more on whether or not Fsp3 is usefully predictive than on the endurance of this molecular descriptor.]  

It’s now time to summarise what has been a long and at times pedantic blog post, and I thank all readers who’ve stayed with me. I don’t consider any of the three studies (22 | 29 | 35) that form the basis of the Nobel Prize nomination to have reported significant scientific discoveries and I would also challenge the claim made in the Editorial that these studies introduced new principles. I’m aware that 22 is heavily cited and I certainly agree that it is common to see values of LipE and Fsp3 quoted in the drug discovery literature. Nevertheless, I would argue that the Editorial failed to provide even a single convincing example of the Rof5, LipE or Fsp3 making a critical contribution to the discovery of a marketed drug (this should be quite sufficient to rule out the award of a share in the Nobel Prize for Physiology or Medicine to any of these nominees). Furthermore, the Editorial doesn’t provide any convincing evidence that the Rof5, LipE or Fsp3 are usefully predictive in drug discovery projects.

Aside from the failure of the Editorial to demonstrate significant impact for the Rof5, LipE and Fsp3, I do have some scientific concerns about this Nobel Prize nomination. First, the Rof5 is not actually supported by data. Second, LipE had already been discussed, although not named, in the drug discovery literature when 29 was published. Third, Fsp3 had been used previously (as i_ali) for aqueous solubility prediction and the data analysis in 35 would fail to comply with current J Med Chem author guidelines.