Wednesday, 27 May 2026

Grand Challenges for Predictive Modeling in Small Molecule Drug Discovery

In this blog post I’ll be taking a look at C2026 (Grand Challenges for Predictive Modeling in Small Molecule Drug Discovery) which has been published as a ChemRxiv preprint. A well-organized collection of grand challenges can indeed help focus scientific research effort on the most important problems and I consider C2026 to be welcome relief from the view that we can solve all problems with AI/ML. The authors put it well with their statement:

While there is substantial enthusiasm (particularly around AI) for revolutionizing drug discovery, this moment demands sharper problem definition.

In my view, however, C2026 could have been better organized (for example, I would question why covalent binding is in DOMAIN: CHEMISTRY while pKa is in DOMAIN: PHARMACOLOGY). Nevertheless, the article is still at the preprint stage and my feedback will hopefully be helpful for the authors.

I’ll direct readers to a recent blog post (The objectives of drug design) in which I suggest that it can be helpful to see design of drugs in terms of on-target bioactivity (good things that drugs do to the human body), off-target bioactivity (bad things that drugs do to the human body) and exposure (things that the human body does to drugs). Uncertainty pervades drug discovery and even if we knew the exact extent to which targets were engaged in vivo we still wouldn’t know what effects drugs will have on patients in the absence of other information (this is the uncertainty that results from the complexity of biology). One significant source of uncertainty is that we generally can’t currently measure the concentration of a drug at its site(s) of action and I recommend that everybody working in Drug Discovery (and Chemical Biology) take a look at SR2019 (Smith & Rowland, Intracellular and Intraorgan Concentrations of Small Molecule Drugs: Theory, Uncertainties in Infectious Diseases and Oncology, and Promise DMD 2019 47:667-672).

Some years ago I suggested that drug design could be classified as prediction-driven or hypothesis-driven and I’ll direct readers to the P2012 article on hypothesis-driven drug design by former colleagues. Back in 2009 I stated that “in many situations, properties of compounds simply cannot be predicted with the accuracy required for meaningful design, especially when optimization is performed against multiple end points” and, despite some impressive advances in predictive chemistry since then, this is still my view. Put another way, drug discovery needs to be considered in a Design of Experiments framework and I consider it an error to perceive it as simply an exercise in prediction.

The value of a prediction made using chemical structure as the only input drops sharply once a sample of the compound has been prepared and decisions as to whether further work on an existing compound is justified will invariably be based on measured data. For example, the PK/PD modelling used to set the dose will typically be based on measured bioactivity (often cell-based) and pharmacokinetics. Aside from speed, the great advantage of calculating ‘relative’ (see CAS2017), as opposed to ‘absolute’, free energy is that it enables project team scientists to use existing affinity and potency measurements for design. That said, the purpose of grand challenges like these is to articulate what we need to be able to predict rather than get distracted by feasibility issues.
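To illustrate how a relative free energy calculation builds on an existing measurement, here is a minimal Python sketch. It assumes that a pKd has been measured for a reference compound and that a relative binding free energy (new compound minus reference) has been calculated. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def predicted_pkd(pkd_reference, ddg_kcal_per_mol, temperature_k=298.15):
    """Estimate pKd for a new compound from a measured reference pKd and a
    calculated relative binding free energy (new minus reference)."""
    # Binding free energy is -RT*ln(10)*pKd, so a negative ddG
    # (tighter binding than the reference) raises the predicted pKd.
    rt_ln10 = R_KCAL * temperature_k * math.log(10)
    return pkd_reference - ddg_kcal_per_mol / rt_ln10


# Hypothetical inputs: measured pKd for the reference, calculated ddG.
reference_pkd = 7.0
ddg = -1.36  # kcal/mol
print(round(predicted_pkd(reference_pkd, ddg), 2))
```

Any error in the measured reference value propagates directly into the prediction, but so does its information content, which is the point being made above.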

With the preamble out of the way I’ll focus on the grand challenges and for the remainder of the post my comments will follow the order of the manuscript. As noted in my review of A2025, 'molecule' should not be used as a synonym for either 'compound' or 'chemical structure'. 

DOMAIN: CHEMISTRY

I suggest covering Covalent Binding in DOMAIN: STRUCTURE and DOMAIN: ENERGY and would include reactivity in Challenge: Chemical Stability and Degradation Products (a quinone might be perfectly stable but it’s not something that you would want to have in an enzyme inhibition assay). My view is that physicochemical properties such as pKa, aqueous solubility, aggregation and passive permeability would be more appropriately covered in DOMAIN: CHEMISTRY than in DOMAIN: PHARMACOLOGY and I would also include alkane/water partition coefficient (this is more appropriate than its octanol/water equivalent for studying aqueous solvation and is also a better model for the core of a lipid bilayer). It might also be worth including UV-Vis absorption and fluorescence here given that both phenomena are widely exploited to assay bioactivity of compounds.

DOMAIN: STRUCTURE

Given significant interest in ‘new modalities’ I suggest referring to ‘targets’ rather than ‘proteins’ and it might be worth considering ternary structures (important in targeted protein degradation). Structures for target-ligand complexes are not directly relevant to design when association is irreversible although they are still useful starting points for building transition state models.

DOMAIN: ENERGY

Many of the quantities that form the basis of drug design fit naturally into DOMAIN: ENERGY given that they are effectively equilibrium constants or rate constants. Given significant interest in ‘new modalities’ I suggest referring to ‘targets’ rather than ‘proteins’. For irreversibly-bound ligands it's also necessary to calculate the transition state energy because target engagement occurs under kinetic control. My view is that oral absorption and drug distribution as well as modelling of enzymatic reactions (for example, oxidative metabolism by CYPs) and active transport would be easily accommodated within DOMAIN: ENERGY. One challenge that should be explicitly stated is prediction of plasma drug concentration profiles in humans because it is needed for meaningful PK/PD modelling.

DOMAIN: PHARMACOLOGY

A number of the challenges in DOMAIN: PHARMACOLOGY are not actually related to pharmacology and challenges such as Toxicity and PK/PD modelling could be accommodated within DOMAIN: ENERGY.

Wednesday, 20 May 2026

The objectives of drug design

I'll open the post on drug design objectives with photos from a most enjoyable and informative visit to the Australian Synchrotron early in 2010 when I was helping with fragment library design at CSIRO.



I’ve been meaning for ages to do a post like this and was finally goaded into action when I recently looked at two short videos from interviews with Sir Demis Hassabis, founder of Google DeepMind and Isomorphic Labs, and one of the 2024 Nobel Chemistry Prize laureates. Predicting the 3D structure of a protein from its amino acid sequence is a capability that has been eagerly sought for a long time and, as we celebrate the award, we need to also recognize the remarkable foresight of those who launched the Protein Data Bank in 1971 with just seven X-ray crystal structures. We also need to recognize that protein structures are inherently flexible and subject to post-translational modification such as glycosylation and phosphorylation. Furthermore, the crystal structure that has actually been determined might correspond to a relatively small portion (for example, a tyrosine kinase domain) of a much larger structure such as a dimeric growth factor receptor.

Let’s take a look at the two videos. In the first video, Sir Demis suggests that the end of disease is “within reach maybe in the next decade or so” and it’s worth pointing out that most of the cost of bringing a drug to market comes from clinical development rather than the actual discovery of the drug (nobody spends “ten years and billions of dollars to design just one drug” and it would be more accurate to say that we do so to see if what we've designed really is a drug). Furthermore, work in the late stage of drug discovery when project teams are assessing their best compounds should not really be regarded as drug design. In the second video, Sir Demis acknowledges that “knowing the structure of a protein is only one step in the drug discovery process” although it’s not clear exactly how “many adjacent AlphaFolds” are going to meaningfully address the issues of side effects.

Drug design is frequently asserted to be a multi-objective exercise and, in this post, I’ll be trying to discuss this in a way that I hope will be helpful to drug discovery scientists using artificial intelligence (AI) and machine learning (ML) in design. The ultimate aim of drug design is to identify compounds (and biological entities such as therapeutic antibodies) that can be used to treat diseases without harming patients and I suggest that this can be stated as three design objectives. My view is that the term 'multi-objective' is more appropriate than 'multi-parameter' in the context of drug design because even against a single objective design can involve optimization of multiple parameters. One characteristic of drug design is that the design process is over long before we get to find out how successfully the outputs of design perform their function (in design of materials it's possible to evaluate design outputs more directly). I recall a Head of Research and Development at Zeneca describing the process as "like steering an oil tanker".

I prefer to use the more general term ‘bioactivity’ to describe the effects of drugs on targets (and anti-targets) because in some cases these effects cannot be meaningfully described by a single parameter such as an IC50 value. As an aside this is a good point at which to celebrate the recent FDA approval of the PROTAC Vepdegestrant for treatment of ESR1m, ER+/HER2- advanced breast cancer and I'll direct readers to this most excellent and timely review on targeted protein degradation. The concentration of a drug in contact with a target (or anti-target), which varies with time, is determined by dose, and by the drug’s absorption, distribution, metabolism, and excretion (commonly referred to as ADME). While the therapeutic and adverse effects of a drug are what the drug does to the body, ADME is what the body does to the drug. Put another way, minimizing toxicity and optimizing ADME are entirely different objectives and I generally recommend that the acronym ADMET not be used.

Uncertainty is omnipresent in drug discovery and, despite what many appear to believe, AI/ML is not going to make this uncertainty vanish as if by magic. Derek was emphasizing the challenges presented by the complexity of biology long before AI came to be seen by some as a panacea for the ills of Pharma/Biotech (here’s a post from almost two decades ago and I also recommend reading his 2025 post on the “End of Disease” interview which also links relevant previous posts). The complexity of biology means that even if we knew the extent of target engagement in vivo (which varies with both dose and time) we wouldn’t generally be able to predict the in vivo effects of the drug with any confidence in the absence of other information. There is also uncertainty in exposure to consider and the concentration of a drug at its site(s) of action generally cannot be measured in vivo unless the target(s) are in direct contact with plasma. Uncertainty in exposure for intracellular targets is also a clinical development issue because failure in a Phase II trial may simply reflect inadequate exposure (we noted in KM2013 that “one can argue that a typical Phase I trial provides an incomplete description of distribution”). I recommend that everybody working in drug discovery and chemical biology read Smith & Rowland (2019) Intracellular and Intraorgan Concentrations of Small Molecule Drugs: Theory, Uncertainties in Infectious Diseases and Oncology, and Promise DMD 47:667-672 DOI. I argue in NoLE that achieving controllability of exposure should be seen as an objective of drug design.

One way that pharmacokinetic/pharmacodynamic (PK/PD) modellers address the issue of intracellular exposure is to assume that the concentration of drug in contact with its target(s) (and anti-targets) equals its unbound concentration in plasma (which can be measured in real time) and this assumption is referred to as the ‘free drug hypothesis’ (‘principle’ and ‘theory’ are also used in this context although I personally prefer ‘hypothesis’ because it’s an assumption we’re making). There are two scenarios under which the approximation of the concentration of drug at its site(s) of action by its unbound concentration in plasma is known to be unreliable. The first scenario is that there is significant active transport at one or more points on the path between plasma and the drug’s site(s) of action (active efflux is a common problem, especially in CNS drug discovery, although active influx will also cause the assumption to break down). The second scenario is that the pH at the drug’s site(s) of action differs from plasma pH (as would be the case for a lysosomal target) and that there is an ionizable group such as a basic nitrogen in the chemical structure of the drug.
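The second scenario can be illustrated with a simple pH-partitioning estimate. The Python sketch below is only a rough model: it assumes a monoprotic base, that only the neutral form crosses membranes, and that the unbound neutral form reaches the same concentration on both sides, so the ionized fraction on each side follows from the Henderson-Hasselbalch equation. The function name, the pKa and the site pH of 5.0 are illustrative choices rather than values taken from any study.

```python
def unbound_accumulation_ratio(pka, ph_site, ph_plasma=7.4):
    """Ratio of total unbound concentration (site of action / plasma) for a
    monoprotic base under the pH-partitioning assumptions stated above."""
    # For a base, [ionized]/[neutral] = 10**(pKa - pH) on each side.
    ionized_per_neutral_site = 10 ** (pka - ph_site)
    ionized_per_neutral_plasma = 10 ** (pka - ph_plasma)
    # The neutral concentration is assumed equal on both sides, so it cancels.
    return (1 + ionized_per_neutral_site) / (1 + ionized_per_neutral_plasma)


# Hypothetical base (pKa 9.0) and an acidic compartment at pH 5.0.
print(round(unbound_accumulation_ratio(pka=9.0, ph_site=5.0), 1))

# With no pH difference the ratio is 1, as the free drug hypothesis assumes.
print(unbound_accumulation_ratio(pka=9.0, ph_site=7.4))
```

Under these assumptions a strongly basic compound would be predicted to accumulate substantially in an acidic compartment, which is why equating the concentration there with the unbound plasma concentration becomes unreliable.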

While drug design does indeed have multiple objectives it really shouldn’t need to be said that if the required level of bioactivity cannot be achieved then it becomes irrelevant whether the other objectives are achieved and I’ll direct readers to M2026 (The Affinity Advantage). I see M2026 as providing a much-needed cold shower for a 2024 JMC Editorial (Property-Based Drug Design Merits a Nobel Prize; see blog post) in which it is asserted that “a discovery compound is more likely to become a drug when Fsp3 > 0.40” and that “a compound is more likely to have good developability when PFI < 7”. Nevertheless, I don’t consider M2026 to be especially useful from the perspective of defining drug design objectives because bioactivity is typically quantified by potency rather than affinity in drug discovery projects (an assay for kinase inhibition might have been run at high ATP concentration to mimic the intracellular environment) and some bioactivity objectives are defined in terms of measurements made in cell-based assays. Furthermore, bioactivity for ‘new modalities’ such as irreversible covalent inhibition and targeted protein degradation cannot be adequately described by a single parameter such as an IC50 value.
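The distinction between potency and affinity in the kinase example can be made concrete with the Cheng-Prusoff relationship. The Python sketch below assumes a reversible, ATP-competitive inhibitor and Michaelis-Menten kinetics; the function name and the concentrations are hypothetical.

```python
def competitive_ic50(ki, substrate_conc, km):
    """IC50 for a reversible competitive inhibitor (Cheng-Prusoff).

    ki is returned scaled by (1 + [S]/Km); substrate_conc and km must share
    the same units, and the result has the units of ki.
    """
    return ki * (1 + substrate_conc / km)


# Hypothetical kinase inhibitor: Ki = 1 nM, Km for ATP = 10 uM.
ki_nm = 1.0
km_atp_um = 10.0

# Assay run with ATP at Km versus at a high concentration (1000 uM).
print(competitive_ic50(ki_nm, substrate_conc=10.0, km=km_atp_um))
print(competitive_ic50(ki_nm, substrate_conc=1000.0, km=km_atp_um))
```

The same affinity gives very different IC50 values depending on the ATP concentration in the assay, so project criteria stated in terms of potency cannot simply be read as criteria for affinity.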

I criticized the term ‘avoid-ome’ in a previous post and, with apologies for the dreadful pun, I would recommend that its use be avoided (at the risk of repetition ADME and toxicity are entirely separate issues that must be addressed separately). Furthermore, I would question whether drug designers actually need yet another ‘ome’ word and I consider the notion that embracing the avoid-ome will transform drug discovery to be fanciful. While inhibition of cytochrome P450 (CYP) enzymes is generally undesirable from a toxicity perspective a compound that was not cleared by these metabolic enzymes would greatly worry those responsible for drug safety (bear in mind why we worry about inhibition of CYPs in the first place). Furthermore, I would challenge the inclusion by M2026 of serum albumin in a list of anti-targets such as hERG (I’m not aware of anybody suffering cardiac arrest on account of their medication binding to serum albumin) and the excellent B2025 study notes that "most drugs are >95% plasma protein bound (58%), with a large fraction >99% bound (29%)". Binding to plasma proteins should actually be considered within the framework of distribution (it can be instructive to pose the question as to whether you could tell where a drug was simply from knowing the total quantity of it in the body and its unbound plasma concentration). It’s also worth mentioning that binding to plasma proteins will protect an orally-dosed drug from the metabolizing enzymes during its first pass through the liver (before it gets a chance to distribute into the tissues). Variation of the plasma concentration during the dosing interval for an orally-dosed drug is a necessary evil resulting from oral dosing and in many situations the ‘ideal’ pharmacokinetic profile would actually be that resulting from intravenous infusion (plasma concentration of the drug is maintained at a level required for therapeutically useful effects).

At this point I’ll attempt to articulate three general objectives of drug design (the only thing that I’m entirely confident about here is that I won’t get these exactly right). One of the great challenges that drug designers face is that it is usually difficult to identify compounds that simultaneously achieve all the design objectives. Specifying criteria for objectives too permissively increases the risk of choking in clinical development. However, overly stringent specification of criteria for objectives decreases the likelihood of achieving all of the objectives and will slow the discovery process. I state these objectives in terms of ‘bioactivity’ rather than ‘potency’ to accommodate ‘new’ modalities such as irreversible covalent inhibition and targeted protein degradation although, in many cases, it will be possible to quantify the bioactivity for a compound by a single IC50 or EC50 value. I use ‘maximize’ and ‘minimize’ (as opposed to ‘optimize’) to frame the objectives because there is generally no penalty for identifying better compounds than you think you need. Assessing how well objectives have been achieved involves running a diverse range of assays and, as noted in this blog post on the A2025 study, it is important to be fully aware of the quantitation limits for each and every assay that you use.

I'll conclude the post with what I would argue are the three objectives of drug design:

  1. Maximize on-target bioactivity.  This is the least difficult objective to specify because bioactivity characterized in the in vitro assays is likely to translate to target engagement in vivo provided that the compound can be presented to the target(s) at the required concentration. Design outputs are usually evaluated in animal models for the human disease before initiating studies in humans but the design itself is almost invariably done against in vitro end points. 
  2. Minimize off-target bioactivity. It is generally more difficult to specify objectives for off-target bioactivity than for on-target bioactivity on account of the numbers and diversity of the assays involved. Design outputs are always evaluated for toxicity in animals before initiating studies in humans (as mandated by regulatory authorities) but the design itself is almost invariably done against in vitro end points.    
  3. Maximize controllability of exposure. This objective, which might also be stated as 'Optimize ADME', is the most difficult of the three objectives to specify because, as noted earlier in this post, exposure generally can’t be measured for targets that are not in direct contact with plasma. At absolute minimum it is necessary to demonstrate that a pharmacokinetic profile can be achieved in animals that will maintain the (unbound) concentration of the compound at levels that we believe will result in beneficial therapeutic effects in humans. For targets not in contact with plasma the PK/PD modellers also need to be able to confidently invoke the free drug hypothesis (this is why I prefer to frame the objective in terms of exposure rather than ADME) and this requires that design outputs have good passive permeability and are not subject to active transport. In some cases it will also be necessary to demonstrate access to specific organs such as the CNS.