Molecular Design: The OpenBind initiative

I’ll open the post on the OpenBind initiative with photos from my visit to Korea last year, which was timed to coincide with the cherry blossoms (this meant that the customary April Fools post was from Seoul). Things did not start well on the day that I took these photos (having lined up the first shot of the day, it became abundantly clear that the camera’s battery was still being charged at the hotel) and I wondered whether Great Leader’s grandson might have labelled me as a dotard. Fortunately, Seoul’s Metro is excellent and I was still able to get some photos at Huiujeong-ro Cherry Blossom Road and Yangjaecheon Stream.

In this post I’ll be taking a look at the OpenBind initiative and here's a summary of the concept. I certainly see great value in making large quantities of this type of data (affinity measurements with X-ray crystal structures for the corresponding protein-ligand complexes) available to the drug discovery and chemical biology communities. The grating-coupled interferometry (GCI) protocol used for affinity measurement enables association and dissociation to be observed in real time and presumably it is also possible to characterize stoichiometry using this technique. I would expect the GCI protocol to enable weaker binding affinities to be reliably quantified (likely to increase the dynamic range of the assay) as well as allowing measurement of binding affinity of glycoproteins for ligands. Given the focus on enabling affinity prediction, there is no reason for excluding anti-targets or non-human proteins.

Generation of data for training machine learning (ML) models, which are renowned for their voracious appetite for data, appears to be the principal aim of the initiative. However, the availability of large quantities of such data will also enable more extensive evaluation of physics-based methods for calculating binding affinity and can potentially inform hypothesis-driven design by identifying bioisosteric relationships between elements of substructure. One point worth making is that having affinity measurements linked to protein-ligand structures for structurally-related compounds of varying molecular complexity (see HLH2001) enables frustration of molecular interactions to be studied (this is particularly relevant to fragment-based design) and I discussed in HBD3 how frustration of hydration might be exploited in design. Given the importance of aqueous solvation in biomolecular recognition it may be beneficial to measure some alkane/water partition coefficient values and I'll point you to a post on this topic in case it's of interest. As discussed in KMP2013 and B2017 polarity parameters can be derived from alkane/water partition coefficient measurements for functional groups.

I've suggested that there are three objectives to drug design and the OpenBind initiative addresses the first of these which is to maximize on-target bioactivity. It's worth noting that proteins are not the only drug target class of interest (see CD2022) while bioactivity for ‘new modalities’ such as targeted protein degradation (see CC2026) and irreversible covalent bond formation between targets and ligands cannot be quantified in terms of affinity alone. My view is that OpenBind would be more accurately described as an initiative for ligand discovery than for drug discovery given its focus on enabling methods for affinity prediction. Modern ML models for affinity prediction are effectively quantitative structure-activity relationship (QSAR) models and I would question whether the use of the AI label is justified in either case. All that said, I would expect OpenBind to catalyse significant progress in the affinity prediction field which hopefully will translate to tangible benefits for drug discovery.

It’s perhaps appropriate to take a general look at QSAR approaches given that the main focus of OpenBind appears to be generation of data for training what could be referred to as 'QSAR-like' ML models. In my view, QSAR modelling never made much of a splash in real world drug discovery and claims that particular models have made significant impact on drug discovery projects are generally not verifiable. A difficulty faced by QSAR practitioners was that projects had delivered or been put out of their misery by the time there was sufficient data for building predictively useful models. Medicinal chemists typically perform their optimizations within specific structural series and this means that structure-activity relationships (SARs) tend to be local in nature (I’m not aware of any studies in which a QSAR model built using only data from one structural series was convincingly shown to be usefully predictive of bioactivity for compounds in a different structural series). For users of ML bioactivity models it is important to know whether chemical structures for which predictions are being made lie within the applicability domains of the models. Put another way, medicinal chemists who use ML models are generally more interested in how well the models predict for the structural series that they're working on and less interested in how well the models have fit the training data (anybody who has received financial advice will be familiar with the "past performance is not indicative of future results" disclaimer). The selection criteria for inclusion of targets and ligands by the OpenBind initiative are not currently clear and I'm guessing that large scale structural determination might prove challenging for membrane proteins.

The availability of affinity measurements that are linked to X-ray crystal structures for the corresponding protein-ligand complexes enables affinity to be modelled in terms of the molecular interactions between proteins and their ligands. This is the approach used to create the scoring functions used in virtual screening and it provides a means to address the local nature of SARs. While this might seem to be an obvious way to model affinity data it's important to be aware that the contribution to affinity of an individual contact, such as a hydrogen bond, between the protein and ligand is not an experimental observable (see NoLE). Put another way, there is no unique way of decomposing a value of ΔG° (standard Gibbs free energy of binding) into a sum of terms based on individual noncovalent contacts between the protein and ligand. One reason for this is that association of proteins with their ligands occurs in aqueous media and this point has been clearly articulated in the S2012 study:

Molecular binding in an aqueous solvent can be usefully viewed not as an association reaction, in which only new intermolecular interactions are introduced between receptor and ligand, but rather as an exchange reaction in which some receptor–solvent and ligand–solvent interactions present in the unbound state are lost to accommodate the gain of receptor–ligand interactions in the bound complex.

However, there’s another reason why there’s no unique way to decompose binding free energy into a sum of terms based on individual noncovalent contacts and here’s a well-known equation written a bit differently to how you normally see it written:

ΔG° = RT ln(K_d / C°)

This shows that the value of ΔG° varies with the concentration, C°, that defines the standard state. By convention C° is set to 1 M although this is arbitrary and has no physical basis (see G1997) and this means that the binding free energy values encountered by drug discovery scientists are always negative (consider the feasibility of measuring a K_d value of greater than 1 M). Writing ΔG° as a sum of terms based on individual noncovalent contacts is challenging because each term needs to depend on C° while the sum of terms needs to reproduce the dependence of ΔG° on C°. This is discussed in NoLE and the problems can be seen more easily if you think about how you might write K_d as a product of terms based on individual noncovalent contacts. The dependence of ΔG° on C° has implications for interpretability of ML models for binding affinity.
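
To make the dependence on the standard state concrete, here is a minimal Python sketch that assumes the relationship ΔG° = RT ln(K_d / C°). The function name, the temperature of 298.15 K and the 1 µM example ligand are illustrative choices of mine rather than anything taken from OpenBind.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def standard_binding_free_energy(kd_molar, c_standard_molar=1.0, temperature_k=298.15):
    """Return the standard binding free energy in kJ/mol from RT ln(Kd / C_std)."""
    if kd_molar <= 0 or c_standard_molar <= 0:
        raise ValueError("Kd and the standard concentration must be positive")
    return R * temperature_k * math.log(kd_molar / c_standard_molar)


kd = 1e-6  # hypothetical ligand with Kd = 1 micromolar
for c_std in (1.0, 1e-3, 1e-6, 1e-9):
    dg = standard_binding_free_energy(kd, c_std)
    print(f"C_std = {c_std:g} M -> dG = {dg:+.2f} kJ/mol")
```

The same K_d gives a negative, zero or positive ΔG° depending only on the choice of C°, which is why any per-contact decomposition of ΔG° has to say where the C° dependence lives.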

My understanding is that scoring functions (see GPD2018 | WBS2017 | A2015 | C2012 | S2012 | F2004 | SR2001 | GHK2000 | MM1999 | E1997 | MSK1992) used in virtual screening are generally not predictive of affinity to the extent that they can be routinely used in lead optimization. Perhaps it will be different for Boltz-2 (described in the P2025 preprint) although questions have been raised in BSR2026 as to whether Boltz-2 "truly relies on the physics of intermolecular interactions" and the term “absolute FEP” does ring some alarm bells for me. Various explanations have been offered for the typically underwhelming performance of scoring functions for affinity prediction including the usual suspects (protein flexibility, solvation and entropy). However, a much simpler explanation might be that scoring functions are trained to predict the difference in free energy between two states by only using the structure corresponding to one of the states.

I remain sceptical that it will prove feasible to build genuinely universal models for prediction of binding affinity from structures of protein-ligand complexes although I'll be very happy if my scepticism is shown to be unfounded. Describing energetics of target-ligand interactions in a general manner to enable ML modelling of affinity will be challenging because of the necessity to encode factors such as interaction potential, geometric dependence and solvent exposure (bear in mind that physics-based methods for prediction of affinity are already available and I'll direct readers to the Open Free Energy and Open Force Field initiatives). While modelling affinity in terms of molecular interactions circumvents the need for training data to sample every conceivable combination of structural series with target, the need to meaningfully define applicability domains does not disappear. My view is that when affinity datasets for different targets are combined for ML modelling, data should be split at the target level for cross-validation. This would entail splitting data so that each test set consists of only (and all) the data for a single target. I have argued in a previous post that the complexity (for example, number of parameters used to fit the training data) of models should be properly accounted for when comparing performance for ML models.
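
The target-level split described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the record layout (`target`, `ligand`, `pKd` keys), the target names and the affinity values are all hypothetical.

```python
from collections import defaultdict


def leave_one_target_out(records):
    """Yield (held_out_target, train, test) with all data for one target in test."""
    by_target = defaultdict(list)
    for record in records:
        by_target[record["target"]].append(record)
    for held_out in sorted(by_target):
        test = by_target[held_out]
        # Training data is everything measured against the other targets.
        train = [r for t, rows in by_target.items() if t != held_out for r in rows]
        yield held_out, train, test


# Hypothetical affinity records for three targets
records = [
    {"target": "T1", "ligand": "L1", "pKd": 6.2},
    {"target": "T1", "ligand": "L2", "pKd": 5.1},
    {"target": "T2", "ligand": "L3", "pKd": 7.4},
    {"target": "T2", "ligand": "L4", "pKd": 6.9},
    {"target": "T3", "ligand": "L5", "pKd": 4.8},
]

for target, train, test in leave_one_target_out(records):
    print(f"held out {target}: {len(train)} training rows, {len(test)} test rows")
```

If you already use scikit-learn, `LeaveOneGroupOut` with the target identifier as the group label produces the same kind of split.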

Datasets generated by OpenBind are likely to also prove valuable for testing and development of physics-based approaches to affinity prediction such as use of simulation to calculate ‘absolute’ (ΔG°) and ‘relative’ (ΔΔG) free energy of binding. Physics-based free energy calculations are typically more computationally demanding for ΔG° than for ΔΔG (a view expressed in B2009 is that it's generally easier to predict differences in property values for pairs of structurally-related compounds than it is to predict property values from chemical structures of compounds). Methods for calculating ΔΔG (here’s a helpful review) are especially relevant to drug design because medicinal chemists typically work within structural series, defining SARs in terms of ratios of affinity (or potency) for pairs of structurally-related compounds. Put another way, ΔΔG calculations enable project team scientists to exploit existing project data to predict affinity for potential synthetic candidates and I would argue that ML modellers really do need to be thinking more about prediction of differences in affinity (and other pharmaceutically relevant properties) between structurally related compounds. As an aside, free energy perturbation (FEP) was a major source of inspiration when I started to use the Leatherface (don't ask 😁😁😁) chemical structure editing software to do matched molecular pair analysis (MMPA) in the late 1990s, even though physics-based ΔΔG calculations were still largely seen as academic curiosities at that time.
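
Here is a minimal sketch of how pairwise affinity differences might be summarized by structural transformation, in the spirit of MMPA. It assumes affinities are expressed as pKd values, and the transformation labels and numbers are invented for illustration. Because ΔΔG comes from a ratio of K_d values, the standard concentration cancels.

```python
import math
from collections import defaultdict
from statistics import mean

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def ddg_from_pkd(pkd_reference, pkd_analogue, temperature_k=298.15):
    """Convert a pKd difference for a compound pair into ddG in kJ/mol.

    ddG = -RT ln(10) * (pKd_analogue - pKd_reference), so a negative value
    means that the analogue binds more tightly than the reference.
    """
    return -R * temperature_k * math.log(10) * (pkd_analogue - pkd_reference)


# Hypothetical matched pairs: (transformation, pKd of reference, pKd of analogue)
pairs = [
    ("H>>Cl", 6.0, 6.8),
    ("H>>Cl", 5.5, 6.1),
    ("Cl>>Br", 6.8, 7.0),
]

by_transformation = defaultdict(list)
for label, pkd_ref, pkd_ana in pairs:
    by_transformation[label].append(ddg_from_pkd(pkd_ref, pkd_ana))

for label, values in sorted(by_transformation.items()):
    print(f"{label}: n = {len(values)}, mean ddG = {mean(values):+.2f} kJ/mol")
```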

While I’m certainly enthusiastic about physics-based methods such as FEP for calculating ΔΔG it’s not clear how generally these can handle significant modifications to the core of a structure (this is the scaffold-hopping scenario) and I would anticipate difficulties when the main effect of the structural perturbation is to alter conformational preference (as is the case for N-methylation of the secondary amide that is conserved in a number of SARS-CoV-2 main protease inhibitors). That said, the data generation capability of the OpenBind initiative should enable perceived weaknesses in FEP methodology to be addressed. I'll highlight a couple of general ways in which the data sets that OpenBind will generate might be used to validate methods for predicting relative affinity. First, you can use the relative affinity values that correspond to specific structural transformations such as chloro substitution (a good way to study activity cliffs and focusing on specific structural transformations counters criticism that predictive models are just capturing lipophilicity or molecular size), chloro to bromo (a good way to see if you're modelling halogen bonding effectively), and aromatic nitrogen to CH (in design it is useful to determine where polarity can be introduced with minimal loss of affinity). Second, you can use relative affinity measurements to assess how well models predict non-additivity in SARs (non-additivity can also be considered in the activity cliff framework). I should point out that neither of these suggestions is novel (see L2012 and C2016) and ML modellers are already looking at activity cliffs (see vT2022).
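
One simple way to quantify non-additivity is a double-transformation cycle: a parent compound, two singly modified analogues and the doubly modified analogue. The sketch below is my own illustration with invented pKd values, and the function name is hypothetical. It compares the non-additivity seen in measurements with that implied by a model's predictions.

```python
def nonadditivity(pkd_parent, pkd_a, pkd_b, pkd_ab):
    """Non-additivity (pKd units) for a double-transformation cycle.

    This is the effect of transformation A measured in the presence of B
    minus its effect in the absence of B. Zero means the two
    transformations are additive.
    """
    return (pkd_ab - pkd_b) - (pkd_a - pkd_parent)


# Hypothetical cycle: parent, +A, +B and +A+B
measured = {"parent": 5.0, "a": 6.0, "b": 5.5, "ab": 7.5}
predicted = {"parent": 5.0, "a": 6.0, "b": 5.5, "ab": 6.5}

measured_na = nonadditivity(measured["parent"], measured["a"], measured["b"], measured["ab"])
predicted_na = nonadditivity(predicted["parent"], predicted["a"], predicted["b"], predicted["ab"])

print(f"measured non-additivity:  {measured_na:+.2f} pKd units")
print(f"predicted non-additivity: {predicted_na:+.2f} pKd units")
```

In this invented example the model reproduces each single transformation but misses the cooperativity between them, which is exactly the kind of failure that a per-compound error metric can hide.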

This is a good point at which to wrap up and I'll be taking a look at the OpenADMET initiative in the next post.

Monday, 22 June 2026
