Tuesday, 1 April 2025

Property Forecast Index Validated

I arrived in Korea on Friday night and am greatly enjoying it here. Photos below show the Jungbu Dried Seafoods Market near where I'm staying and dinner on Sunday (spicy beef noodles).



I visited the War Memorial on Sunday and took selfies with the Shenyang J-6 (the Chinese version of the MiG-19) 'liberated' by Capt. Lee Woong-pyeong when he defected to South Korea on 25th February 1983, a 'liberated' T-34 (as Uncle Joe is said to have observed, quantity has a quality all of its own) and Great Leader's car (also 'liberated', although it was not clear exactly when).

So enough of the travel photos for now and let's get back to the science. Regular readers of this blog (both of them) will be well aware of my visceral dislike for drug design metrics. One reason for this visceral dislike is that I consider these metrics to trivialise the problems faced by medicinal chemists, and I remain sceptical that one can make meaningful predictions of developability or likelihood of clinical success for compounds based only on their chemical structures without knowing anything about their biological activities. One metric that I have criticised harshly in the past is property forecast index (PFI), which was originally introduced as solubility forecast index (SFI). Specifically, I denounced SFI as a ‘draw pictures and wave arms’ data analysis strategy and privately I even considered the possibility that it had been created by a toddler armed with a box of coloured crayons.

Let’s take a look at the HY2010 article in which SFI was introduced. Proprietary aqueous solubility measurements (a continuous variable) were first processed to assign compounds to one of three aqueous solubility categories. Histograms showing the proportions of measurements in each aqueous solubility category were created by binning values of SFI and of clogD(pH7.4), and the histograms were compared visually:

This graded bar graph (Figure 9) can be compared with that shown in Figure 6b to show an increase in resolution when considering binned SFI versus binned clogD(pH7.4) alone.

Recently, I have been forced to revise my negative view of PFI and I have to admit that it pains me deeply to realise that I could have been so utterly wrong for so long in my assessment of what is actually an elegant and highly predictive drug design metric. Indeed, I have now come to the conclusion that the only reason that the Journal of Medicinal Chemistry did not include PFI in its nomination for the Nobel Prize in Physiology or Medicine was that the introduction of the Ro5, LipE and Fsp3 principles led directly to so many marketed drugs being approved.

What has caused such a fundamental shift in my views? First, PFI is highlighted in the European Federation of Medicinal Chemistry (EFMC) ‘Best Practices from Hits to Lead Generation’ webinar. Now it goes without saying that EFMC includes some of the sharpest minds in medicinal chemistry and, given that they consider PFI to be sufficiently important for inclusion in a best practices webinar, it became abundantly clear that I needed to revise my hopelessly naïve thinking. Let’s join the webinar at 27:53 and you’ll see in the webinar slide that SFI (as PFI was originally introduced) has been strongly endorsed by Practical Cheminformatics, a blog that many, including me, accept without question as the source of a number of fundamental ground truths in the AI field.

However, what convinced me of the sublime elegance and extreme predictivity of PFI is a seminal study by the world-renowned expert on tetrodotoxin pharmacology, Prof. Angelique Bouchard-Duvalier of the Port-au-Prince Institute of Biogerontology, working in collaboration with the Budapest Enthalpomics Group (BEG). The manuscript has not yet been made publicly available, although I was able to access it with the help of my associate ‘Anastasia Nikolaeva’ (not sure exactly what she’s doing these days, although she did post a photo from Pyongyang showing her and a burly chap with a toothy grin and a bizarre haircut). There is no doubt that this genuinely disruptive study will comprehensively reshape the predictive ADME landscape, enabling drug discovery scientists, for the very first time, to make accurate predictions of developability and probability of clinical trial success using only chemical structures as input.

Prof. Bouchard-Duvalier’s seminal study clearly demonstrates that graphical presentation of categorized continuous data outperforms regression analysis performed on the uncategorized continuous data. The math is truly formidable (my rudimentary understanding of Haitian patois didn’t help either) and involves first projecting the atomic isothermal compressibility matrix into the quadrupole-normalized polarizability tensor before applying the Barone-Samedi transformation, followed by hepatic eigenvalue extraction using an algorithm devised by E. V. Tooms (a reclusive Baltimore resident whose illustrious research career in analytic topology was abruptly halted almost 31 years ago by an unfortunate escalator accident). The incisive analysis of Prof. Bouchard-Duvalier shows without a shadow of doubt that the data visualization used to establish PFI as a fundamental drug design principle will reliably and robustly outperform all AI approaches to prediction of aqueous solubility. Furthermore, ‘Anastasia Nikolaeva’ was also able to ‘liberate’ a prepared press release in which the beaming BEG director Prof. Kígyó Olaj explains that, “Possibilities are limitless now that we can accurately and robustly predict the developability of a compound using only its chemical structure as input and we can now finally consign regression analysis to the dustbin of history. Surely the Editors of Journal of Medicinal Chemistry will recognize the impact of PFI on real world drug discovery when they make their Nobel Prize nominations later this year.” 

Sunday, 9 March 2025

Thinking About Aqueous Solvation

Given that it was International Women's Day yesterday, I'll open the post (and blogging for 2025) with a photo of a gravestone at St James' Church in Bramley (Hampshire).

In the current post I’ll be taking a look at some aspects of aqueous solvation, and Richard Wolfenden’s 1983 “Waterlogged Molecules” article (W1983) is still worth reading today. As an aside, Prof Wolfenden will turn ninety in May of this year, and I hope that mentioning this won't put what is called "goat mouth" in my native Trinidad and Tobago on him, as I did for Oscar Niemeyer with the words "ele vive ainda" ("he is still alive") while studying Portuguese in 2012. As noted in W1983, the formation of a target-ligand complex requires partial desolvation of both target and ligand:

When biological compounds combine, react with each other, or change shape in watery surroundings, solvent molecules tend to be reorganized in the neighborhood of the interacting groups.

Formation of a target-ligand complex can also be seen as an “exchange reaction” and this point is very well made in SGT2012:

Molecular binding in an aqueous solvent can be usefully viewed not as an association reaction, in which only new intermolecular interactions are introduced between receptor and ligand, but rather as an exchange reaction in which some receptor–solvent and ligand–solvent interactions present in the unbound state are lost to accommodate the gain of receptor–ligand interactions in the bound complex.

In HBD3 I briefly discuss ‘frustrated hydration’ as a phenomenon that could be exploited in drug design and I’ll quote from the Summary section of W1983:  

When two or more functional groups are present within the same solute molecule, their combined effects on its free energy of solvation are commonly additive. Striking departures from additivity, observed in certain cases, indicate the existence of special interactions between different parts of a solute molecule and the water that surrounds it.

I’ll try to explain how this could work for ligand design. Let’s suppose that we have two polar atoms that are close together in the binding site. The proximity of these polar atoms means that water molecules forming ideal interactions with them are also likely to be close together. However, the mutual proximity of the water molecules can lead to unfavourable interactions between them, which ‘frustrate’ the (simultaneous) hydration of the two polar atoms in the binding site. Now if we design a ligand with two polar atoms positioned to form good interactions with the polar atoms in the binding site, it is likely that the ligand's polar atoms will also be in close proximity and that their hydration will be similarly frustrated. I would generally anticipate that frustration of hydration will not be handled well by implicit solvent models (RT1999 | FB2004 | CBK2008 | KF2014) or by computational tools such as WaterMap that calculate energetics for individual water molecules (especially in cases where the two hydration sites cannot be simultaneously occupied).
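The departure from additivity that W1983 refers to can be expressed with a simple cycle over four compounds: a parent, two singly substituted analogues and the disubstituted analogue. The Python sketch below assumes hydration free energies in kcal/mol (negative meaning favourable hydration); the function name and the numbers are hypothetical and chosen only to show the arithmetic. A positive result means the disubstituted compound is less well hydrated than additivity would predict, which is the kind of signature one might expect from frustrated hydration.

```python
def nonadditivity(dg_parent: float, dg_a: float, dg_b: float, dg_ab: float) -> float:
    """Departure from additivity for hydration free energies (kcal/mol).

    Additivity predicts dg_ab = dg_parent + (dg_a - dg_parent) + (dg_b - dg_parent),
    so the returned value is the observed value minus that additive prediction.
    """
    additive_prediction = dg_a + dg_b - dg_parent
    return dg_ab - additive_prediction


# Hypothetical values (kcal/mol): parent, +group A, +group B, +both groups
excess = nonadditivity(dg_parent=-1.0, dg_a=-5.0, dg_b=-6.0, dg_ab=-9.0)
# A positive excess means hydration is weaker than the additive prediction
print(f"departure from additivity: {excess:+.1f} kcal/mol")
```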

To illustrate frustration of hydration I’ve taken a graphic from a talk from 2023. The unfavourable interactions between solvating water molecules that frustrate hydration are shown as red double-headed arrows (in some cases these interactions will be repulsive to the extent that only one of the hydration sites can be occupied at a time). You’ll also notice two thick green lines in the right-hand panel, and these show secondary interactions that stabilize the bound complex. Secondary interactions of this nature were discussed in a molecular recognition context in the JP1990 study, and the observation (see A1989) that pyridazine is a better hydrogen bond acceptor (HBA) than its pKa would have you believe can be seen in a similar light. Secondary interactions like these only enhance affinity when the proximal polar atoms are of the same ‘type’ (the proximal polar atoms in the 1,8-naphthyridine are both HBAs), and we should anticipate that the secondary interactions for the contact between pyrazole (in which the donor N-H is adjacent to the acceptor N) and the ‘hinge’ of a tyrosine kinase will be deleterious for affinity. In contrast to secondary interactions, frustration of hydration can be beneficial for affinity even when the proximal polar atoms are of opposite types, as would be the case for an HBA that is near to a hydrogen bond donor (HBD).
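The sign rule for secondary interactions can be captured by the kind of counting scheme discussed in the JP1990 study: each diagonal contact between neighbouring primary hydrogen bonds is counted as attractive if it pairs a donor with an acceptor and repulsive if it pairs like with like. The Python sketch below is an assumption-laden illustration: it encodes a contiguous hydrogen bonding edge as a string of 'A' (HBA) and 'D' (HBD) sites, assumes the partner presents the perfectly complementary pattern, and only counts interactions without assigning energies.

```python
def secondary_interaction_score(ligand_sites: str) -> int:
    """Net count of attractive (+1) minus repulsive (-1) secondary interactions.

    ligand_sites: contiguous hydrogen bonding edge, e.g. "AA" or "DA", where
    "A" is a hydrogen bond acceptor and "D" a hydrogen bond donor. The partner
    is assumed to present the complementary site opposite each ligand site.
    """
    complement = {"A": "D", "D": "A"}
    partner = [complement[site] for site in ligand_sites]
    score = 0
    for i in range(len(ligand_sites) - 1):
        # The two diagonal contacts between neighbouring primary hydrogen bonds
        for lig, par in ((ligand_sites[i], partner[i + 1]), (ligand_sites[i + 1], partner[i])):
            score += 1 if lig != par else -1
    return score


print(secondary_interaction_score("AA"))  # 2: two adjacent HBAs, as in 1,8-naphthyridine
print(secondary_interaction_score("DA"))  # -2: adjacent N-H and N, as in pyrazole
```

For the pyrazole pattern both diagonal contacts pair like with like, which is why the secondary interactions at a kinase hinge would be expected to work against affinity.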

While it is clearly important to account for aqueous solvation when using physics-based approaches for prediction of binding affinity, passive permeability and aqueous solubility, the measurement of gas-to-water transfer free energy is not exactly routine (I’m not aware that any companies offer measurement of aqueous solvation energy as a service, nor do I believe that this is an activity that would be readily funded). Measurements of aqueous solvation energy reported in the literature tend to be for relatively volatile compounds and I’ll direct readers to the C1981, W1981 and A1990 studies.
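For volatile compounds, a gas-to-water transfer free energy can be obtained from the equilibrium ratio of the solute's concentrations in water and in the gas phase. The sketch below assumes the same molar concentration standard state in both phases, so that ΔG = -RT ln(C_water/C_gas); the function name and the input value are hypothetical and are not taken from C1981, W1981 or A1990.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def hydration_free_energy(c_water_over_c_gas: float, temperature_k: float = 298.15) -> float:
    """Gas-to-water transfer free energy (kcal/mol) from a concentration ratio.

    Assumes the same molar concentration standard state in both phases, so the
    dimensionless ratio C_water/C_gas is the equilibrium constant for transfer.
    """
    return -R_KCAL * temperature_k * math.log(c_water_over_c_gas)


# Hypothetical example: a solute 1000 times more concentrated in water than in the gas phase
print(f"{hydration_free_energy(1000.0):.2f} kcal/mol")
```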

A view that I've held for many years is that a partition coefficient could be used as an alternative to gas-to-water transfer free energy for studying aqueous solvation. It's also worth noting that when we think about desolvation in drug design we're often considering the energetic cost of bringing polar atoms into contact with non-polar atoms (as opposed to transferring the polar atoms to the gas phase). Partition coefficient measurement is a lot more routine than solvation free energy measurement and most drug discovery scientists are aware that the octanol/water partition coefficient (usually quoted as its base-10 logarithm, logP) is an important design parameter. However, the octanol/water partition coefficient is not useful for assessing aqueous solvation because the hydroxyl group of octanol can form hydrogen bonds with solutes and the water-saturated solvent is actually quite 'wet' (the DC1992 study reports that the room temperature solubility of water in octanol is 2.5 M). If we’re going to use partition coefficient measurements for studying aqueous solvation then I would argue that we should make these measurements with a saturated hydrocarbon such as cyclohexane or hexadecane that lacks hydrogen bonding capability.
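One way to see how an alkane/water partition coefficient relates to gas-to-water transfer is the thermodynamic cycle linking the three phases: transfer from gas to alkane equals transfer from gas to water followed by transfer from water to alkane. The sketch below assumes a molar concentration standard state throughout, kcal/mol units and 298.15 K; the function names and input numbers are hypothetical. Because a saturated hydrocarbon cannot hydrogen bond to the solute, the alkane/water logP largely reflects what the solute loses on leaving water.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def water_to_alkane_free_energy(logp_alkane_water: float, temperature_k: float = 298.15) -> float:
    """Free energy (kcal/mol) for transfer from water to alkane, from log10 P."""
    return -math.log(10.0) * R_KCAL * temperature_k * logp_alkane_water


def hydration_free_energy_from_cycle(dg_gas_to_alkane: float, logp_alkane_water: float,
                                     temperature_k: float = 298.15) -> float:
    """Gas-to-water free energy implied by the cycle gas -> water -> alkane."""
    # dG(gas->alkane) = dG(gas->water) + dG(water->alkane)
    return dg_gas_to_alkane - water_to_alkane_free_energy(logp_alkane_water, temperature_k)


# Hypothetical inputs: dG(gas->alkane) = -3.0 kcal/mol and alkane/water logP = -1.0
print(f"{hydration_free_energy_from_cycle(-3.0, -1.0):.2f} kcal/mol")
```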

Here’s another slide from that 2023 talk showing that pyridine is lipophilic for octanol/water but hydrophilic for hexadecane/water. The difference in the logP values for a solute is sometimes referred to as ΔlogP (it is equivalent to the octanol/hexadecane logP value with both solvents water-saturated) and can be considered to quantify the solute’s ability to form hydrogen bonds (see Y1988 | A1994 | T2008). I'll mention in passing that ΔlogP measurements with toluene as the less polar organic solvent have been used to study intramolecular hydrogen bonding (see S2013 | C2016 | C2018).
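Because both partition coefficients share the aqueous phase, the difference in logP values is simply the logarithm of the octanol/alkane concentration ratio. A minimal sketch, with hypothetical concentrations chosen only to show the arithmetic (they are not measured values for pyridine or any other compound):

```python
import math


def delta_logp(logp_octanol_water: float, logp_alkane_water: float) -> float:
    """ΔlogP = logP(octanol/water) - logP(alkane/water)."""
    return logp_octanol_water - logp_alkane_water


# Hypothetical equilibrium concentrations (mol/L) of one solute in water-saturated phases
c_water, c_octanol, c_alkane = 0.010, 0.045, 0.004

logp_oct = math.log10(c_octanol / c_water)  # positive: "lipophilic" by octanol/water
logp_alk = math.log10(c_alkane / c_water)   # negative: "hydrophilic" by alkane/water
# The two printed values agree: ΔlogP is the log of the octanol/alkane ratio
print(round(delta_logp(logp_oct, logp_alk), 3), round(math.log10(c_octanol / c_alkane), 3))
```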


It should be stressed that people have been thinking about using different organic solvents for partition coefficient measurement for a lot longer than me. My view, expressed in K2013, is that the justification in H1963 for using octanol was partly based on a misinterpretation of Collander's C1951 study. I really like this quote from Alan Finkelstein's 1976 article (as an aside, the partition coefficient literature is not exactly awash with alkane/water logP measurements for amides, and the article reports measured values of the hexadecane/water partition coefficient for acetamide, formamide, urea, butyramide and isobutyramide):

It has long been fashionable to worry about which organic solvent (and polarity) is the best model for the lipoidal region of a particular cell membrane (Collander, 1954). These solvents have ranged from isobutanol (the most polar) to olive oil (the least polar). I have never understood the point of this. If the lipoidal region of the plasma membrane is a lipid bilayer, then clearly the appropriate model solvent is hydrocarbon. For artificial bilayers this is obviously so. I chose n-hexadecane as the particular hydrocarbon, because its chain length is comparable to that of the fatty acid residues in most phospholipids, and it is conveniently available.

I also need to mention the B2016 study (Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge) since the cyclohexane/water distribution coefficient was used as a surrogate for gas-to-water transfer free energy in the challenge:

The inclusion of distribution coefficients replaces the previous focus on hydration free energies which was a fixture of the past five challenges (SAMPL0-4) [1 | 2 | 3 | 4 | 5 | 6 | 7]. Due to a lack of ongoing experimental work to generate new data, hydration free energies are no longer a practical property to include in blind challenges. It has become increasingly difficult to find unpublished or obscure hydration free energies and therefore impossible to design a challenge focusing on target compounds, functional groups or chemical classes.

I consider initiatives such as the SAMPL5 cyclohexane/water distribution challenge to be valuable for assessing model predictivity in an objective and transparent manner. Generally, I would avoid including logD measurements for compounds that are significantly ionized under experimental conditions because these require that account be taken of ionization when making predictions (better to measure logD at a pH at which ionizable functional groups are not significantly ionized). While challenges such as SAMPL5 are certainly valuable for assessment of predictivity of models, I consider them less useful in model development, which requires measured data for structurally related compounds.
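The reason for preferring a pH at which ionizable groups are essentially neutral can be seen from the simplest model of logD, which assumes that only the neutral form partitions into the organic phase (partitioning of ions and ion pairing are ignored). The sketch below applies this model to a monoprotic acid or base; the function name and the pKa and logP values are hypothetical.

```python
import math


def logd_monoprotic(logp: float, pka: float, ph: float, acid: bool) -> float:
    """logD assuming only the neutral species partitions (no ion partitioning or ion pairing)."""
    # log10 of the ionized/neutral ratio from the Henderson-Hasselbalch equation
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical base with logP = 2.0 and pKa = 9.0, "measured" at three pH values
for ph in (6.0, 9.0, 12.0):
    print(ph, round(logd_monoprotic(2.0, 9.0, ph, acid=False), 2))
```

Three pH units above the pKa of the base, logD is within about 0.001 of logP, so the measurement reports on the neutral form without any need to model ionization.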

The isosteric pairs 1/2 and 3/4 shown in the graphic below will give you an idea of what I'm getting at. The predicted pKBHX values taken from K2016 suggest that 1 is less polar than its isostere 2, and I'd expect 3 to be more polar than 4.

While the three N-butylated purines shown in the graphic below are not strictly isosteric, I would consider it valid to interpret the cyclohexane/water logP values taken from S1998 as reflecting differences in hydrogen bond acceptor strength.

This is a good point at which to wrap up and, given the fundamental importance of aqueous solvation in biomolecular recognition and drug design, I see tangible advantages in having a large body of measured data in the public domain. My view is that measuring gas-to-water transfer free energies for significant numbers of compounds of interest to drug discovery scientists would be both technically demanding and unlikely to get funded, although I would be delighted to be proven wrong on either point. This means that we need to learn to use other types of data in order to study aqueous solvation, and my view is that an alkane/water partition coefficient would be the best option. Using alkane/water partition coefficients as an alternative to gas-to-water transfer free energies for studying aqueous solvation would also enable enthalpic (see RT1984) and volumetric aspects of aqueous solvation to be investigated more easily.