Tuesday, 1 April 2025

Property Forecast Index Validated

I arrived in Korea on Friday night and am greatly enjoying it here. Photos below show the Jungbu Dried Seafoods Market near where I'm staying and dinner on Sunday (spicy beef noodles).

I visited the War Memorial on Sunday and took selfies with the Shenyang J-6 (Chinese version of MiG-19) 'liberated' by Capt. Lee Woong-pyeong when he defected to South Korea on 25th February 1983, a 'liberated' T-34 (as Uncle Joe is said to have observed, quantity has a quality all of its own) and the Great Leader's car (also 'liberated' although it was not clear exactly when).

So enough of the travel photos for now and let's get back to the science. Regular readers (both of them) of this blog will be well aware of my visceral dislike for drug design metrics. One reason for this visceral dislike is that I consider these metrics to trivialise the problems faced by medicinal chemists and I remain sceptical that one can make meaningful predictions of developability or likelihood of clinical success for compounds based only on their chemical structures without knowing anything about their biological activities. One metric that I have criticised harshly in the past is property forecast index (PFI), which was originally introduced as solubility forecast index (SFI). Specifically, I denounced SFI as a ‘draw pictures and wave arms’ data analysis strategy and privately I even considered the possibility that it had been created by a toddler armed with a box of colored crayons.

Let’s take a look at the HY2010 article in which SFI was introduced. Proprietary aqueous solubility measurements (continuous variable) were first processed to assign compounds to one of three aqueous solubility categories. Histograms showing the proportions of measurements in each aqueous solubility category were created by binning values of SFI and of c log DpH7.4 and the histograms were compared visually:

“This graded bar graph (Figure 9) can be compared with that shown in Figure 6b to show an increase in resolution when considering binned SFI versus binned c log DpH7.4 alone.”
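
For readers who would like to see what this kind of analysis involves in practice, here is a minimal Python sketch of the general procedure described above: a continuous response is assigned to one of three categories, a descriptor is binned, and the proportion of each category is tabulated per bin (these proportions are what a graded bar graph displays). This is not the HY2010 analysis and uses none of its data. The data are synthetic, and the category cut-offs, the bin width, the noise level and all function names are assumptions that I have made up purely for illustration. A Pearson correlation coefficient for the uncategorized data is computed alongside for comparison.

```python
import math
import random


def categorize(value, low_cut, high_cut):
    """Assign a continuous value to one of three ordered categories."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"


def binned_proportions(xs, ys, bin_width, low_cut, high_cut):
    """For each bin of x, return the fraction of y values in each category."""
    counts = {}
    for x, y in zip(xs, ys):
        bin_start = (x // bin_width) * bin_width  # lower edge of the bin
        per_bin = counts.setdefault(bin_start, {"low": 0, "medium": 0, "high": 0})
        per_bin[categorize(y, low_cut, high_cut)] += 1
    proportions = {}
    for bin_start in sorted(counts):
        total = sum(counts[bin_start].values())
        proportions[bin_start] = {
            cat: n / total for cat, n in counts[bin_start].items()
        }
    return proportions


def pearson_r(xs, ys):
    """Pearson correlation coefficient for the uncategorized data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data only (hypothetical descriptor and response, arbitrary units).
rng = random.Random(0)
descriptor = [rng.uniform(0.0, 10.0) for _ in range(2000)]
response = [-0.3 * x + rng.gauss(0.0, 1.5) for x in descriptor]

# Arbitrary, made-up cut-offs and bin width.
table = binned_proportions(descriptor, response, bin_width=2.0,
                           low_cut=-2.0, high_cut=-1.0)
for bin_start, props in table.items():
    print(bin_start, {cat: round(p, 2) for cat, p in props.items()})
print("Pearson r (uncategorized):", round(pearson_r(descriptor, response), 2))
```

Changing the cut-offs or the bin width changes the picture that the bars present, whereas the correlation coefficient for the uncategorized data depends on neither choice.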

Recently, I have been forced to revise my negative view of PFI and I have to admit that it pains me deeply to realise that I could have been so utterly wrong for so long in my assessment of what is actually an elegant and highly predictive drug design metric. Indeed, I have now come to the conclusion that the only reason that the Journal of Medicinal Chemistry did not include PFI in its nomination for the Nobel Prize in Physiology or Medicine was that the introduction of the Ro5, LipE and Fsp3 principles led directly to so many marketed drugs being approved.

What has caused such a fundamental shift in my views? First, PFI is highlighted in the European Federation of Medicinal Chemistry (EFMC) ‘Best Practices from Hits to Lead Generation’ webinar. Now it goes without saying that EFMC includes some of the sharpest minds in medicinal chemistry and, given that they consider PFI to be sufficiently important for inclusion in a best practices webinar, it became abundantly clear that I needed to revise my hopelessly naïve thinking. Let’s join the webinar at 27:53 and you’ll see in the webinar slide that SFI (as PFI was originally introduced) has been strongly endorsed by Practical Cheminformatics, a blog that many, including me, accept without question as the source of a number of fundamental ground truths in the AI field.

However, what convinced me of the sublime elegance and extreme predictivity of PFI is a seminal study by the world-renowned expert on tetrodotoxin pharmacology, Prof. Angelique Bouchard-Duvalier of the Port-au-Prince Institute of Biogerontology, working in collaboration with the Budapest Enthalpomics Group (BEG). The manuscript has not yet been made publicly available although I was able to access it with the help of my associate ‘Anastasia Nikolaeva’ (not sure exactly what she’s doing these days although she did post a photo from Pyongyang showing her and a burly chap with a toothy grin and a bizarre haircut). There is no doubt that this genuinely disruptive study will comprehensively reshape the predictive ADME landscape, enabling drug discovery scientists, for the very first time, to make accurate predictions for developability and probability of clinical trial success using only chemical structures as input.

Prof. Bouchard-Duvalier’s seminal study clearly demonstrates that graphical presentation of categorized continuous data outperforms regression analysis performed on the uncategorized continuous data. The math is truly formidable (my rudimentary understanding of Haitian patois didn’t help either) and involves first projecting the atomic isothermal compressibility matrix into the quadrupole-normalized polarizability tensor before applying the Barone-Samedi transformation, followed by hepatic eigenvalue extraction using an algorithm devised by E. V. Tooms (a reclusive Baltimore resident whose illustrious research career in analytic topology was abruptly halted almost 31 years ago by an unfortunate escalator accident). The incisive analysis of Prof. Bouchard-Duvalier shows without a shadow of doubt that the data visualization used to establish PFI as a fundamental drug design principle will reliably and robustly outperform all AI approaches to prediction of aqueous solubility. Furthermore, ‘Anastasia Nikolaeva’ was also able to ‘liberate’ a prepared press release in which the beaming BEG director Prof. Kígyó Olaj explains that, “Possibilities are limitless now that we can accurately and robustly predict the developability of a compound using only its chemical structure as input and we can now finally consign regression analysis to the dustbin of history. Surely the Editors of Journal of Medicinal Chemistry will recognize the impact of PFI on real world drug discovery when they make their Nobel Prize nominations later this year.”