"A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant."
Manfred Eigen (1927 - 2019)
I'll start this blog post with some unsolicited advice to those who seek to transform drug discovery. First, try to understand what a drug needs to do (as opposed to what compound quality 'experts' tell us a drug molecule should look like). Second, try to understand the problems that drug discovery scientists face and the constraints under which they have to solve them. Third, remember that many others have walked this path before, and difficulties that you face in gaining acceptance for your ideas may be more a consequence of extravagant claims made previously by others than of a fundamentally Luddite nature of those whom you seek to influence. As has become a habit, I'll include some photos to break the text up a bit, and the ones in this post are from Armenia.
Mount Ararat taken from the Cascade in Yerevan. I stayed at the excellent Cascade Hotel, which is a two-minute walk from the bottom of the Cascade.
Here are a couple of slides from my recent talk at Maynooth University that may be helpful to machine learning evangelists, AI visionaries and computational chemists who may lack familiarity with drug design. The introductions to articles on ligand efficiency and correlation inflation might also be relevant.
Defining controllability of exposure (drug concentration) as a design objective is extremely difficult when unbound intracellular drug concentration is not generally measurable in vivo.
Computational chemists and machine learning evangelists commonly make (at least) one of two mistakes when seeking to make an impact on drug design. First, they see design purely as an exercise in prediction. Second, they are unaware of the importance of exposure as the driver of drug action. I believe that we'll need to change (at least) one of these characteristics of drug design if we are to achieve genuine transformation.
In this post, I'm going to take a look at an article in ACS Medchem Letters entitled 'Transforming Computational Drug Discovery with Machine Learning and AI'. The article opens with a Pablo Picasso quote, although I'd argue that the observation made by Manfred Eigen at the beginning of this blog post would be way more appropriate. The World Economic Forum (WEF) is quoted as referring "to the combination of big data and AI as both the fourth paradigm of science and the fourth industrial revolution". The WEF reference reminded me of an article (published in the same journal and reviewed in this post) that invoked "views obtained from senior medicinal chemistry leaders". However, I shouldn't knock the WEF reference too much since we observed in the correlation inflation article that "lipophilicity is to medicinal chemists what interest rates are to central bankers".
The Temple of Garni is the only Pagan temple in Armenia and is sited next to a deep gorge (about 20 metres behind me). I took a keen interest in the potential photo opportunities presented by two Russian ladies who had climbed the safety barrier and were enthusiastically shooting selfies...
Much of the focus of the article is on the ANI-1x potential (and related potentials), developed by the authors for calculation of molecular energies. These potentials were derived by using a deep neural network to fit calculated (DFT) molecular energies to calculated molecular geometry descriptors. This certainly looks like an interesting and innovative approach to calculating energies of molecular structures. It's also worth mentioning the Open Force Field Initiative since they too are doing some cool stuff. I'll certainly be watching to see how it all turns out.
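For readers who haven't met "molecular geometry descriptors" of this kind, here's a minimal sketch of the general idea, assuming Behler-Parrinello-style radial symmetry functions of the sort that ANI-type potentials build on. It is a simplification: the published potentials also use angular terms and separate neural networks for each element, and the shift, width (`eta`) and cutoff values below are hypothetical placeholders rather than ANI-1x settings. The linear "atomic energy" in `molecular_energy` is a hypothetical stand-in for the neural network.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def cutoff(r: float, r_cut: float) -> float:
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_cut, and is 0 beyond it."""
    if r > r_cut:
        return 0.0
    return 0.5 * math.cos(math.pi * r / r_cut) + 0.5


def radial_descriptor(coords: Sequence[Coord], i: int, shifts: Sequence[float],
                      eta: float, r_cut: float) -> List[float]:
    """Radial symmetry functions for atom i, one value per shift R_s:

    G_i(R_s) = sum over j != i of exp(-eta * (R_ij - R_s)**2) * f_c(R_ij)
    """
    values = []
    for r_s in shifts:
        total = 0.0
        for j, xyz in enumerate(coords):
            if j == i:
                continue
            r_ij = math.dist(coords[i], xyz)  # interatomic distance
            total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_cut)
        values.append(total)
    return values


def molecular_energy(coords: Sequence[Coord], weights: Sequence[float],
                     shifts: Sequence[float], eta: float, r_cut: float) -> float:
    """Toy energy: sum of per-atom contributions, each linear in that atom's descriptor.

    HYPOTHETICAL: an ANI-type potential would pass each descriptor through an
    element-specific neural network instead of this linear map.
    """
    energy = 0.0
    for i in range(len(coords)):
        g = radial_descriptor(coords, i, shifts, eta, r_cut)
        energy += sum(w * x for w, x in zip(weights, g))
    return energy


if __name__ == "__main__":
    shifts = [1.0, 1.5, 2.0]   # hypothetical shift values (Angstrom)
    eta, r_cut = 4.0, 5.0      # hypothetical width and cutoff
    water_like = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
    print(radial_descriptor(water_like, 0, shifts, eta, r_cut))
```

Building descriptors from interatomic distances makes them invariant to translation and rotation, and summing per-atom contributions makes the energy independent of atom ordering, which is one reason this kind of model can be applied to molecules of different sizes.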
One key question concerns the accuracy of DFT energies. The authors talk about a "zoo" of force fields, but I'm guessing that the diversity of DFT protocols used by computational chemists may be even greater than the diversity of force fields (here's a useful review). Viewing the DFT field as an outsider, I don't see a clear consensus as to the most appropriate DFT protocol for calculating molecular energy, and the lack of consensus appears to be even more marked when considering interactions between molecules. It's also worth remembering that many DFT methods are themselves parameterized.
Potentials such as those described by the authors are examples of what drug discovery scientists would call a quantitative structure-property relationship (QSPR). When assessing whether or not a model constitutes AI in the context of drug discovery, I would suggest considering the nature of the model rather than the nature of the algorithm used to build it. The fitting of DFT energies to molecular descriptors that the authors describe is considerably more sophisticated than would be the case for a traditional QSPR. However, there are a number of things that you need to keep in mind when fitting measured or calculated properties to descriptors, regardless of the sophistication of the fitting procedure. This post on QSAR as well as the recent exchange (1 | 2 | 3) between Pat Walters and me may be informative. First, over-fitting is always a concern, and validation procedures may give an optimistic assessment of model quality when the space spanned by the descriptors is unevenly covered. Second, it is difficult to build stable and transferable models if there are relationships between descriptors (the traditional way to address this problem is to first perform principal component analysis, which assumes that the relationships between descriptors are linear). Third, it is necessary to account for the numbers of adjustable parameters in models in an appropriate manner if claiming that one model has outperformed another.
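The second and third points above can be illustrated with a few lines of Python. This is a minimal sketch using made-up toy numbers. The PCA part shows that a descriptor which is an exact but nonlinear function of another can have zero covariance with it, so PCA sees no redundancy to remove. The parameter-counting part uses the Akaike information criterion (AIC) for least-squares fits; AIC is my choice of illustration rather than anything prescribed in the article, and adjusted R² or BIC would be reasonable alternatives.

```python
import math
from typing import Sequence, Tuple


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def pca_2d_explained(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fractions of total variance captured by the two principal components
    of a pair of descriptors (eigenvalues of the 2x2 covariance matrix)."""
    sxx, syy, sxy = covariance(x, x), covariance(y, y), covariance(x, y)
    mean = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    total = sxx + syy
    return (mean + root) / total, (mean - root) / total


def aic_least_squares(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with Gaussian errors (additive constants dropped).
    Lower is better; each adjustable parameter costs 2 units."""
    return n * math.log(rss / n) + 2 * k


if __name__ == "__main__":
    # Toy descriptors: d2 is roughly linear in d1, so PC1 soaks up nearly all the variance.
    d1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    d2 = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(pca_2d_explained(d1, d2))

    # d4 = d3**2 is completely determined by d3, yet their covariance is zero,
    # so PCA treats them as unrelated.
    d3 = [-2.0, -1.0, 0.0, 1.0, 2.0]
    d4 = [v * v for v in d3]
    print(covariance(d3, d4), pca_2d_explained(d3, d4))

    # Hypothetical fits to n = 20 points: the 5-parameter model has a smaller
    # residual sum of squares but a worse (higher) AIC than the 2-parameter model.
    print(aic_least_squares(10.0, 20, 2), aic_least_squares(9.5, 20, 5))
```

The quadratic case is deliberately extreme, but it makes the point that a linear transformation can only remove linear relationships between descriptors.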
Armenia appeared to be awash with cherry blossoms when I visited in April. This photo was taken at Tatev Monastery which can be accessed by cable car.
The authors have described what looks to be a promising approach to calculation of molecular energies. Is it AI in the context of drug discovery? I would say, "no, or at least no more so than the QSPR and QSAR models that have been around for decades". Will it transform computational drug discovery? I would say, "probably not". Now I realize that you're thinking that I'm a complete Luddite (especially given my blinkered skepticism of the drug design metrics introduced by Pharma's Finest Minds), but I can legitimately claim to have exploited knowledge of ligand conformational energy in a real discovery project. I say "probably not" simply because drug designers have been able to calculate molecular energy for many years, although I concede that the SOSOF (same old shit only faster) label would be unfair. That said, I would expect faster, more accurate and more widely applicable methods to calculate molecular energy to prove very useful in computational drug discovery. However, utility is a necessary, but not sufficient, condition for transformation.
Geghard Monastery was carved from the rock.
So I'll finish with some advice for those who manage (or, if you prefer, lead) drug discovery. Suppose that you've got some folk trying to sell you an AI-based system for drug design. Start by getting them to articulate their understanding of the problems that you face. If they don't understand your problems, then why should you believe their solutions? Look them in the eye when you say "unbound intracellular concentration" to see if you can detect signs of glazing over. In particular, be wary of crude scare tactics such as the suggestion that medicinal chemists who don't use AI will lose their jobs to medicinal chemists who do. If the terrors of being left behind by the Fourth Industrial Revolution are invoked, then consider deploying the conference room furniture that you bought on eBay from Ernst Stavro Blofeld Associates.
Selfie with a MiG-21 (apparently Artem's favorite) at the Mikoyan Brothers Museum in Sanahin, where the brothers grew up. Anastas was even more famous than his brother and played a key role in defusing the Cuban Missile Crisis.