Wednesday, 29 May 2019

Transforming computational drug discovery (but maybe not)


"A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant."
Manfred Eigen (1927 - 2019)

I'll start this blog post with some unsolicited advice to those who seek to transform drug discovery. First, try to understand what a drug needs to do (as opposed to what compound quality 'experts' tell us a drug molecule should look like). Second, try to understand the problems that drug discovery scientists face and the constraints under which they have to solve them. Third, remember that many others have walked this path before and difficulties that you face in gaining acceptance for your ideas may be more a consequence of extravagant claims made previously by others than of a fundamentally Luddite nature of those whom you seek to influence. As has become a habit, I'll include some photos to break the text up a bit and the ones in this post are from Armenia.

Mount Ararat taken from the Cascade in Yerevan. I stayed at the excellent Cascade Hotel which is a two minute walk from the bottom of the Cascade.

Here are a couple of slides from my recent talk at Maynooth University that may be helpful to machine learning evangelists, AI visionaries and computational chemists who may lack familiarity with drug design. The introductions to articles on ligand efficiency and correlation inflation might also be relevant.

Defining controllability of exposure (drug concentration) as a design objective is extremely difficult while unbound intracellular drug concentration is not generally measurable in vivo.



Computational chemists and machine learning evangelists commonly make (at least) one of two mistakes when seeking to make an impact on drug design. First, they see design purely as an exercise in prediction. Second, they are unaware of the importance of exposure as the driver of drug action. I believe that we'll need to change (at least) one of these characteristics of drug design if we are to achieve genuine transformation.

In this post, I'm going to take a look at an article in ACS Medchem Letters entitled 'Transforming Computational Drug Discovery with Machine Learning and AI'. The article opens with a Pablo Picasso quote although I'd argue that the observation made by Manfred Eigen at the beginning of the blog post would be way more appropriate. The World Economic Forum (WEF) is quoted as referring "to the combination of big data and AI as both the fourth paradigm of science and the fourth industrial revolution". The WEF reference reminded me of an article (published in the same journal and reviewed in this post) that invoked "views obtained from senior medicinal chemistry leaders". However, I shouldn't knock the WEF reference too much since we observed in the correlation inflation article that "lipophilicity is to medicinal chemists what interest rates are to central bankers".

The Temple of Garni is the only Pagan temple in Armenia and is sited next to a deep gorge (about 20 metres behind me). I took a keen interest in the potential photo opportunities presented by two Russian ladies who had climbed the safety barrier and were enthusiastically shooting selfies...

Much of the focus of the article is on the ANI-1x potential (and related potentials), developed by the authors for calculation of molecular energies. These potentials were derived by using a deep neural network to fit calculated (DFT) molecular energies to calculated molecular geometry descriptors. This certainly looks like an interesting and innovative approach to calculating energies of molecular structures. It's also worth mentioning the Open Force Field Initiative since they too are doing some cool stuff. I'll certainly be watching to see how it all turns out.

One key question concerns the accuracy of DFT energies. The authors talk about a "zoo" of force fields but I'm guessing the diversity of DFT protocols used by computational chemists may be even greater than the diversity of force fields (here's a useful review). Viewing the DFT field as an outsider, I don't see a clear consensus as to the most appropriate DFT protocol for calculating molecular energy and the lack of consensus appears to be even more marked when considering interactions between molecules. It's also worth remembering that the DFT methods are themselves parameterized.

Potentials such as those described by the authors are examples of what drug discovery scientists would call a quantitative structure-property relationship (QSPR). When assessing whether or not a model constitutes AI in the context of drug discovery, I would suggest considering the nature of the model rather than the nature of the algorithm used to build it. The fitting of DFT energies to molecular descriptors that the authors describe is considerably more sophisticated than would be the case for a traditional QSPR. However, there are a number of things that you need to keep in mind when fitting measured or calculated properties to descriptors, regardless of the sophistication of the fitting procedure. This post on QSAR, as well as the recent exchange (1 | 2 | 3) between Pat Walters and me, may be informative. First, over-fitting is always a concern and validation procedures may make an optimistic assessment of model quality when the space spanned by the descriptors is unevenly covered. Second, it is difficult to build stable and transferable models if there are relationships between descriptors (the traditional way to address this problem is to first perform principal component analysis, which assumes that the relationships between descriptors are linear). Third, it is necessary to account for the numbers of adjustable parameters in models in an appropriate manner if claiming that one model has outperformed another.
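To make the second point concrete, here's a minimal sketch (entirely hypothetical data, numpy only) of the traditional fix: run principal component analysis on a pair of correlated descriptors and regress the property on the leading component. Note that this only helps to the extent that the relationships between the descriptors really are linear:

```python
import numpy as np

# Hypothetical data: two near-collinear descriptors (think molecular weight
# and heavy atom count) and a property that depends on the shared signal.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)           # strongly correlated with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)  # property being modeled

# PCA via SVD of the centred descriptor matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()               # variance captured per component
scores = Xc @ Vt.T                            # descriptors in PC coordinates

# Principal component regression: fit y to the leading component only,
# sidestepping the instability caused by the collinear pair.
A = np.column_stack([np.ones(n), scores[:, 0]])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With descriptors this strongly correlated, the first component soaks up nearly all the descriptor variance, which is precisely why fitting to both raw descriptors would be unstable.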



Armenia appeared to be awash with cherry blossoms when I visited in April. This photo was taken at Tatev Monastery which can be accessed by cable car.

The authors have described what looks to be a promising approach to calculation of molecular energies. Is it AI in the context of drug discovery? I would say, "no, or at least no more so than the QSPR and QSAR models that have been around for decades". Will it transform computational drug discovery? I would say, "probably not". Now I realize that you're thinking that I'm a complete Luddite (especially given my blinkered skepticism of the drug design metrics introduced by Pharma's Finest Minds) but I can legitimately claim to have exploited knowledge of ligand conformational energy in a real discovery project. I say "probably not" simply because drug designers have been able to calculate molecular energy for many years although I concede that the SOSOF (same old shit only faster) label would be unfair. That said, I would expect faster, more accurate and more widely applicable methods to calculate molecular energy to prove very useful in computational drug discovery. However, utility is a necessary, but not sufficient, condition for transformation.


Geghard Monastery was carved from the rock

So I'll finish with some advice for those who manage (or, if you prefer, lead) drug discovery.  Suppose that you've got some folk trying to sell you an AI-based system for drug design. Start by getting them to articulate their understanding of the problems that you face. If they don't understand your problems then why should you believe their solutions? Look them in the eye when you say "unbound intracellular concentration" to see if you can detect signs of glazing over. In particular, be wary of crude scare tactics such as the suggestion that those medicinal chemists that don't use AI will lose their jobs to medicinal chemists who do use AI. If the terrors of being left behind by the Fourth Industrial Revolution are invoked then consider deploying the conference room furniture that you bought on eBay from Ernst Stavro Blofeld Associates.

Selfie with MiG-21 (apparently Artem's favorite) at the Mikoyan Brothers Museum in Sanahin where the brothers grew up. Anastas was even more famous than his brother and played a key role in defusing the Cuban Missile Crisis.

Saturday, 11 May 2019

Efficient trajectories


I'll examine an article entitled ‘Mapping the Efficiency and Physicochemical Trajectories of Successful Optimizations’ (YL2018) in this post and I should note that the article title reminded me that abseiling has been described as the second fastest way down the mountain. The orchids in Blanchisseuse have been particularly good this year and I’ll include some photos of them to break the text up a bit.


It’s been almost 22 years since the rule of 5 (Ro5) was published. While the Ro5 article highlighted molecular size and lipophilicity as pharmaceutical risk factors, the rule itself is actually of limited utility as a drug design tool. Some of the problems associated with excessive lipophilicity had actually been recognized (see Yalkowsky | Hansch) over a decade before the publication of Ro5 in 1997 and there’s also this article that had been published in the previous year. However, it was the emergence of high-throughput screening that can be regarded as the trigger for Ro5 which, in turn, dramatically raised awareness of the importance of physicochemical properties in drug design. The heavy citation and wide acceptance of Ro5 provided incentives for researchers to publish their own respective analyses of large (usually proprietary) data sets and this has been expressed more succinctly as “Ro5 envy”.



So let's take a look at YL2018 and the trajectories. I have to concede that ‘trajectory’ makes it all seem so physical and scientifically rigorous even though ‘path’ would be more appropriate (and easier to say after a few beers). As noted in ‘The nature of ligand efficiency’ (NoLE), I certainly believe that it is a good idea for medicinal chemistry teams to both plot potency (e.g. pIC50) against risk factors such as molecular size or lipophilicity for their project compounds and to analyze the relationships between potency and these quantities. However, it is far from clear that a medicinal chemistry team optimizing a specific structural series against a particular target would necessarily find the plots corresponding to optimization of other structural series against other targets to be especially relevant to their own project.

YL2018 claims that “the wider employment of efficiency metrics and lipophilicity control is evident in contemporary practice and the impact on quality demonstrable”. While I would agree that efficiency metrics are integral to the philatelic aspects of modern drug discovery, I don’t believe that YL2018 actually presents a single convincing example of efficiency metrics being used for decision making in a specific drug design project. I should also point out that each of the authors of YL2018 provided cannon fodder (LS2007 | HY2010 ) for the correlation inflation article and you might want to keep that in mind when you read the words “evident” and “demonstrable”. They also published 'Molecular Property Design: Does Everyone Get It?' back in 2015 and you may find this review of that seminal contribution to the drug design literature to be informative.

I reckon that it would actually be a lot more difficult to demonstrate that efficiency metrics were used meaningfully (i.e. for decision making rather than presentation at dog and pony shows) in projects than it would be to demonstrate that they were predictive of pharmaceutically relevant behavior of compounds. In NoLE, I stated:

"However, a depiction [6] of an optimization path for a project that has achieved a satisfactory endpoint is not direct evidence that consideration of molecular size or lipophilicity made a significant contribution toward achieving that endpoint. Furthermore, explicit consideration of lipophilicity and molecular size in design does not mean that efficiency metrics were actually used for this purpose. Design decisions in lead optimization are typically supported by assays for a range of properties such as solubility, permeability, metabolic stability and off-target activity as well as pharmacokinetic studies. This makes it difficult to assess the extent to which efficiency metrics have actually been used to make decisions in specific projects, especially given the proprietary nature of much project-related data."



YL2018 states, “Trajectory mapping, based on principles rather than rules, is useful in assessing quality and progress in optimizations while benchmarking against competitors and assessing property-dependent risks.” and, as a general point, you need to show that you're on top of the physical chemistry if you're going to write articles like this.

Ligand efficiency represents something of a liability for anybody claiming expertise in physical chemistry. The reason for this is that perception of efficiency depends on the unit that you use to express affinity and this is a serious issue (in the "not even wrong" category) that was highlighted in 2009 and 2014 before NoLE was published. While YL2018 acknowledges that criticisms of ligand efficiency have been made, you really need to say exactly why this dependence of perception is not a problem if you're going to lecture about principles to readers of Journal of Medicinal Chemistry.

Ligand lipophilic efficiency (LLE), which is also known as ligand lipophilicity efficiency and lipophilic efficiency (LipE), can be described as an offset efficiency metric (lipophilicity is subtracted from potency). As such, perception of efficiency does not change when you use a different unit to express potency and, provided that ionization of the ligand is insignificant, efficiency can be seen as a measure of the ease of transfer of the ligand from octanol to its binding site. Here's a graphic that illustrates this:

LLE (LipE) measures ease of transfer of ligand from octanol to binding site
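A quick numerical illustration of the offset property (hypothetical pIC50 and logP values): re-expressing potency in a different concentration unit shifts every compound's LLE by the same constant, so compound-to-compound differences, and hence rankings, are untouched:

```python
# Hypothetical compounds: (name, pIC50 with IC50 in mol/L, logP)
compounds = [("A", 6.0, 2.0), ("B", 8.0, 4.5), ("C", 7.0, 1.0)]

def lle(pIC50, logP):
    return pIC50 - logP  # offset metric: potency minus lipophilicity

lle_molar = {name: lle(p, lp) for name, p, lp in compounds}
# Re-expressing IC50 in mmol/L subtracts 3 from every pIC50...
lle_millimolar = {name: lle(p - 3.0, lp) for name, p, lp in compounds}

# ...which shifts every LLE by the same constant, so the ranking of
# compounds is exactly the same in both unit systems.
rank_molar = sorted(lle_molar, key=lle_molar.get, reverse=True)
rank_millimolar = sorted(lle_millimolar, key=lle_millimolar.get, reverse=True)
```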

I'm not entirely convinced that the authors of YL2018 properly understood the difference between logP and logD. Even if they did, they needed to articulate the implications for drug design a lot more clearly than they have done. Here's an equation that expresses logD as a function of logP and the fraction of ligand in the neutral form at the experimental pH (assuming that only neutral forms of ligands partition into the octanol).
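The relationship in question is logD = logP + log10(fneutral), where fneutral is the fraction of ligand in the neutral form at the experimental pH. Here's a minimal sketch (hypothetical values) with fneutral obtained from the Henderson-Hasselbalch equation for a monoprotic compound:

```python
import math

def fraction_neutral(pKa, pH, acid=True):
    # Henderson-Hasselbalch: fraction of a monoprotic compound present
    # in the neutral form at the given pH
    exponent = (pH - pKa) if acid else (pKa - pH)
    return 1.0 / (1.0 + 10.0 ** exponent)

def logD(logP, pKa, pH, acid=True):
    # logD = logP + log10(f_neutral), assuming that only the neutral
    # form partitions into the octanol
    return logP + math.log10(fraction_neutral(pKa, pH, acid))

# Hypothetical carboxylic acid with logP = 3.0 and pKa = 4.0: at pH 7.4 it
# is almost fully ionized, so logD sits about 3.4 units below logP.
```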


The equation highlights the problems that result from using logD (rather than logP) to define "compound quality". In essence, the difficulty stems from the composite nature of logD, which means that logD can also be reduced by increasing the extent of ionization. While this is likely to result in increased aqueous solubility, it is much less likely that problems associated with binding to anti-targets will be addressed. Increasing the extent of ionization may also compromise permeability.


YL2018 is clearly a long article and I'm going to focus on two of the ways in which the authors present values of efficiency metrics. The first of these is the "% better" statistic, which is used to reference specific compounds (e.g. optimization endpoints) to sets of compounds (e.g. everything synthesized by project chemists). The statistic is calculated as the fraction of compounds in the set for which both LE and LLE values are greater than the corresponding values for the compound of interest. The smallest values of the "% better" statistic are considered to correspond to the best-optimized compounds. The use of the "% better" statistic could be taken as indicating that absolute thresholds for LE and LLE are not useful for analyzing optimization trajectories.
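For concreteness, here's how the "% better" statistic might be computed (a sketch with hypothetical project data; the 1.37 factor converts pKi to kcal/mol at 298 K with a 1 M reference concentration):

```python
def le(pKi, n_heavy):
    return 1.37 * pKi / n_heavy   # kcal/mol per heavy atom

def lle(pKi, logP):
    return pKi - logP

def pct_better(query, pool):
    # Fraction of the pool for which BOTH LE and LLE exceed the query's values
    better = [c for c in pool
              if le(c["pKi"], c["nha"]) > le(query["pKi"], query["nha"])
              and lle(c["pKi"], c["logP"]) > lle(query["pKi"], query["logP"])]
    return 100.0 * len(better) / len(pool)

# Hypothetical project compounds (pKi with Ki in mol/L, heavy atoms, logP)
pool = [
    {"pKi": 6.0, "nha": 25, "logP": 3.0},
    {"pKi": 8.0, "nha": 30, "logP": 2.5},
    {"pKi": 7.0, "nha": 40, "logP": 5.0},
    {"pKi": 9.5, "nha": 33, "logP": 3.5},
]
endpoint = {"pKi": 8.5, "nha": 32, "logP": 3.0}  # an optimization endpoint
```

With these numbers, exactly one of the four pool compounds beats the endpoint on both metrics.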

The fundamental problem with analyzing data in this manner is that LE has a nontrivial dependence on the concentration unit in which affinity is expressed (this is shown in Table 1 and Fig. 1 in NoLE). One consequence of this nontrivial dependence is that both perception of efficiency and the "% better" statistic vary with the concentration unit used to express affinity.
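The ranking instability is easy to demonstrate with a pair of hypothetical compounds: a small, weak binder and a larger, more potent one swap places when Kd is re-expressed in millimolar rather than molar units:

```python
# LE = 1.37 * pKd / N_heavy (kcal/mol per heavy atom at 298 K).
# Re-expressing Kd in mM subtracts 3 from every pKd, but the resulting shift
# in LE (1.37 * 3 / N_heavy) depends on heavy atom count, so rankings can flip.
def le(pKd, n_heavy):
    return 1.37 * pKd / n_heavy

# Compound A: Kd = 100 uM, 10 heavy atoms; compound B: Kd = 10 nM, 30 heavy atoms
le_A_molar, le_B_molar = le(4.0, 10), le(8.0, 30)        # A appears more efficient
le_A_mM, le_B_mM = le(4.0 - 3.0, 10), le(8.0 - 3.0, 30)  # B appears more efficient
```

The same flip propagates into any statistic, such as "% better", that is built on LE values.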

The second way that the authors of YL2018 present values of efficiency metrics is to plot LE against LLE and, as has already been noted, this is a particularly bone-headed way to analyze data. One problem is that the plot changes in a nontrivial manner if you express affinity in a different unit. This makes it difficult to explain to medicinal chemists why they need to convert the micromolar potencies from their project database to molar units in order for The Truth to be revealed. Another problem is that LE and LLE are both linear functions of pIC50 (or pKi) and that means that the appearance of the plot is heavily influenced by the (trivial) correlation of potency with itself.

A much better way to present the data is to plot LLE against number of non-hydrogen atoms (or any other measure of molecular size that you might prefer). In such a plot, expressing potency (or affinity) in a different unit simply shifts all points 'up' or 'down' to the same extent which means that you no longer have the problem that the appearance of the plot changes when you change units. The other advantage of plotting the data in this manner is that there is no explicit correlation between the quantities being plotted. I have used a variant of this plot in NoLE (see Fig. 2b) to compare some fragment to lead optimizations that had been analyzed previously.

I think this is a good point to wrap things up. Even if you have found the post to be tedious, I hope that you have at least enjoyed the orchids. As we would say in Brazil, até mais!


Tuesday, 9 April 2019

A couple of talks on Figshare


I've started using Figshare and have uploaded my first two harangues. Late last month, I did a talk on ligand efficiency at Evotec UK that was based on my recent article in Journal of Cheminformatics (which had caused two of the reviewers to spit feathers when the manuscript was initially submitted to Journal of Medicinal Chemistry). Last week, I travelled to Ireland to seek an audience with Maynooth University's renowned Library Cat (see photo below). Seeing as I was over there anyway, I also gave a talk (Hydrogen bonding in context of drug design) that drew on material from two articles (1 | 2) on hydrogen bonding from our work in Nequimed.

Monday, 1 April 2019

Enthalpy-driven pharmacokinetics


Drug design is a multi-objective endeavor. Some objectives such as maximization of affinity against target(s) and minimization of affinity against anti-targets are easily defined. Other objectives such as controllability of exposure are much less easily defined and this means that drug design is indirect. Controllability of exposure is the focus of pharmacokinetic optimization and I recently became aware of an exciting new development that will surely reshape the pharmacokinetic field and transform drug discovery beyond all recognition.

Target engagement potential and the multiple objectives of drug design 

The exciting results from another seminal study by the Budapest Enthalpomics Group (BEG) look set to revolutionize the way that we think about pharmacokinetics. The work, funded by Mothers Against Molecular Obesity (MAMO), was described in a book chapter that was prematurely posted online although the error does appear to have been recognized because the article is no longer publicly visible. In a study that will undoubtedly disrupt drug discovery, it is clearly shown that the thermodynamic signature for the binding of a ligand to a protein is predictive of the physicochemical behavior of the ligand even when that protein is absent from the system.

The theoretical treatment introduced in this groundbreaking study is formidable and the starting point is an eigenvalue decomposition of the entropic field tensor in reciprocal heavy atom space. Machine learning using the Blofeld Optimized Ligand Lipophilicity Of Cryogenic Krypton Solvates algorithm demonstrates unequivocally that the Grayling annihilation operator can be used to eliminate the entropy (and its efficacy-limiting dependence on the definition of the standard state) from any in vivo system. This leads to highly-efficient, enthalpy-driven pharmacokinetics in which the clearance (shown to be strongly correlated with the trace of the entropic field tensor) can be significantly attenuated. "The key to successful pharmacokinetic optimization is to eliminate the elimination", explains institute director Prof Kígyó Olaj, "and we have shown, for the first time, that entropy can be exorcised from the equations of pharmacokinetics with even greater efficiency than if it had been done by Torquemada himself."

Friday, 8 February 2019

The nature of ligand efficiency


"First they ignore you, then they laugh at you
then they fight you, then you win"
(Sometimes attributed to Mahatma Gandhi)

It's been a bit of a slog but 'The nature of ligand efficiency' (NoLE) has just been published in Journal of Cheminformatics. As detailed in the previous post, the manuscript proved too spicy for two of the reviewers assigned by Journal of Medicinal Chemistry who did seem rather keen that the readers of that journal not be exposed to the meaninglessness of the ligand efficiency metric. One of the great things about preprints is that we no longer need to take shit from reviewers and, after spicing up the manuscript a bit more, I uploaded it as my first contribution to ChemRxiv. The exercise did teach me a number of lessons that should serve me well when I get round to writing 'The nature of lipophilic efficiency'...

Proofs for NoLE on my desk at Berwick-on-Sea in Blanchisseuse


I believe that NoLE may be the second scientific publication from the village of Blanchisseuse on the north coast of Trinidad (although I'll happily be proven wrong on this point) and it follows an article in which I gave the editors of a number of ACS journals some unsolicited advice on assay interference. Sir Solomon Hochoy, who was Governor General of Trinidad and Tobago for ten years after independence, grew up in Blanchisseuse and had a house at the far end of the beach where I swim. I met him on the beach a couple of times before he passed away in 1983 and, after that, I would often chat with Lady Thelma who was an expert in the use of the cast net (she said that it was the only way that her cats would eat and, evidently, they were either very numerous or extremely large). I suspect that she would have been able to teach us a thing or two about high throughput screening although I never summoned the courage to ask her exactly how many cats she had.


Steps leading to the Hochoy house

This part of Blanchisseuse has long been associated with picong being given in print and my two articles merely follow an established tradition. The journalist BC Pires is married to one of our next door neighbors and he achieved international notoriety for a 1994 column in which he gently poked fun at the ears of a touring England fast bowler:

"He's a bowler with bat ears. How can he bowl fast with ears like that? The wind resistance must be a bitch to overcome, which probably accounts for the odd look of his first four steps. If he had his ears tucked, he'd outpace Ambrose. A less determined man would have opted to bowl spin. I'm amazed that he can walk head-on into the wind, far less run in at a seamer's speed. If, just as he reached the end of his run-up, he tripped at the bowling crease, he would probably glide to the other wicket. He could stump the batsman himself off his own delivery. He could deliver his delivery. He's a gentleman though; he's the only bowler I know who walks with his own sight screen attached to his head."

Needless to say BC was forced to make a grovelling apology and, if forced to make a grovelling apology for any of my scientific articles, I shall certainly consult my old schoolmate who is a (the?) world-leading expert and thought leader in the making of grovelling apologies.

"But Mr Caddick has demanded an apology and so I feel that I must, with utmost sincerity, say that I am very sorry indeed that Andy Caddick has big ears."

"Talk to me, Basil, I'm all ears" (with BC in Barbados)

Like both BC and me, my late father was taught by the Holy Ghost Fathers and the title of his final book was inspired by a rather bizarre episode that had taken place a couple of years before BC was accosted in the commentary box at Bourda by a fast bowler who is 20 cm taller than me (take another look at the photo above as you try to imagine that scene). For decades, a weather vane in the form of a sea serpent had sat harmlessly on the roof of the Red House which is currently being renovated but, at the time, housed Parliament. Some years previously, the calypsonian Sugar Aloes had sung that the sea serpent (commonly believed to be a dragon) was an evil omen and, when the People's National Movement (PNM) were re-elected, it was decided that the dragon simply had to go.

The dragon was duly replaced with a white dove in a nocturnal operation that was personally supervised by the Minister for Works. The reason given for installing the dove at night was that they wanted to minimize disruption of traffic and this must be the only occasion on which a government in Trinidad and Tobago has been concerned about disrupting traffic. Although some considered the dove (with an olive branch in its beak) to be a masterpiece, others were unconvinced. In particular, there was something that just didn't look right. It was my late father, Professor of Zoology at the University of the West Indies, who identified the problem in a letter to the Daily Express.

"Examination of your photograph shows an aerofoil of mixed parentage. The inner segment resembles that of any small bird with a generalised wing such as a dove or a keskidee. The outer segment with its greater width is clearly that of a soaring bird such as a corbeau.

In flight a bird's tail feathers would trail and not be partly spread. If however it was landing the tail fan would be widely spread and the long axis of its body inclined sharply upward to the flight path. In flight, no bird would spread its legs in this way but would trail them. The bird in the photograph is clearly not flying or landing normally.

Zoologically the only circumstances under which this configuration of spread legs, partially spread tail fan and spread primaries is possible is when a bird defecates in flight."            

Masters of picong

No thought leaders, key opinions, cricketers or avifauna were harmed during the production of this blog post.

Sunday, 27 January 2019

Reviewing the reviewers


I recently published The Nature of Ligand Efficiency (NoLE) as a ChemRxiv preprint and this was featured (for all the right reasons) in a post at In The Pipeline. The material had been previously submitted to J Med Chem but it proved a bit too spicy for two of the three reviewers. I'll review the J Med Chem reviewers in this blog post and I hope that the feedback will be useful in the event of the journal being presented with similarly flavored material in the future. NoLE was my second publication from Berwick-on-Sea in the village of Blanchisseuse on the north coast of my native Trinidad and I'll include some photos from there to break up the text a bit.



Gate at Berwick-on-Sea in Blanchisseuse. The house was built (quite literally) by my late father (who would have been 89 today) and was named for my mother's home town of Berwick-upon-Tweed which has changed hands between England and Scotland on a number of occasions and may even still be at war with Imperial Russia.

The selection of reviewers for manuscripts that criticize previous studies presents a dilemma for journal editors. While it is prudent to consult those with a stake in what is being criticized, these may not be the best people to ask about whether or not the criticism should be made. In particular, a reviewer using his/her position as a reviewer to suppress criticism of something in which he/she has a stake raises ethical questions. A stake in ligand efficiency (LE) could take any of a number of forms. First, one could have introduced a metric for LE. Second, one could have written articles endorsing ligand efficiency metrics or asserting their validity. Third, one could have enthusiastically promoted the LE metric at one's institution (e.g. by mandating that LE values be quoted when presenting project updates at the dog and pony shows that are an essential part of modern drug discovery). Fourth, one might be a devout member of the Fragment Cult (for whom the Doctrinal Correctness of LE is an Article of Faith).

There were three reviewers for my manuscript and I'll call them A, B and C since their numbers got scrambled between different rounds of review (also using the term 'Reviewer 3' might give some readers anxiety attacks). Reviewer A had nothing constructive to say and simply spat feathers. Reviewer B was very positive about the manuscript and made a number of  helpful suggestions. Reviewer C demanded that the manuscript be watered down to homeopathic levels (and that was never going to happen).

Here's my office at Berwick-on-Sea. That's a printout of NoLE on my desk (under the hanging beach towel).

The central theme of my manuscript is the argument that ligand efficiency is physically meaningless because perception of efficiency changes with the concentration unit in which affinity is expressed. This is actually a very serious criticism since a change in perception resulting from a change in a unit would normally be regarded in physical science as an error in the "not even wrong" category. It's not something that one can simply sweep under the carpet as a "limitation" of ligand efficiency. Despite their howls of protest, neither Reviewer A nor Reviewer C offered coherent counter-argument.

The tactic adopted by Reviewer C was to simply dismiss the physical arguments presented in the manuscript as "opinion" without presenting counter-argument. J Med Chem really does need to make it clear to reviewers that they need to do much better than this since it reflects badly on the journal.

Reviewer C. "'Physically meaningless' is at best an inflammatory opinion whereas the fact that other choices could have been made is often under-appreciated."
PWK. This criticism appears to be doctrinal rather than scientific and I note that Reviewer C has not offered counter-argument to the argument that LE is physically meaningless.


Here's a view of the Caribbean Sea. The 20 m drop from the gap in the vegetation is just as precipitous as you would expect although we've not (yet) lost any personnel or household pets over the edge.

Reviewer A struggled woefully with rudimentary physical chemistry throughout the review process and, given that I'd suggested a number of potential reviewers with the necessary expertise in molecular recognition and chemical thermodynamics, I was at a loss to understand why a reviewer who was so ill-equipped for the task at hand had been invited to review the manuscript.

Reviewer A. Reactions are considered to be spontaneous under standard conditions when the free energy is negative, but by changing the definition of C° in an arbitrary manner, any reaction can be said to be spontaneous or not. This is true in a trivial sense, but generations of researchers have found the concept of negative or positive free energies useful.
PWK. The flaw in this argument is that if you change the value of C° then you also change whether or not the reaction is spontaneous under the standard conditions. This is the basis of the law of mass action and it is also important to remember that KD values are not measured at a single concentration. A chemical process (at constant temperature and pressure) by which the system changes from state A to state B will be spontaneous if ΔG[A→B] is negative. Regardless of the experiences of generations of researchers, medicinal chemists rarely (if ever) appear to use the sign of ΔG° (e.g. for binding under assay conditions) when analyzing SAR or for making any other decisions.
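The standard state dependence is easy to show numerically. A minimal sketch (hypothetical 1 µM dissociation constant): the sign of the standard free energy of binding flips when C° is changed:

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.0     # temperature, K

def delta_g_standard(Kd, C_standard):
    # Standard free energy of binding referenced to standard concentration C°
    # (Kd and C° in the same units)
    return R * T * math.log(Kd / C_standard)

Kd = 1e-6  # 1 uM, in mol/L
dg_1M  = delta_g_standard(Kd, 1.0)    # C° = 1 M: negative ('spontaneous')
dg_1nM = delta_g_standard(Kd, 1e-9)   # C° = 1 nM: positive
```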

This is the start of the path down to the lower deck.

In one round of review, Reviewer C stated “I believe that it is incumbent on the author to argue that the choice of standard state used by medicinal chemists is not useful” and Reviewer A repeated the criticism in a subsequent round, noting that this was "the central problem with the manuscript". I thought this was a bit rich given that Reviewer A and Reviewer C had each accused me of using straw man tactics at different points in the review process. The more serious problem, however, is that we have two LE advocates each attempting to transfer the burden of proof that (in science) one accepts as soon as one advocates that people take an action (e.g. use LE metrics). Reviewers A and C appeared to do this in order to evade their responsibility as reviewers to present counter-argument to the arguments in the manuscript. This would be like a thought leader (yes, there really are people who call themselves 'thought leaders') responding to criticism of a claim that AI was going to transform drug discovery by saying that it was incumbent on the critics to argue that AI was not useful. Imagine if clinical trials were run like this.

At this point, Reviewer A did rather lose it and I was half expecting to have to fend off a counterattack by Steiner's division. Needless to say, the latest version of the manuscript now opens with "Ligand efficiency (LE) is, in essence, a good concept that is poorly served by a bad metric." and this can be considered the equivalent of a two-fingered gesture that is mistakenly attributed to the English and Welsh longbowmen at Agincourt.

Reviewer A. Dr. Kenny dodges this challenge by stating that the burden of proof should not be on him, but by arguing that LE is a “bad metric” despite its wide usage, he does in fact have to explain why free energy is also a “bad” concept. Not doing so makes the manuscript deeply misleading and therefore inappropriate for publication.
PWK. I only used the term “bad metric” in the conclusions where I wrote “Ligand efficiency is, in essence, a good concept served by a bad metric.” so it is incorrect to state that I have argued that LE is a “bad metric”. In any case, in the revised manuscript, I now question whether LE can accurately be described as a metric since neither its creators nor its advocates appear able (or willing) to say what it measures. Wide usage does not validate rules, guidelines or metrics and I note that, at one time, the prevailing view was that the sun orbited the earth. Once again, Reviewer A is making the serious error of assuming that everything that applies to free energy also applies to any function of free energy. The simple counter to Reviewer A’s challenge is that free energy is a state function and an integral part of the framework of thermodynamics. Although defined in terms of free energy, the LE metric is not part of thermodynamics simply because it appears to require a privileged standard state.
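The consequence of that privileged standard state can be illustrated numerically. The sketch below (with hypothetical KD and heavy-atom values chosen purely for illustration) shows that the LE ranking of a fragment and a larger lead inverts when a 0.1 mM standard concentration is used instead of the conventional 1 M.

```python
import math

R, T = 8.314e-3, 298.15  # kJ/(mol*K), K

def ligand_efficiency(kd_molar, n_heavy, c_standard_molar=1.0):
    """LE = -dG_standard / N_heavy = RT*ln(C_standard/Kd) / N_heavy,
    in kJ/mol per heavy atom."""
    return R * T * math.log(c_standard_molar / kd_molar) / n_heavy

# Hypothetical compounds: a 100 uM fragment and a 1 nM lead.
frag = dict(kd_molar=1e-4, n_heavy=12)
lead = dict(kd_molar=1e-9, n_heavy=36)

# With the conventional 1 M standard state the fragment looks
# more 'efficient' (~1.9 vs ~1.4) ...
print(ligand_efficiency(**frag), ligand_efficiency(**lead))

# ... but with a 0.1 mM standard concentration the ranking inverts
# (0 vs ~0.8), although nothing physical about either compound changed.
c = 1e-4
print(ligand_efficiency(**frag, c_standard_molar=c),
      ligand_efficiency(**lead, c_standard_molar=c))
```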

I have occasionally stated that "useful is the last refuge of the scoundrel" and this tends to be misinterpreted as an assertion that utility of a model is unimportant. Nothing could actually be further from the truth and the statement is more a comment on the way that models can be 'validated' by simply labeling them as "useful". In some ways "useful" is analogous to the "God created it that way" statements that you will encounter if you are careless enough to become ensnared in arguments with Creationists. I should also point out that the manuscript did discuss the difficulties of demonstrating the utility of LE while neither A nor C presented any evidence (fervent belief does not usually constitute evidence in science) to support their assertion that the 1 M standard state is more useful than any other standard state.

Reviewer A appeared particularly aggrieved that one of The Great Unwashed should have the temerity to even question the value of LE and the toys were duly ejected from the pram. As my response below indicates, Reviewer A's comment is more what one might have expected from an inquisitor at a fifteenth century heresy trial than from an expert reviewer of a manuscript submitted to the premier medicinal chemistry journal. It is also worth pointing out that LE was touted as "useful" even as it was introduced in a 2004 letter to Drug Discovery Today and all three coauthors of that seminal contribution to the medicinal chemistry literature appeared to be blissfully unaware of the nontrivial dependency of their creation on the standard concentration. As such, I would argue that it would actually be a dereliction of duty not to question the utility of LE.

Reviewer A. Sixth, Dr. Kenny repeatedly questions the utility of LE; for example “The LE metric is claimed by advocates to be useful although it is rarely, if ever, shown to be predictive of pharmaceutically-relevant behavior” (p. 15) and “the LE metric is rarely, if ever, shown to be predictive of phenomena that are relevant to drug discovery” (p. 39).
PWK. This appears to be a doctrinal rather than scientific criticism.

Lower deck. I only swim from here if snorkeling because it's rocky.

Reviewer B was very positive about the "Molecular Size and Design Risk" section and made useful suggestions for its expansion. It's also worth mentioning that Derek quoted from this section in his post. However, Reviewer C suggested that the whole section be purged from the manuscript although it is possible that Reviewer C's underlying objective was to ensure that certain articles were not discussed. Reviewer C complained that my criticism of ref 48 was unfair although it may be that the reviewer considered ref 48 to be a liability (this post will give readers an idea why some LE advocates might consider ref 48 to be a liability). Another possibility is that the objection to criticism of ref 48 was actually a smokescreen and the real reason for suggesting that the section be purged was actually to avoid discussion of ref 45 (which might be considered to be an even greater liability by LE advocates).

Ref 58 and ref 59 are rare examples of articles that respond to criticism of LE and a study such as NoLE really does need to discuss them (especially since both articles completely miss the point). The fundamental flaw that is common to both articles is that neither addresses the problems associated with the change in perception that results from using a different unit to express affinity. Reviewer C protested that it was gratuitous to single out ref 58 and even cited this 2014 post from Molecular Design in support of the charge that I was unfairly picking on ref 58. Reviewer C did seem rather rattled and also complained that I had quoted "non-scientific sections" of ref 59. I must confess to being unfamiliar with the concept that a scientific article can have non-scientific sections that can be declared off-limits for challenge. This was, perhaps, not Reviewer C's finest moment.

Reviewer A and Reviewer C both seemed rather keen that ref 94 not be discussed and they said that I should not be "attacking" fit quality (FQ) because it is rarely, if ever, used. I suspect the real reason was that both reviewers consider the metric (and ref 94) to be a significant liability from the LE perspective. I responded by noting that FQ had got its own box in the NRDD LE review and that ref 94 was cited in ref 58 (which asserts the validity of LE), suggesting that FQ may be of greater interest than Reviewer A and Reviewer C would have us believe. Another reason that Reviewer C might have preferred that the spotlight not be focused on FQ is that the discussion further exposes the illusion that fragments bind more efficiently than ligands of greater molecular size.

This is where I go swimming. It's a five-minute walk from the house.

So that concludes my review of the reviewers. I believe that the J Med Chem editors do need to think carefully about how (or even whether) they wish to have controversial topics addressed in their journal. Dr Eric Williams, the first Prime Minister of Trinidad and Tobago, suggested that his hearing impairment was an advantage in dealing with dissent because he could simply switch off his hearing aid. However, dealing with controversial topics in drug discovery might not be quite so simple. In particular, a journal needs to consider the potential vested interests of those from whom it seeks advice. For example, the Editors of a number of ACS journals may find it quite instructive to take a very close look at exactly how their journals came to endorse a frequent hitter model (trained on results from a panel of only six assays that all use the same readout) as a predictor of pan-assay interference...

I'll leave you with a selfie taken on the roof. A few minutes earlier I'd seen off a determined counter-attack by some jack spaniards (or should that be jacks spaniard?). Normally, I'd leave them alone but they were too close to where I needed to work. The technique is simple but its execution takes some nerve. First, arm yourself with a can of Baygon (don't forget to test it beforehand) and a broom. Second, with Baygon aimed, prod nest with broom. Third, spray a protective curtain of Baygon as the jack spaniards attack you (they are aggressive and they always attack). 

PWK one, jack spaniards nil 


Monday, 21 January 2019

Response to Pat Walters on ML in drug discovery

Thanks again for your response, Pat, and I’ll try to both clarify my previous comments and respond to the challenges that you’ve presented (my comments are in red italics).

In defining ML as “a relatively well-defined subfield of AI” I was simply attempting to establish the scope of the discussion. I wasn’t implying that every technique used to model relationships between chemical structure and physical or biological properties is ML or AI.

[As a general point, it may be helpful to say what differentiates ML from other methods (e.g. partial least squares) that have been used for decades for modeling multivariate data in drug discovery. Should CoMFA be regarded as ML? If not, why not?]

You make the assertion that ML may be better for classification than regression, but don't explain why: "I also have a suspicion that some of the ML approaches touted for drug design may be better suited for dealing with responses that are categorical (e.g. pIC50 > 6 ) rather than continuous (e.g. pIC50 = 6.6)"

[My suspicions are aroused when I see articles like this in which the authors say “QSAR” but use a categorical definition of activity. At very least, I think modelers do need to justify the application of categorical methods to continuous data rather than presenting it as a fait accompli. J Med Chem addresses the categorization of continuous data in section 8g of the guidelines for authors.]

In my experience, the choice of regression vs classification is often dictated by the data rather than the method. If you have a dataset with 3-fold error and one log of dynamic range, you probably shouldn’t be doing regression. If you have a dataset that spans a reasonable dynamic range and isn’t, as you point out, bunched up at the ends of the distribution, you may be able to build a regression model.

[The trend in such a data set is likely to be very weak and I would still generally start with regression analysis because it shows the weakness in the trend clearly. The 3-fold error doesn’t magically disappear when you transform the continuous data to make it categorical (it translates to uncertainty in the categorization). Categorization of a data set like this may be justified if the distribution of the data suggests that it is highly clustered.]
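A quick simulation makes the point that assay error survives categorization. The sketch below (the 3-fold error and the pIC50 cutoff of 6 are illustrative assumptions) shows that a compound far from the cutoff is classified reliably while one near the cutoff is frequently assigned the wrong class.

```python
import random, math

random.seed(0)
SIGMA = math.log10(3)   # 3-fold assay error ~ 0.48 log units
THRESHOLD = 6.0         # illustrative pIC50 cutoff for the 'active' class

def observed(p_true):
    """One noisy measurement of a true pIC50."""
    return random.gauss(p_true, SIGMA)

def misclassification_rate(p_true, n=20000):
    """Fraction of measurements landing on the wrong side of the cutoff."""
    truth = p_true > THRESHOLD
    wrong = sum((observed(p_true) > THRESHOLD) != truth for _ in range(n))
    return wrong / n

# Far from the cutoff the class label is reliable; near it, the
# assay noise makes the categorization itself uncertain.
for p in (7.5, 6.6, 6.2):
    print(p, round(misclassification_rate(p), 3))
```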

Your argument about the number of parameters is interesting: "One of my concerns with cheminformatic ML is that it is not always clear how many parameters have been used to build the models (I’m guessing that, sometimes, even the modelers don’t know) and one does need to account for numbers of parameters if claiming that one model has outperformed another."

I think this one is a bit more tricky than it appears. In classical QSAR, many people use a calculated LogP. Is this one parameter? There were scores of fragment contributions and dozens of fudge factors that went into the LogP calculation, how do we account for these? Then again, the LogP parameters aren't adjustable in the QSAR model. I need to ponder the parameter question and how it applies to ML models which use things like regularization and early stopping to prevent overfitting.

[I would say that logP, whether calculated or measured, is a descriptor, rather than a parameter, in the context of QSAR (and ML) and that the model-building process does not ‘see’ the ‘guts’ of the logP prediction. In a multiple linear regression model (like a classical Hansch QSAR) there will be a single parameter associated with logP (a1 in a1*logP). However, models that are non-linear with respect to logP will have more than one parameter associated with logP (a1 and a2 in a1*logP + a2*logP^2). In some cases, the model may appear to have a huge number of parameters although this may be an illusion because some methods for modeling do not allow the parameters to be varied independently of each other during the fitting process. The term ‘degrees of freedom’ is used in classical regression analysis to denote the number of parameters in a model (I don’t know if there is an analogous term for ML models).

As noted in my original post, the number of parameters used by ML models is not usually accounted for. Provided that the model satisfies validation criteria, the number of parameters is effectively treated as irrelevant. My view is that, unless the number of fitting parameters can be accounted for, it is not valid to claim that one model has outperformed another.]
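The descriptor/parameter distinction can be sketched with synthetic data (the numbers below are invented for illustration): logP enters each model as a single descriptor, but the number of fitted parameters depends on the functional form of the model.

```python
import numpy as np

rng = np.random.default_rng(1)
logp = rng.uniform(0, 5, 30)
pact = 0.8 * logp + 5 + rng.normal(0, 0.3, 30)  # synthetic 'affinity' data

# Hansch-style QSAR: one fitted parameter per term in the model
# (including the intercept). np.polyfit returns exactly those parameters.
linear = np.polyfit(logp, pact, deg=1)     # a1*logP + a0          -> 2 parameters
quadratic = np.polyfit(logp, pact, deg=2)  # a2*logP^2 + a1*logP + a0 -> 3 parameters

# Same single descriptor (logP) in both cases; different parameter counts.
print(len(linear), len(quadratic))  # 2 3
```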

I’m not sure I understand your arguments regarding chemical space. You conclude with the statement: “It is typically difficult to perceive structural relationships between compounds using models based on generic molecular descriptors”.

[I wasn’t nearly as clear here as I should have been. I meant molecular descriptors that are continuous-valued and define the dimensions of a space. By “generic” I mean descriptors that are defined for any molecular structure, which has advantages (generality) and disadvantages (models that are difficult to interpret). SAR can be seen in terms of structural relationships (e.g. X is the aza-substituted analog of Y) between compounds and the affinity differences that correspond to those relationships. What I was getting at is that it is difficult to perceive SAR using generic molecular descriptors (as defined above).]

Validation is a lot harder than it looks. Our datasets tend to contain a great deal of hidden bias. There is a great paper from the folks at Atomwise that goes into detail on this and provides some suggestions on how to measure this bias and to construct training and test sets that limit the bias.

[I completely agree that validation is a lot harder than it looks and there is plenty of scope for debate about the different causes of the difficulty. I get uncomfortable when people declare models to be validated according to (what they claim are) best practices and suggest that the models should be used for regulatory purposes. I seem to remember sending an email to the vice chair of the 2005 or 2007 CADD GRC suggesting a session on model validation although there was little interest at the time. At EuroQSAR 2010, I suggested to the panel that the scientific committee should consider model validation as a topic for EuroQSAR 2012. The panel got a bit distracted by another point and, after I was sufficiently uncouth as to make the point again, one of the panel declared that validation was a solved problem.]

I have to disagree with the statement that starts your penultimate paragraph: “While I do not think that ML models are likely to have significant impact for prediction of activity against primary targets in drug discovery projects, they do have more potential for prediction of physicochemical properties and off-target activity (for which measured data are likely to be available for a wider range of chemotypes than is the case for the primary project targets).”

Lead optimization projects where we are optimizing potency against a primary target are often places where ML models can make a significant impact. Once we’re into a lead-opt effort, we typically have a large amount of high-quality data, and can often identify sets of molecules with a consistent binding mode. In many cases, we are interpolating rather than extrapolating. These are situations where an ML model can shine. In addition, we are never simply optimizing activity against a primary target. We are simultaneously optimizing multiple parameters. In a lead optimization program, an ML model can help you to predict whether the change you are making to optimize a PK liability will enable you to maintain the primary target activity. This said, your ML model will be limited by the dynamic range of the observed data. The ML model won't predict a single digit nM compound if it has only seen uM compounds.

[I see LO as a process of SAR exploration and would not generally expect an ML model to predict the effects on affinity of forming new interactions or of scaffold hops. While I would be confident that the affinity data for an LO project could be modelled, I am much less confident that the models will be useful in design. My guess is that, in order to have significant impact in LO, models for prediction of affinity will need to be specific to the structural series that the LO team is working on. Simple models (e.g. a plot of affinity against logP) can be useful for defining the trend in the data which, in turn, allows us to quantify the extent to which the affinity of a compound beats the trend (this is discussed in more detail in The Nature of Ligand Efficiency, which proved a bit too spicy for two of the J Med Chem reviewers). Put another way, a series-specific model with a small number of parameters may be more useful than a model with many parameters that is (apparently) more predictive. I would argue that we’re searching for positive outliers in drug design. It can also be helpful to draw a distinction between prediction-driven design and hypothesis-driven design.]

In contrast, there are a couple of confounding factors that make it more difficult to use ML to predict things like off-target activity. In some (perhaps most) cases, the molecules known to bind to an off-target may look nothing like the molecules you’re working on. This can make it difficult to determine whether your molecules fall within the applicability domain of the model. In addition, the molecules that are active against the off-target may bind to a number of different sites in a number of different ways.

[My suggestion that ML approaches may be better suited for prediction of physical properties and off-target activity was primarily a statement that data is likely to be available for a wider range of chemotypes in these situations than would be the case for primary target. My preferred approach to assessing potential for off-target activity would actually be to search for known actives that were similar (substructural; fingerprint; pharmacophore; shape) to the compounds of interest. Generally, I would be wary of predictions made by a model that had not ‘seen’ anything like the compounds of interest.] 
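A minimal sketch of the similarity-based approach I have in mind, using Tanimoto similarity on toy bit-set fingerprints (real fingerprints would come from a cheminformatics toolkit such as RDKit; the bit indices and compound names here are invented):

```python
# Toy fingerprints as sets of 'on' bit indices; in practice these would
# be substructure or circular fingerprints from a toolkit such as RDKit.
known_off_target_actives = {
    "active-1": {2, 5, 9, 14, 21},
    "active-2": {3, 5, 9, 30, 41, 57},
}
query = {2, 5, 9, 14, 33}  # hypothetical project compound

def tanimoto(a, b):
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)

# High similarity to a known off-target active is a flag worth following
# up; low similarity says little either way.
for name, fp in known_off_target_actives.items():
    print(name, round(tanimoto(query, fp), 2))
```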

At the end of the day, ML is one of many techniques that can enable us to make better decisions on drug discovery projects. Like any other computational tool used in drug discovery, it shouldn’t be treated as an oracle. We need to use these tools to augment, rather than replace, our understanding of the SAR.

[Agreed, although I believe that ML advocates need to be clearer about what ML can do that the older methods can’t. However, I do not see ML methods augmenting our understanding of SAR because neither the models nor the descriptors can generally be interpreted in structural terms.]