Metrics are like the heads of the Hydra. Dispatch one and two pop up to take its place.
So #RealTimeChem week is over and it's time to return to the topic of metrics which, as readers of this blog will be aware, is a recurring theme here. Sometimes, to give them a more 'hard science feel', drug discovery metrics are cast in thermodynamic terms and 'conversion' of IC50 to free energy provides a good example of the problem. Ligand efficiency (LE) was originally defined by scaling free energy of binding by molecular size and it is instructive to observe how toys are ejected from prams when the thermodynamic basis of LE is challenged.
The most important point to note about a metric is that it's supposed to measure something and, regardless of how much you wave your arms and how noisily you assert the metric's usefulness, the metric still needs to measure. That's why we call it a 'metric' and not a 'security blanket for timid medicinal chemists' nor a 'floatation device for self-appointed experts and wannabe thought-leaders'. To be useful, a metric also has to measure something relevant and, in many drug discovery scenarios, that means being predictive of the chemical or biological behavior of compounds. Drug discovery metrics (and guidelines) are often based on trends in data and the strength of the trend tells us how much weight we should give to metrics and how rigidly we should adhere to guidelines. In the metric business, relevance trumps simplicity and even the most anemic of trends can acquire eye-wateringly impressive significance when powered by enough data.
I'll start my review of the article featured in this post by saying that, had the manuscript been sent to me, the response to the editor would have been something between 'why have you sent this out for review' and 'this manuscript needs to be put out of its misery as swiftly and mercifully as possible'. The article appears to be the write-up for material presented in webinar format which was reviewed less than favorably. The authors have made a few changes and what was previously called SEEnthalpy (Simplistic Estimate of Enthalpy) is now called PEnthalpy (Proxy for Enthalpy) but the fatal design flaws in the original metric remain and the review of that webinar will show what happens when you wander by mistake into the mess that metrics make.
Before we try to cone these thermodynamic proxies in the searchlights, it may be an idea to ask why we should worry about enthalpy or entropy when drug action is driven by affinity and free concentration. That's a good question and, to be quite honest, I really don't know the answer. Isothermal titration calorimetry (ITC) is an excellent, label-free method for measuring affinity and enthalpy of binding. However, the idea that the thermodynamic signature for binding of a compound to a protein will somehow be predictive of the behavior of the compound in all sorts of situations that do not involve that protein does seem to be entering the realms of wild conjecture. There is also the question of how isothermal systems like live humans can 'sense' the benefits of an enthalpically-optimized drug. Needless to say, these are questions that some ITC experts and many aspiring thought-leaders would prefer that you didn't think too hard about.
So let's take a look at the thermodynamic proxies which are defined in terms of the total number (HBT) of hydrogen bond donors and acceptors and the number (RB) of rotatable bonds. The proxies are defined as follows:
PEnthalpy = HBT/(RB + HBT) (1)
PEntropy = RB/(RB + HBT) (2)
PEnthalpy + PEntropy = 1 (3)
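Equations (1) to (3) can be sketched in a few lines of Python. This is only an illustration: the function name and the (HBT, RB) counts are hypothetical and not taken from the featured article. Because the two proxies always sum to one, the second carries no information beyond the first.

```python
def thermodynamic_proxies(hbt: int, rb: int) -> tuple[float, float]:
    """Return (PEnthalpy, PEntropy) as defined in equations (1) and (2)."""
    total = hbt + rb
    if total == 0:
        # Neither proxy is defined for a molecule with HBT = RB = 0
        raise ValueError("proxies are undefined when HBT + RB = 0")
    return hbt / total, rb / total


# Hypothetical (HBT, RB) counts, chosen purely for illustration
for hbt, rb in [(2, 8), (5, 5), (8, 2)]:
    p_enthalpy, p_entropy = thermodynamic_proxies(hbt, rb)
    # Equation (3): the sum is 1 whatever the counts are
    print(f"HBT={hbt} RB={rb} PEnthalpy={p_enthalpy:.2f} "
          f"PEntropy={p_entropy:.2f} sum={p_enthalpy + p_entropy:.2f}")
```

Note that nothing about a protein target enters the calculation, so a given compound gets the same pair of values whatever it is binding to.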
Now you may remember in the webinar that one of the authors of the featured article was telling us at 22:43 that "entropy comes from non-direct hydrophobic interactions like rotatable bonds". At least now they seem to realize that the rotatable bonds represent degrees of freedom although I don't get the impression from reading the article that they have a particularly solid grasp of the underlying physicochemical principles. Freezing rotatable bonds is an established medicinal chemistry tactic for increasing affinity and, if successful, we expect it to lead to a more favorable entropy of binding which some self-appointed thought-leaders would assert is a bad way to increase affinity. Trying to keep an open mind on this issue, I suggest that we might follow the lead of British Rail and try to define right and wrong types of entropy.
One of the criticisms that I made of the webinar was that no attempt was made to validate the metrics against measured values of binding enthalpy and entropy. In the article, the metrics are evaluated against a small data set of measured values. As I mentioned earlier, there is effectively only one metric because the two metrics are perfectly anti-correlated so you need to look beyond the fit of the data to the metrics if you want to assess what I'll call the 'thermodynamic connection'. This means digging into the supplementary information. I found the following on page 5 of the SI:
-TdeltaS = 159.80522 − 343.46172*Pentropy(RB/(HBT + RB)) (4)
which implies that:
TdeltaS = −159.80522 + 343.46172*Pentropy(RB/(HBT + RB)) (5)
These equations tell us that the change in entropy associated with binding actually increases with RB rather than decreasing with RB as one would expect for degrees of freedom that become frozen when the intermolecular complex forms. When you're assessing proxies for thermodynamic quantities it's a really good idea to take a look at the root mean square error (RMSE) for the fit of the quantity to the proxy. The RMSE values for fitting ΔH and TΔS° are 28.35 kJ/mol and 29.30 kJ/mol respectively and I will leave it to you, the reader, to decide for yourself whether or not you consider these RMSE values to justify PEnthalpy and PEntropy being called thermodynamic proxies. The alert reader might ask where the units for ΔH and TΔS° came from since neither the article nor the SI provides this information and the answer is that you need to go to the source from which the ΔH and TΔS° values were taken to find out.
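To see the direction of the trend in equation (5), the fitted line can be evaluated for a few rotatable bond counts. This is a rough sketch: the coefficients are those quoted from the SI, the function name and the (HBT, RB) counts are hypothetical, and the units are assumed to be kJ/mol to match the RMSE values quoted above.

```python
def fitted_t_delta_s(hbt: int, rb: int) -> float:
    """Evaluate equation (5); assumed to return TdeltaS in kJ/mol."""
    p_entropy = rb / (hbt + rb)  # equation (2)
    return -159.80522 + 343.46172 * p_entropy


# Hold HBT fixed (hypothetical value) and add rotatable bonds
hbt = 6
for rb in (2, 4, 8):
    print(f"HBT={hbt} RB={rb} fitted TdeltaS={fitted_t_delta_s(hbt, rb):.1f}")
```

With HBT held fixed, each added rotatable bond raises PEntropy and so raises the fitted TΔS°, which is the opposite of what one would expect if those bonds were being frozen on binding.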
Now you'll recall that these thermodynamic proxies predict constant values of ΔH and TΔS° for binding of a given compound to any protein (even those proteins to which it does not bind). The ΔG° values for the compounds in the small data set used to evaluate the thermodynamic proxies lie in a relatively narrow range (i.e. less than the RMSE values mentioned in the previous paragraph) from −37.6 kJ/mol to −57.3 kJ/mol and are not representative of the affinity of these compounds for proteins against which they had not been optimized. Any guesses how the RMSE values for fitting the data would have differed if ΔH and TΔS° values had been used for each compound binding to each of the protein targets?
Now if you've kept up to date with the latest developments in the drug discovery metric field, you'll know that even when the mathematical basis of a metric is fragile, there exists the much-exercised option of touting the metric's simplicity and claiming that it is still useful. Provided that nobody calls your bluff, metrics can prove to be very useful propaganda instruments. The featured article does present examples of data analysis based on using the thermodynamic proxies as descriptors and one general criticism that I will make of this analysis is that most of it is based on the significance rather than the strength of trends. When you tout the significance of a trend, you're saying as much about the size of your data set as you are about the strength of the trend in it. This point is discussed in our correlation inflation article and I'd suggest taking a particularly close look at what we had to say about the analysis in this much-cited article.
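The distinction between significance and strength can be illustrated with a small sketch. It keeps a hypothetical, weak correlation coefficient fixed and varies only the number of data points. The p-value comes from the Fisher z-transformation with a normal approximation, a textbook shortcut used here for brevity and not the analysis performed in the featured article; the function name is likewise an assumption.

```python
import math


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson r observed with n points.

    Fisher z-transformation plus a normal approximation (illustrative only).
    """
    if n < 4 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    # P(|Z| > z) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.1  # hypothetical weak trend: r**2 = 0.01, about 1% of variance explained
for n in (20, 200, 2000, 20000):
    print(f"n={n:>6}  r^2={r * r:.2f}  p~{approx_p_value(r, n):.1e}")
```

The strength of the trend (r² = 0.01) is the same in every row; only the data set size changes, and with it the p-value falls from unremarkable to vanishingly small.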
I'd like to focus on the analysis presented in the section entitled 'GSK PKIS Dataset' and which explored correlations between protein kinase % inhibition and a number of molecular descriptors. The authors state,
"In addition to PEnthalpy, we assessed the correlation across a variety for physicochemical properties including molecular weight, polar surface area, and logP in addition to PEnthalpy (Fig. 5)"
This statement is actually inaccurate because they have assessed the significance of the correlations rather than the correlations themselves. Although they may have done the assessment for logP and polar surface area, the results of these assessments do not seem to have materialized in Fig. 5 and we are left to speculate as to why. The strongest correlation between PEnthalpy and % inhibition was observed for CDK3/cyclinE and the plot is shown in Fig. 5b. I invite you, the reader, to ask yourself whether the correlation shown in Fig. 5b would be useful in a drug discovery project.
Since the title of the post mentions voodoo thermodynamics, we should take a look at this in the context of the article and the best place to look is in the Discussion section. We are actually spoiled for choice when looking for examples of voodoo thermodynamics there but take a look at:
"It is assumed in the literature that the "enthalpically driven compound series" with fewer RBs tend to be (generally) lower MW compounds as well. In contrast, in cases where selectivity is steeper among compounds in a series for which activity and selectivity is likely governed by compounds with relatively more RB versus HBA and HBA [sic], than when the entropic contributors are dominating."
That's about as much voodoo thermodynamics as I can take for a while so, if it's OK with you, I'll finish by addressing a couple of points to the authors of this article. The flagship product of the company with which the authors are associated is a database system for integrating chemical and biological data. Although I'm not that familiar with this database system, responses to my questions during the course of a discussion in the FBDD LinkedIn group suggested that a number of cheminformatic issues have been carefully thought through and that the database system could be very useful in drug discovery. The first point is that the scientific weaknesses of the featured article could lead to some customers losing confidence in the database system. Secondly, the folk who created the database system (and keep it running) may have only limited opportunities to publish, and scientifically weak publications by colleagues who are perhaps less focused on what actually pays the bills may breed some resentment.
That's where I'll wrap because there is only so much voodoo thermodynamics that one can take in a day so, as we say in Brazil, 'até mais'.
2 comments:
Pete - I have a slightly different take on some of the metric stuff - and have been trying to set some of it down in writing. This is my first attempt: here
Hi Andrew, I tried to make the following comment (except that embedded link was used comment URL) on your blog post but it didn't seem to register. It possibly needs approval but I'll respond here as well just in case.
I really like the term ‘roundheads’ not least because alternatives like ‘compound quality Nazis’ can get people arguing for all the wrong reasons and there is definitely a puritanical aspect to the teachings of these self-appointed arbiters of molecular good taste. In principle, I don’t have a problem with people saying that certain things are verboten provided that the evidence for the badness of those things is presented honestly and clearly. It seems almost to be a point of honour in the rules, guidelines and metrics business to see who can make the weakest trend look the strongest. Problems escalate when ‘experts’ start to believe their own BS and lose (I'm being charitable using the term 'lose') the ability to distinguish opinion from fact. Your ‘forbidden substructures’ theme is most apposite and it can be instructive to look at some of the PAINS literature (see this link). There are certainly some evil-looking PAINS but it can be very difficult to establish whether there is any solid evidence that some of the compounds are inherently bad.