Wednesday, 1 July 2015

Into the mess that metrics make

So it looks like the post-publication peer review of the article featured in the Expertitis blog post didn't go down too well but, as Derek said recently, “…open, post-publication peer review in science is here, and it's here to stay. We'd better get used to it”. In this post, I’m going to take a look at the webinar that drew me to that article and I hope that participants in that webinar will find the feedback to be both constructive and educational.

Metrics feature prominently in this webinar and the thing that worries me most is the Simplistic Estimate of Enthalpy (SEEnthalpy).   Before getting stuck into the thermodynamics, it’ll be necessary to point out some errors so please go to 20:10 (slide entitled ‘ligand efficiency lessons’) in the webinar.   

The ‘ligand efficiency lessons’ slide concedes that LE has been criticized in print but then incorrectly asserts that the criticism has been rebutted in print.  It is well-known that Mike Schultz has criticized ( 1 | 2 | 3 ) LE as being mathematically invalid and the counter to this is that LE (written as free energy of binding divided by number of non-hydrogen atoms) is a mathematically valid expression (even though the metric itself is thermodynamic nonsense). Nevertheless, Mike still identified the Achilles' heel of LE which is that a linear response of affinity to molecular size has to have zero intercept in order for that linear response to represent constant LE. It is also somewhat ironic that the formula for LE used in the rebuttal to Mike's criticism is itself mathematically invalid because it includes a logarithm of a quantity with units of concentration.  However, another criticism (article and related blog posts 1 | 2 | 3 ) has been made of LE and this has not been rebutted in print.   The ‘ligand efficiency lessons’ slide also asserts that "LE tends to decline with ligand size" and it should also be pointed out that some of the size dependency of LE is an artefact of the arbitrary choice of 1 mole per litre as the standard concentration (article and blog posts 1 | 2 ).   The Ligand Efficiency: Nice Concept, Shame About the Metrics presentation may also be helpful.
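
To make the point about the standard concentration concrete, here is a minimal sketch rather than anything taken from the webinar or the cited articles. It assumes LE is written as -ΔG°/N with ΔG° = RT ln(Kd/C°), a temperature of 298.15 K, and two entirely hypothetical compounds; the function name `ligand_efficiency` and the parameter `c_std` are my own.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
T = 298.15         # assumed temperature in K

def ligand_efficiency(kd_molar, heavy_atoms, c_std=1.0):
    """LE = -dG/N with dG = RT ln(Kd / C_std); Kd and C_std in mol/L."""
    delta_g = R_KCAL * T * math.log(kd_molar / c_std)
    return -delta_g / heavy_atoms

# Hypothetical compounds: (Kd in mol/L, number of non-hydrogen atoms)
fragment = (1e-3, 10)
lead = (1e-9, 30)

# Same two compounds, three choices of standard concentration
for c_std in (1.0, 1e-3, 1e-6):
    le_frag = ligand_efficiency(*fragment, c_std=c_std)
    le_lead = ligand_efficiency(*lead, c_std=c_std)
    print(f"C_std={c_std:g} M  LE(fragment)={le_frag:.3f}  LE(lead)={le_lead:.3f}")
```

With a 1 M standard state the two hypothetical compounds have identical LE, but with 1 mM the fragment's LE drops to zero while the lead's stays positive. Nothing about the compounds has changed, only the arbitrary choice of concentration unit.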

This is a good point to introduce SEEnthalpy and, were I writing a satire on drug discovery metrics, I could scarcely do better than this.  Perhaps even ‘Enthalpy – the musical’ with lyrics by Leonard Cohen (Some girls wander by mistake/ Into the mess that metrics make)?  Let’s pick this up at 22:43 in the webinar with the ‘Conventional wisdom – simplistic view’ slide which teaches us that rotatable bonds are hydrophobic interactions (you really did hear that correctly, “entropy comes from non-direct, hydrophobic interactions like rotatable bonds”). I don’t happen to agree with ‘conventional’ or ‘wisdom’ in this context although I do fully endorse ‘simplistic’. Let’s take a closer look at SEEnthalpy which is defined (24:45) from the numbers of hydrogen bond donors (#HBD), hydrogen bond acceptors (#HBA) and rotatable bonds (#RB):

SEEnthalpy = (#HBD + #HBA)/(#HBD + #HBA + #RB)

As I’ve mentioned before, definitions of hydrogen bonding groups can be ambiguous (as can be definitions of rotatable bonds) but I don’t want to get sidetracked by this right now because there are more pressing issues to deal with. You can learn a lot about metrics simply by thinking about them (have a look at comments on LELP in this article) and one question that you might like to ask yourselves is what is #RB (which they’re telling us is associated with entropy) doing in an enthalpy metric?  This may be a good time to note that the contribution of an intermolecular contact (or group of contacts) to the changes in enthalpy or entropy associated with binding is not, in general, an experimental observable.  It could also be a good idea to take a look at A Medicinal Chemist's Guide to Molecular Interactions and Ligand Binding Thermodynamics in Drug Discovery: Still a Hot Tip?
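
Thinking about the metric is easier with the formula in executable form. The sketch below simply transcribes the definition given above; the function name `se_enthalpy` and the counts passed to it are hypothetical, and how the counts are obtained (which is where the ambiguity in the definitions comes in) is deliberately left out.

```python
def se_enthalpy(n_hbd, n_hba, n_rb):
    """SEEnthalpy = (#HBD + #HBA) / (#HBD + #HBA + #RB), as defined in the webinar."""
    total = n_hbd + n_hba + n_rb
    if total == 0:
        # No donors, acceptors or rotatable bonds: the ratio is 0/0
        raise ValueError("SEEnthalpy is undefined when all three counts are zero")
    return (n_hbd + n_hba) / total

# Hypothetical counts: same hydrogen bonding groups, different flexibility
print(se_enthalpy(2, 4, 0))  # rigid molecule
print(se_enthalpy(2, 4, 6))  # same groups plus six rotatable bonds
```

The hydrogen bonding groups are identical in the two calls, yet the value halves when six rotatable bonds are added, so the supposedly entropic term is driving the 'enthalpy' metric. The metric is also undefined for any molecule in which all three counts are zero.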

I have to admit that I found the reasoning behind SEEnthalpy to be unconvincing but, given that the metric appears to have the endorsement of all participants of the webinar, perhaps we should at least give them the chance to demonstrate that it is meaningful. If I were introducing an enthalpy metric then the first thing that I’d do would be to (try to) show that it measured what it was claimed to measure (we call something a metric because it measures something and not because it's easy to remember the formula or because we've thought up a cute acronym). As it turns out, the Binding Database is public and simply oozing with the thermodynamic data that could be used to investigate the relationship between metric and reality.  This makes the way they’ve chosen to evaluate the metric seem somewhat bizarre.

The first test of SEEnthalpy was performed using a TB activity dataset.  About 1000 of the compounds in this dataset were found to be active by high throughput screening (HTS) and the remaining 100-ish had come from structure-based drug design (SBDD).   The differences between the two groups of compounds were found to be significant although it is not clear what to make of this.  One interpretation is that the HTS compounds are screening hits (possibly from phenotypic screens) and the SBDD compounds have been optimized.  If this is the case, it will not be too difficult to perceive differences between the two groups of compounds and doing so does not represent one of the more pressing prediction problems in drug discovery.  This is probably a good time to note that correlation does not equal causation.  The other point worth making is that observing that two mean values are significantly different doesn’t tell us about the size of the effect which is more relevant to prediction.  If you want to illustrate the strength of the trend (as opposed to its statistical significance) then you need to show standard deviations with the mean values rather than just standard errors.  If this is unfamiliar territory then why not take a look at this article and make it more familiar.
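
The difference between standard deviation and standard error is easy to see with numbers. The sketch below uses synthetic values drawn from two normal distributions (the means, spread and seed are assumptions chosen for illustration and are not the webinar's data), with group sizes of 1000 and 100 to mimic the HTS and SBDD sets. The helper names `summarize` and `cohens_d` are mine.

```python
import math
import random
import statistics

def summarize(values):
    """Return (mean, standard deviation, standard error) for a sample."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    se = sd / math.sqrt(len(values))
    return mean, sd, se

def cohens_d(a, b):
    """Difference in means (b minus a) scaled by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled_var)

# Synthetic metric values: assumed means 0.50 and 0.55, assumed SD 0.15
rng = random.Random(0)
hts = [rng.gauss(0.50, 0.15) for _ in range(1000)]
sbdd = [rng.gauss(0.55, 0.15) for _ in range(100)]

for label, sample in (("HTS", hts), ("SBDD", sbdd)):
    mean, sd, se = summarize(sample)
    print(f"{label}: n={len(sample)} mean={mean:.3f} SD={sd:.3f} SE={se:.4f}")
print(f"Cohen's d = {cohens_d(hts, sbdd):.2f}")
```

Error bars drawn from the standard errors make the two groups look well separated, whereas the standard deviations show how heavily the two distributions overlap. It is the overlap that matters if you want to predict which group an individual compound belongs to.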

The next test of SEEnthalpy was to investigate its correlation with biological activity and the chosen activity measure was %inhibition in protein kinase assays.  Typically when we model and analyze activity we use pIC50 or pKi (see this webinar, for example) and it is not at all clear why %inhibition was used instead given the vast amount of public data available in ChEMBL.  One problem with using %inhibition as a measure of activity is that it has a very limited dynamic range and there really is a reason that people waste all that time measuring a concentration response so that they can calculate pIC50 (or pKi).   Let’s pick up the webinar at 27:10 and at 28:13 a plot will appear.   This plot appears to have been generated by ordering compounds by SEEnthalpy and then plotting %inhibition against order.  If you look at the plot you’ll notice that the %inhibition values for most of the compounds are close to zero, indicating that they are essentially inactive at the chosen test concentration, which means that the correlation between %inhibition and SEEnthalpy will be very weak.  But follow the webinar to 28:32 and you will learn that, “…for this particular kinase, there was a really clear relationship between SEEnthalpy and the activity”.  I’ll leave it to you, the reader, to decide for yourself how clear that trend actually is.
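
The limited dynamic range of %inhibition can be illustrated with a simple model. The sketch below assumes a one-site concentration response with a Hill slope of 1 and a test concentration of 10 µM; both are assumptions of mine, as is the function name `percent_inhibition`, and none of this is taken from the assays discussed in the webinar.

```python
def percent_inhibition(pic50, test_conc_molar=1e-5):
    """%inhibition at a single test concentration, assuming a Hill slope of 1."""
    ic50 = 10.0 ** (-pic50)
    return 100.0 / (1.0 + ic50 / test_conc_molar)

# Compounds spanning six orders of magnitude in potency
for pic50 in (3, 4, 5, 6, 7, 8, 9):
    print(f"pIC50={pic50}  %inhibition at 10 uM = {percent_inhibition(pic50):.2f}")
```

Under these assumptions only compounds with IC50 within an order of magnitude or so of the test concentration give %inhibition values that can be told apart. Everything weaker piles up near 0% and everything more potent piles up near 100%, which is why a concentration response is worth the effort.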

Let’s go to 28:42 in the webinar, at which point the question is posed as to whether SEEnthalpy correlates more strongly than other metrics with activity.   However, when we get to 28:53, we discover that it is actually statistical significance (p values) rather than strength of correlation that is the focus of the analysis.   It is really important to remember that given sufficiently large samples, even the most anemic of trends can acquire eye-wateringly impressive statistical significance and it is the correlation coefficient not the p value that tells us how strong a trend is. Once again, I am forced to steer the participants of the webinar towards this article and this time, in order to reinforce the message, I'll also link an article which illustrates the sort of tangle into which you can get if you confuse the strength of a trend with its statistical significance.
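
Here is a minimal sketch of how sample size inflates statistical significance for a fixed (and feeble) correlation. It uses the Fisher z-transformation with a normal approximation to get a two-sided p value, which is an approximation that I've chosen for brevity rather than the test used in the webinar; the function name `approx_p_value` and the value r = 0.05 are hypothetical.

```python
import math

def approx_p_value(r, n):
    """Two-sided p value for a Pearson r via the Fisher z approximation (n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

r = 0.05  # hypothetical correlation: r squared is 0.0025
for n in (100, 10_000, 100_000):
    print(f"n={n:>7}  r={r}  r^2={r * r:.4f}  p~{approx_p_value(r, n):.2e}")
```

The correlation coefficient is the same in every row and explains a quarter of one percent of the variance, yet the p value goes from unremarkable to minuscule as n grows.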

I think this is a good point at which to wrap things up. I'll start by noting that attempting to educate the drug discovery community and shape the thinking of that community's members can backfire if weaknesses in one's grasp of the fundamental science become apparent.  For example, confusing the statistical significance of a trend with its strength in a webinar like this may get people asking uncouth questions about the machine learning models that were mentioned (although not discussed) during the webinar.  I'll also mention that the LE values are presented in the webinar without units and I'll finish off by posing the question of whether it is correct, especially for cell-based assays, to convert IC50 to molar energy for the purpose of calculating LE.

1 comment:

Mike said...

Isn't the main problem here, among many, that if this "metric" were predictive, it would suggest that to increase enthalpy, you simply add more H-bond donors and acceptors and reduce the rotatable bonds? As Pete points out, rotatable bonds will have nothing to do with enthalpy and H-bonding could increase or decrease enthalpy depending on how the orientation of H-bond groups aligns with complementary (or otherwise) groups in the receptor. I seem to remember this used to be called a pharmacophore.
If the metric were predictive, inositol (all isomers equally) would always be much more active than hexane, say, regardless of the protein target.