Comments on Molecular Design: Saving Ligand Efficiency?

Peter Kenny | 2014-09-18 04:55

Part 2 (in response to Dan)
“This leads to my next point, which is that researchers (perhaps especially computational chemists!) often treat experimental data with undeserved deference. As evidence, one could argue that the two lines in your figure are actually quite similar, and that the data available in many projects would not allow one to distinguish between the two. This is especially true where initial fragments may have low affinities and high error bars. Both lines show the same overall trend, and in many cases you don't need much greater accuracy than this.”

Although always wary of experimental data, I am even more wary of predictions. I tend to worry more about systematic error (e.g. assay interference; concentration of DMSO stock solutions) than about random error, which is easier to get a handle on. I wouldn’t worry too much about the specific lines that I drew because I can construct other examples for which the differences would be larger. I constructed the graphic as a sort of reductio ad absurdum exercise, and the point was more about the implications of the lines crossing. However, please let me know if you’d like to discuss this point in more detail.

“This leads to my final point, which is that metrics such as ligand efficiency are typically most heavily used at the earliest stages of a program, before one has acquired enough data to use more sophisticated analyses (such as residuals). If you are trying to rapidly triage several hundred fragment hits on a new project, is ligand efficiency really such a bad place to start?”

Some would argue that LEMs are (or should be) used at all stages of projects, although I don’t think we need to get into that particular debate to address your point about triaging hits. Using residuals is not a sophisticated analysis.
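To show how little machinery the residuals approach actually needs, here is a minimal sketch in Python (using numpy; the heavy atom counts and pIC50 values are hypothetical, invented purely for illustration): fit activity to molecular size, take the residuals, and optionally re-define LE with the fitted intercept.

```python
import numpy as np

# Hypothetical fragment hits (invented for illustration):
# heavy atom count (HA) as the risk factor, pIC50 as activity.
ha = np.array([10.0, 12.0, 15.0, 18.0, 22.0, 26.0])
pic50 = np.array([3.2, 3.4, 4.1, 4.3, 5.0, 5.2])

# Fit activity to molecular size; the slope is the "price" of a
# heavy atom as actually observed in this data set.
slope, intercept = np.polyfit(ha, pic50, 1)

# Residuals normalize activity against the observed trend:
# a positive residual means more active than expected for its size.
residuals = pic50 - (slope * ha + intercept)

# Conventional LE (pIC50 per heavy atom) implicitly assumes the
# trend line passes through the origin; using the fitted intercept
# instead gives a scaled alternative.
le_conventional = pic50 / ha
le_intercept = (pic50 - intercept) / ha

for h, r, le in zip(ha, residuals, le_conventional):
    print(f"HA={h:4.0f}  residual={r:+.2f}  LE={le:.2f}")
```

When the fitted intercept is non-zero, rankings by LE and by residual can disagree (LE flatters the smallest compounds when the intercept is positive), which is the crux of the point being debated here.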
Software like JMP (and I think Spotfire as well) can update the data table with the residuals as soon as we fit activity to the risk factor. The key point is that when we use residuals we are using the trend actually observed in the data to normalize activity, although we could also use the intercept to re-define LE as described in the blog post. If I were triaging hits, I would still attempt to fit a straight line to the plot of activity against molecular size; one of the key messages in the LEM critique was to model the data with open minds rather than distorting the analysis by making arbitrary assumptions. In the triage situation we often find that the ranges in activity and/or molecular size are limited and, with uncertainty in activity, this limits the strength of the correlations that we can observe in the data (and, by implication, the precision of our estimate of the intercept). However, in triaging hits we are doing more than just trying to score them. For example, we might try to identify structural series or perform neighborhood analysis.

Peter Kenny | 2014-09-18 04:54

Part 1
“As always an entertaining and thorough analysis. However, I would like to raise a few questions/challenges.”

Thanks, Dan. Drug discovery is really tough, but I think that we can be serious without being solemn. I’ve tried to address your questions, although we’ll probably need to go through a couple of iterations to nail this properly, so please let me know what needs expansion or clarification.

“First, as you have frequently suggested using residuals, it might be instructive for you to present some examples of how to do this using actual (as opposed to model) data. Does using residuals actually yield superior results in the real world?”

My first response to this would be to counter-challenge and ask whether we have a way to show objectively that one set of results is superior to another when using real world data. However, I think we need to go right back to the introduction of LEMs to give your challenge the attention that it deserves. The term ‘normalization’ was used right from the start, but nobody actually bothered to say what they meant by the term. What I think they mean is that they are trying to account for the ‘contribution’ that a risk factor makes to activity. If that is what you want to do, then (I have argued) you need to model the data. LEMs are often interpreted as ‘bang-for-buck’, and to quantify this we need an objective mechanism for setting the price. Right now I have little or no real world data at my disposal, but at the same time I have no idea how I would compare the performance of LEMs and residuals in an objective manner even if I did. If anybody does have ideas about how to do this, I’d be pleased to hear about them.
I do have an example that I pulled from the Astex group efficiency article in one of my SlideShare presentations ( http://www.slideshare.net/pwkenny/data-analytic ) and it is possible that this will be of interest.

Dan Erlanson | 2014-09-16 04:25

As always an entertaining and thorough analysis. However, I would like to raise a few questions/challenges.

First, as you have frequently suggested using residuals, it might be instructive for you to present some examples of how to do this using actual (as opposed to model) data. Does using residuals actually yield superior results in the real world?

This leads to my next point, which is that researchers (perhaps especially computational chemists!) often treat experimental data with undeserved deference. As evidence, one could argue that the two lines in your figure are actually quite similar, and that the data available in many projects would not allow one to distinguish between the two. This is especially true where initial fragments may have low affinities and high error bars. Both lines show the same overall trend, and in many cases you don't need much greater accuracy than this.

This leads to my final point, which is that metrics such as ligand efficiency are typically most heavily used at the earliest stages of a program, before one has acquired enough data to use more sophisticated analyses (such as residuals). If you are trying to rapidly triage several hundred fragment hits (http://practicalfragments.blogspot.com/2014/09/fragments-vs-map4k4.html) on a new project, is ligand efficiency really such a bad place to start?