Monday, 20 August 2012

QSAR: Nailed to its perch?

I must confess that I’ve never been a big fan of QSAR. When I started in Pharma 24 years ago, QSAR was seen as something that would solve all our problems and, over the years, a number of other panaceas would follow in its wake. I find it useful to classify molecular design as either hypothesis-driven or prediction-driven and will discuss this a bit more in a future post. QSAR fits into the prediction-driven category and, to get you thinking a bit about the subject, I'll share a couple of slides from my RACI talk last December.

So EuroQSAR is due to happen again and this time there'll be a session to commemorate QSAR's founding father Corwin Hansch, who died last year.  So will 'Grand Challenges for QSAR' deliver?  Were I going to be there, I'd be checking out Maggiora's talk (Activity Cliffs, Information Theory, and QSAR) since people in the field really need to start thinking more about QSAR in terms of relationships between structures.  Although it's not part of the Hansch session, I'd also be checking out 'The Power of Matched Pairs in Drug Design' by my good friend (and former colleague) Jonas Boström since Matched Molecular Pairs represent one way to recognise and articulate relationships between structures.  And of course I wouldn't miss the Hansch Awardee's talk, the title of which reminded me of an Austrian who struggled, although you won't find that one stocked in the local book shops...

I would like to have seen something on training set design and validation in the 'Grand Challenges' session. Generally, building and validating multivariate models works best when the compounds are distributed evenly in the relevant descriptor space. Clustering in descriptor space can result in validation giving an optimistic view of model quality, and that's one way to end up over-fitting your data. Maybe this was one Grand Challenge that the Organising Committee just didn't have the stomach for...
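To make the point about clustering concrete, here is a minimal sketch using made-up data rather than a real QSAR set. It assumes nine tight clusters of ten hypothetical compounds in a two-dimensional descriptor space, an arbitrary activity function, and a one-nearest-neighbour predictor standing in for a model. All of these are illustrative assumptions. It compares leave-one-out validation with leaving out a whole cluster at a time.

```python
import math
import random

random.seed(0)

# Hypothetical data: 9 tight clusters of 10 "compounds" in a 2-D descriptor space.
centres = [(3.0 * i, 3.0 * j) for i in range(3) for j in range(3)]
compounds = []  # each entry: (cluster_id, (x, y) descriptors, activity)
for cid, (cx, cy) in enumerate(centres):
    for _ in range(10):
        x = cx + random.gauss(0.0, 0.1)
        y = cy + random.gauss(0.0, 0.1)
        # Arbitrary activity surface plus a little noise (an assumption).
        activity = math.cos(x) + math.cos(y) + random.gauss(0.0, 0.05)
        compounds.append((cid, (x, y), activity))


def predict_1nn(query, training):
    """Predict activity as that of the nearest training compound."""
    nearest = min(training, key=lambda c: math.dist(query, c[1]))
    return nearest[2]


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def leave_one_out(data):
    """Hold out one compound at a time; its cluster mates stay in training."""
    errors = []
    for i, (_, desc, act) in enumerate(data):
        training = data[:i] + data[i + 1:]
        errors.append(predict_1nn(desc, training) - act)
    return rmse(errors)


def leave_cluster_out(data):
    """Hold out the compound's whole cluster before predicting it."""
    errors = []
    for cid, desc, act in data:
        training = [c for c in data if c[0] != cid]
        errors.append(predict_1nn(desc, training) - act)
    return rmse(errors)


loo_rmse = leave_one_out(compounds)
lco_rmse = leave_cluster_out(compounds)
print(f"leave-one-out RMSE:     {loo_rmse:.2f}")
print(f"leave-cluster-out RMSE: {lco_rmse:.2f}")
```

In this toy setup the leave-one-out error looks far better than the leave-cluster-out error, simply because each held-out compound still has near neighbours from its own cluster in the training set. Real descriptor sets and models will behave differently in detail, but the mechanism is the same.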

So that's all from me for now.  Why not print out 'QSAR: dead or alive?' (it infuriates those who would seek to lead your opinion) to read on the plane and think up some nasty questions on validation for the experts while waiting in Passkontrolle?

Literature Cited

Doweyko, QSAR: dead or alive? JCAMD 2008, 22, 81-89


Wavefunction said...

You must have seen Arthur Doweyko's articles about QSAR in which he points out common problems, including correlation vs causation and the fallacious use of q^2. I too remain skeptical of QSAR although it might be useful in limited cases.

Wavefunction said...

Sorry, I spoke too soon; I see that you have referenced Doweyko's article. Another good piece from him is

Pete said...

I see clustering in the training/test spaces as a huge problem. Even if you leave out 9 of a 10 compound cluster, there's still one left to 'anchor' the model and assure you that the model smells of roses. Another one that I like to cite when opining on predictive models is Hawkins' overfitting article:

I think QSAR could really use one of those Black Swans that Derek mentioned a couple of days ago.

Pete said...

The question that I put to QSAR enthusiasts is, 'If offered the choice between access to the QSAR model and access to the data from which it was derived, which would you take?'

I am also wary of the druglikeness literature. For example, the links between promiscuity and lipophilicity may not be as strong as many assert.

Carlos Montanari said...

Correlation does not imply causation. So, this is the first thing to bear in mind. From statistics, one can calculate Pearson's r for normally distributed populations, whereas otherwise we have to go for either the Spearman (nonparametric) correlation coefficient or Kendall's tau for equal scores. See, just for instance:
On the other hand, maybe we have just pushed it too hard and have sometimes forgotten to properly assemble usable data for QSAR analysis, where well-designed local models may do better than global models.
QSAR is not dead. QSAR is suffering from a crushing misunderstanding, in which chance correlation and overfitting always play a pivotal role. Moreover, biological responses (BRs) are also poorly handled in this game, not to mention the meaningless descriptors that are used to describe BRs.
Furthermore, in any QSAR study, can you count how many times the mode of action (MOA) has actually been proven? And, even more, how many of them for any QSAR have the same (or at least a similar) MOA? If you like, you can include the mode of binding (MOB), too. Here, one can also list many overlooked descriptors, well represented by binding entropy.
Overall, this is a matter of recognizing the effort one has to put in, with an appetite for collaborative endeavours, to set the scene and try to better describe the chemo-bio space!