I’ve just returned to Trinidad, where I’ve been spending the
summer. I was in the USA at the
Computer-Aided Drug Design (CADD) Gordon Conference (GRC), organized by Anthony Nicholls and Martin Stahl. The first thing that I should
say is that this will not be a conference report, because what goes on at all
Gordon Conferences is confidential and off the record. The rule is intended to make discussions
freer and less inhibited, and you won’t see proceedings of GRCs published. Nevertheless, the conference program is
available online, so I think it’ll be OK to share some general views of the
CADD field (which have inevitably been influenced by what I picked up at the conference) even
though commenting on specific lectures, posters or discussions would be
verboten.
The focus of the conference was Statistics in CADD, and what I took most notice of was the use of Bayesian methods, although I
still need to get my head round things a bit more. Although modeling
affinity/potency and the output of ADMET assays (e.g. hERG blockade) tends to
dominate the thinking of CADD scientists, the links between these properties
and clinical outcomes in live humans are not as strong as many assume. Could the wisdom of Rev Bayes be applied to
PK/PD modeling and the design of clinical trials? I couldn’t help worrying about closet Bayesians
in search of a good posterior, and about the best way to quantify the
oomph of a ROC curve...
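To make the Bayesian idea a little more concrete, here is a minimal sketch (my own toy illustration with made-up numbers, not anything presented at the conference) of a conjugate Beta-Binomial update, of the sort one might use for the response rate in a single arm of a hypothetical trial:

```python
# Toy illustration of Bayesian updating with the conjugate Beta-Binomial
# model; all numbers are hypothetical.
from scipy import stats

# Prior belief about the response rate, expressed as Beta(a, b).
a_prior, b_prior = 2.0, 8.0  # weakly favours low response rates

# Hypothetical trial data: 7 responders out of 20 patients.
responders, n = 7, 20

# Conjugacy means the posterior is again a Beta distribution:
# Beta(a + successes, b + failures).
posterior = stats.beta(a_prior + responders, b_prior + (n - responders))

print(f"posterior mean response rate: {posterior.mean():.3f}")
lo, hi = posterior.interval(0.95)
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

As for the oomph of a ROC curve, the area under the curve (equivalently, the probability that a randomly chosen active scores higher than a randomly chosen inactive) is the usual single-number summary.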
Reproducibility is something that needs to be addressed in
CADD studies, and if we are to improve it we must be prepared to share both
data and methods (e.g. software). This open access article should give you an idea of some of the issues and of the directions in
which we need to head. Journal editors have a part to play here and must resist
the temptation to publish retrospective analyses of large proprietary data sets
simply because of the number of citations that they generate. At the same time, journal editors should not
be blamed for supplemental information ending up in PDF format. For example, I had no problems (I just asked)
getting JMC (2008), JCIM (2009) and JCAMD (twice in 2013) to publish
supplemental information in text (or zipped text) format.
When you build models from data, it is helpful to think in terms of
signal and noise. The noise can be
thought of as coming from both the model and from the data, and in some cases it
may be possible to resolve it into these two components. The function of Statistics is to provide an
objective measure of the relative magnitudes of signal and noise, but you can’t
use Statistics to make noise go away (not that this stops people from trying). A small numerical sketch of this signal/noise bookkeeping follows below.

Molecular design can be defined as the control of the
behavior of compounds and materials by manipulation of molecular properties, and it can
be thought of as either prediction-driven or hypothesis-driven. Prediction-driven molecular design is about
building predictive models, but it is worth remembering that much (most?)
pharmaceutical design involves a significant hypothesis-driven component. One
way of thinking about hypothesis-driven molecular design is as a framework for
assembling structure-activity/property relationships (SAR/SPR) as efficiently
as possible, but this is not something that statistical methodology currently
appears equipped to do particularly well.
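To illustrate the signal/noise bookkeeping mentioned above, here is a toy example (entirely synthetic data, nothing from the conference): an ordinary least-squares fit partitions the variance of the response into a part explained by the model (the signal) and a residual part (the noise).

```python
# Toy illustration of partitioning variance into "signal" (explained by the
# model) and "noise" (residual); the data are synthetic.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "measurements": a linear signal plus Gaussian noise.
x = np.linspace(0.0, 10.0, 50)
y = 1.5 * x + 2.0 + rng.normal(scale=2.0, size=x.size)

# Ordinary least-squares fit of a straight line.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# For a least-squares fit with an intercept, the total sum of squares
# splits exactly into explained + residual.
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)
r_squared = 1.0 - ss_resid / ss_total

print(f"fraction of variance explained (signal): {r_squared:.2f}")
print(f"fraction remaining as noise: {1.0 - r_squared:.2f}")
```

Note that a high fraction of explained variance says nothing about whether the residual noise comes from the model or from the measurements themselves; resolving those two components needs replicate measurements or some other handle on the assay error.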
The conference had its own hashtag (#grccadd) and appeared
to out-tweet the Sheffield Cheminformatics conference, which ran
concurrently. Some speakers have shared their
talks publicly, and a package of statistical tools created especially for the
conference is available online.
Literature cited and links to talks
WP Walters (2013) Modeling, informatics, and the quest for reproducibility. JCIM 53:1529-1530. DOI
CC Chow, Bayesian and MCMC methods for parameter estimation and model comparison. Link
N Baker, The importance of metadata in preserving and reusing scientific information. Link
PW Kenny, Tales of correlation inflation. Link
CADD Stat. Link