The focus of the conference was Statistics in CADD and the stuff that I took most notice of was the use of Bayesian methods, although I still need to get my head round things a bit more. Although modeling affinity/potency and output of ADMET assays (e.g. hERG blockade) tends to dominate the thinking of CADD scientists, the links between these properties and clinical outcomes in live humans are not as strong as many assume. Could the wisdom of Rev Bayes be applied to PK/PD modeling and the design of clinical trials? I couldn’t help worrying about closet Bayesians in search of a good posterior and what would be the best way to quantify the oomph of a ROC curve...
Reproducibility is something that needs to be addressed in CADD studies and if we are to improve this we must be prepared to share both data and methods (e.g. software). This open access article should give you an idea of some of the issues and directions in which we need to head. Journal editors have a part to play here and must resist the temptation to publish retrospective analyses of large proprietary data sets because of the numbers of citations that they generate. At the same time, journal editors should not be blamed for supplemental information ending up in PDF format. For example, I had no problems (I just asked) getting JMC (2008), JCIM (2009) and JCAMD (2013 and 2013) to publish supplemental information in text (or zipped text) format.
When you build models from data, it is helpful to think of signal and noise. The noise can be thought of as coming from both the model and from the data and in some cases it may be possible to resolve it into these two components. The function of Statistics is to provide an objective measure of the relative magnitudes of signal and noise but you can’t use Statistics to make noise go away (not that this stops people from trying). Molecular design can be defined as control of behavior of compounds and materials by manipulation of molecular properties and can be thought of as being prediction-driven or hypothesis-driven. Prediction-driven molecular design is about building predictive models but it is worth remembering that much (most?) pharmaceutical design involves a significant hypothesis-driven component. One way of thinking about hypothesis-driven molecular design is as a framework for assembling structure activity/property relationships (SAR/SPR) as efficiently as possible but this is not something that statistical methodology currently appears equipped to do particularly well.
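The signal-and-noise picture above can be made concrete with a small simulation. What follows is only a sketch under stated assumptions, not a method presented at the conference: the response curve, the noise level and the function names (simulate, fit_line, decompose) are all made up for illustration. It assumes each compound is measured in replicate, so that scatter among replicates estimates the noise in the data, and it attributes whatever residual variance is left after fitting a straight line to the model.

```python
import math
import random
import statistics


def simulate(n_points=200, n_reps=4, sigma_data=0.2, seed=0):
    """Hypothetical data: a linear trend plus a wiggle that a straight
    line cannot capture, measured n_reps times with Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        x = rng.uniform(0.0, 3.0)
        true_y = 1.0 + 0.5 * x + 0.3 * math.sin(3.0 * x)
        reps = [true_y + rng.gauss(0.0, sigma_data) for _ in range(n_reps)]
        data.append((x, reps))
    return data


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decompose(data):
    """Split the variance of the observations into signal (explained by
    the line), data noise (replicate scatter) and model noise (the rest)."""
    # Noise from the data: pooled variance among replicate measurements.
    data_var = statistics.fmean([statistics.variance(reps) for _, reps in data])

    # Fit the (deliberately too simple) linear model to every observation.
    xs = [x for x, reps in data for _ in reps]
    ys = [y for _, reps in data for y in reps]
    slope, intercept = fit_line(xs, ys)

    # Residual variance is everything the line fails to explain.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    residual_var = sum(r * r for r in residuals) / (len(residuals) - 2)

    # Noise from the model: residual variance not accounted for by the data.
    model_var = max(residual_var - data_var, 0.0)
    signal_var = max(statistics.variance(ys) - residual_var, 0.0)
    return {
        "signal_var": signal_var,
        "data_var": data_var,
        "model_var": model_var,
        "residual_var": residual_var,
    }


if __name__ == "__main__":
    for name, value in decompose(simulate()).items():
        print(f"{name}: {value:.3f}")
```

The split only works here because replicates exist. Without them the two sources of noise are confounded in the residuals, which is one reason the text says the resolution is possible only in some cases.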
The conference has its own hashtag (#grccadd) and appeared to out-tweet the Sheffield Cheminformatics conference which ran concurrently. Some speakers have shared their talks publicly and a package of statistical tools created especially for the conference is available online.
Literature cited and links to talks
WP Walters (2013) Modeling, informatics, and the quest for reproducibility. JCIM 53:1529-1530 DOI
CC Chow, Bayesian and MCMC methods for parameter estimation and model comparison. Link
N Baker, The importance of metadata in preserving and reusing scientific information. Link
PW Kenny, Tales of correlation inflation. Link
CADD Stat Link