Thursday, 27 August 2015

On open access

I’ve been meaning to write something about open access (OA) for some time and I do genuinely believe that true OA is a worthwhile (and achievable) ideal. This blog post is not intended as a critique of OA although I will suggest that its advocates do need to think a bit more carefully about how they present the benefits. I'll be writing from the perspective of authors of scientific journal articles and disclose, in the interests of openness, that all but one of my publications have been in paywall journals. One general observation that I will make is that the sanctimonious, holier-than-thou attitude of some OA advocates (who may occasionally avail themselves of the advantages of publishing in paywall journals) does tend to grate at times.  Although there is actually a lot more to OA than who pays for journal articles to be published, I'll not be discussing those issues in this post. The principal theme of this post is that it may be helpful for those who opine loudly on the benefits of OA to take a more author-centric look at the relevant issues and this is especially important for those advocates whose contributions to the scientific literature have been modest.  

One of my gripes is that OA is sometimes equated with open science and I believe that the openness of science is actually a much bigger issue than who pays for scientific articles to be published in journals. Drug discovery research is my primary focus and a certain amount of what (we think that) we know in this field comes from analysis of proprietary databases.  Some of this analysis falls short of what one might feel entitled to expect from journals that would describe themselves as premier journals and you’ll find some examples in this presentation.  I have argued here and here that studies of this genre would actually pack a much heavier punch if the data was made publicly available. However, it is not only datasets that are undisclosed in drug discovery papers.  In some cases, models built from the data are not disclosed either (this does not inhibit authors from comparing their models favorably with previously published models). If the science in a data modelling study is to be accurately described as open then the data and models associated with the study have to be disclosed. This is not just an issue for paywall journals and the models described in the following OA articles do not appear to be fully disclosed:

The point that I'm making here is that the science in a paywall journal is not necessarily less open than that in an OA journal. In my view, the openness of the science presented in a journal article is determined by how much its authors disclose rather than whether or not one needs to pay to read the article. It's also worth remembering that internet access is not free.

This is a good point to say something about the economics of publishing articles in scientific journals. In economic terms, a journal article represents a service that the journal has provided for the author rather than the other way round (which many would regard as more sensible).  The author typically pays for the service either by signing over rights to a paywall journal or by paying an article processing charge (APC) to a journal (OA or paywall), which means that the article can be freely viewed online and rights are (hopefully) retained by the author.  I need to note that not all OA journals levy APCs and, in these cases, it's a good idea to check how the journal is funded and whether or not articles are peer-reviewed. Payment of an APC implies that the journal article is technically an advertisement and this has interesting (and potentially even troubling) implications if the journal also publishes conventional advertisements. All articles in a paywall journal are usually considered (e.g. by download cost) to be of equal value and this is one of the bigger fictions in the scholarly publishing business.  The situation becomes more complicated with OA journals because the APC may vary with geographical region or be waived completely for some individuals. In the interest of openness and accountability, shouldn't open access journals disclose the actual APC paid for each article published?

There are compelling arguments that can be made in support of OA and, like many, I believe that the results of publicly-funded research should be fully visible to the public. This is probably a good point at which to raise the question of who should own intellectual property generated by publicly-funded research. OA broadens participation (by citizen scientists and emerging scientific communities in developing countries) and increases the likelihood of a study being read (and used) by the people with the most interest in it. OA greatly facilitates mining of the ever-expanding body of published literature, which adds to its value and, unsurprisingly, some of the loudest voices calling for scientific literature to be made public property are those of informaticians and data scientists.  One question that OA advocates need to address is whether the results of mining and analysis of freely available literature should also be considered to be public property. While a strong and coherent case can be made for OA, it might not be such a great idea to invoke morality (which can get a bit subjective) or fundamental human rights (people under siege in Syria might consider it less than tasteful that their suffering is being equated with their inability to access the latest articles in Nature).

There are a few hurdles that will need to be overcome if the OA vision is to translate to OA reality. Advocates for OA typically demonize the paywall journals although the cynic might note that easily-recognized enemies are needed by the leaders of mass movements in order to galvanize the proletariat. Some declare that they will only review manuscripts for open access journals while others assert that they are boycotting Nature and Science (I scarcely dare to think of the mirth that would result from me announcing to colleagues that I will no longer be submitting manuscripts to Nature). While demonizing paywall journals may make people feel good (and righteous), it misses the essential point that the OA movement needs to win hearts and minds so that authors (the creators of content) come to see OA journals as the most attractive option for publishing their studies.  Put more bluntly, authors are at least as much to 'blame' as the journals for their articles 'disappearing' behind paywalls. It's also worth remembering that journal editors are usually also authors, as are the people on academic hiring committees who might use journal impact factor to decide that the research output of one applicant is worth more than that of another applicant.

One nettle that the OA movement does need to grasp is that journal publishing is frequently a business and, from the perspective of the 'customer', the terms 'paywall' and 'open access' simply describe alternative business models.  I'm certainly not saying that it has to be that way and there are other areas, such as the provision of health care, education and religion, where the case could be made for the exclusion of commercial activity.  However, the issue remains and, like it or not, the OA movement is effectively lobbying in favor of one set of commercial interests over another set of commercial interests. This means that OA advocates need to be clear (and open) about the sources of any funding that should come their way.  The other nettle that the OA movement needs to grasp is that it should be very careful about whom it allows to speak on its behalf.  OA propaganda sometimes takes the form 'open access good, paywall bad' and the message loses some of its bite if it turns out that Napoleon has been publishing in Nature while denouncing Beall's List as reactionary and counter-revolutionary.

So let's take a look at OA from the perspective of a scientist who is trying to decide where to send a manuscript.  Typically the decision will be made by the principal investigator (PI) who will be well aware that funding and, if relevant, tenure decisions will be influenced by the journals in which the PI's articles have been published.  In some areas (e.g. biomedical), academic research is particularly competitive and PIs are well aware that an article published in an elite journal is visible to the PI's peers and all those who will be making decisions about the PI's future.  As such, publishing in an OA journal can represent a risk without compensating benefits (in competitive academic science, knowing that everybody on the planet can see your article fits into the 'nice to have' category). Another factor that inhibits authors from publishing in OA journals is the paucity of options in some disciplines (e.g. chemistry). I don't believe that there is currently a credible OA equivalent to the Journal of Medicinal Chemistry and would be concerned about the ability of a general purpose OA journal to identify and recruit specialist reviewers. I'm not a computational biologist but, if I were, I'd definitely be thinking twice about submitting an article to a computational biology journal that published Ten Simple Rules of Live Tweeting at Scientific Conferences. Finally, there is the issue of cost because publishing in OA journals (or paying for OA in a paywall journal) does not usually come cheap and, in some cases, funding needs to be specifically sourced (as opposed to simply paying out of one's grant). Those scientists for whom the APC represents a significant expense will also be aware that they are effectively subsidizing other scientists for whom the APC has been waived. Something that may be going through the mind of more than one scientist is the direction in which APCs might head should all scientific publishing be forced to adopt an OA business model.

I think this is a good point at which to wrap things up and I should thank you for staying with me.  If you're an OA opponent and had been hoping to see OA skewered then please accept my apologies for having disappointed you. If you're an OA supporter, you may well disagree with many (or all) of the points that I've raised. If, however, something in this post has prompted you to think about things from a different angle then it will have achieved its purpose.

Sunday, 12 July 2015

Molecular recognition, controlled experiments and discrete chemical space

A recent Curious Wavefunction post (The fundamental philosophical dilemma of chemistry) got me thinking a bit.  Ash notes,

“For me the greatest philosophical dilemma in chemistry is the following: It is the near impossibility of doing controlled experiments on the molecular level. Other fields also suffer from this problem, but I am constantly struck by how directly one encounters it in chemistry”.  

I have a subtly different take on this although I do think that Ash is most definitely pointing the searchlight in the right direction and his post was discussed recently by members of the LinkedIn Computational Chemistry Group.

Ash illustrates the problem from a perspective that will be extremely familiar to anybody with experience in pharmaceutical molecular design,

“Let’s say I am interested in knowing how important a particular hydrogen bond with an amide in the small molecule is. What I could do would be to replace the amide with a non hydrogen-bonding group and then look at the affinity, either computationally or experimentally. Unfortunately this change also impacts other properties of the molecules; its molecular weight, its hydrophobicity, its steric interactions with other molecules. Thus, changing a hydrogen bonding interaction also changes other interactions, so how can we then be sure that any change in the binding affinity came only from the loss of the hydrogen bond? The matter gets worse when we realize that we can’t even do this experimentally”

The “gets worse” acknowledges that the contribution of an interaction to affinity is not an experimental observable.  As we noted in a recent critique of ligand efficiency metrics,

“In general, the contribution of a particular intermolecular contact (or group of contacts) to affinity (or the changes in enthalpy, entropy, heat capacity or volume associated with binding) cannot be measured experimentally”

It’s easy to see why this should be the case for interaction of a drug with its target because this happens in an aqueous environment and the binding event is coupled to changes in solvent structure that are non-local with respect to the protein-ligand interactions.  As former colleagues at AZ observe in their critique of enthalpy optimization,

“Both ΔH° and ΔS° are typically sums over an enormous amount of degrees of freedom, many contributions are opposing each other and comparable in amplitude, and in the end sums up to a comparably smaller number, ΔG°”

Water certainly complicates interpretation of binding thermodynamics in terms of structure but, even in gas phase, the contributions of intermolecular contacts are still no more observable experimentally than atomic charges. 

Let’s think a bit about what would constitute a controlled experiment in the context of generating a more quantitative understanding of the importance of individual interactions.  Suppose that it’s possible to turn a hydrogen bond acceptor into (apolar) hydrocarbon without changing the remainder of the molecule. We could measure the difference in affinity/potency (pKd/pIC50) between the compound of interest and its analog in which the hydrogen bond acceptor has been transformed into hydrocarbon. The Figure below shows how this can be done and the pIC50 values (taken from this article) can be interpreted as evidence that hydrogen bonds to one of the pyridazine nitrogen atoms are more beneficial for potency against cathepsins S and L2 than against cathepsin L.
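As a rough illustration of the arithmetic, the sketch below computes the pIC50 difference for an acceptor/hydrocarbon matched pair and converts it to an apparent free energy difference. The function names and the pIC50 values are hypothetical (they are not taken from the cited article), and the conversion assumes that the ratio of IC50 values equals the ratio of dissociation constants, i.e. that both compounds were assayed under identical conditions and share a mechanism. That is an assumption, not something the pIC50 values themselves establish.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def delta_pic50(pic50_acceptor: float, pic50_hydrocarbon: float) -> float:
    """Potency gained by the acceptor compound relative to its hydrocarbon analog."""
    return pic50_acceptor - pic50_hydrocarbon


def apparent_ddg(dpic50: float, temperature_k: float = 298.15) -> float:
    """Apparent free energy change (kcal/mol) on going from analog to acceptor.

    Assumes IC50 ratio == Kd ratio; a negative value means the acceptor
    compound binds more tightly than its hydrocarbon reference.
    """
    return -R_KCAL * temperature_k * math.log(10) * dpic50


# Hypothetical matched pairs (acceptor pIC50, hydrocarbon pIC50) for two targets
pairs = {
    "target_A": (7.5, 6.3),
    "target_B": (6.8, 6.6),
}
for target, (acceptor, hydrocarbon) in pairs.items():
    d = delta_pic50(acceptor, hydrocarbon)
    print(f"{target}: dpIC50 = {d:.2f}, apparent ddG = {apparent_ddg(d):.2f} kcal/mol")
```

Comparing the matched-pair differences across targets is the kind of comparison behind the cathepsin observation above; each number is relative to the hydrocarbon reference and is not an absolute contribution of the hydrogen bond.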

Although analysis like this can provide useful insight for design, it still doesn’t tell us what absolute contributions these hydrogen bonds make to potency because our perception depends on our choice of reference. Things get more complicated if we are trying to assess the importance of the hydrogen bond donor and acceptor characteristics of a secondary amide in determining activity. We might replace the carbonyl oxygen atom with a methylene group to assess the importance of hydrogen bond acceptor ability... er... maybe not.  The amide NH could be replaced with a methylene group but this will reduce the hydrogen bond acceptor strength of the carbonyl oxygen atom as well as changing the torsional preferences of the amidic bond.   This illustrates the difficulty, highlighted by Ash, of performing controlled experiments when trying to dissect the influences of different molecular properties on the behavior of compounds.

The above examples raise another issue that rarely, if ever, gets discussed.  Although chemical space is vast, it is still discrete at the molecular level and that may prove to be an even bigger dilemma than not being able to perform controlled experiments at the molecular level.  As former colleagues and I suggested in our FBDD screening library article, fragment-based approaches may enable chemical space to be sampled at a more controllable resolution.  Could it be that fragments may have advantages in dealing with the discrete nature of chemical space? 

Wednesday, 1 July 2015

Into the mess that metrics make

So it looks like the post publication peer review of the article featured in the Expertitis blog post didn't go down too well but as Derek said recently, “…open, post-publication peer review in science is here, and it's here to stay. We'd better get used to it”.   In this post, I’m going to take a look at the webinar that drew me to that article and I hope that participants in that webinar will find the feedback to be both constructive and educational.

Metrics feature prominently in this webinar and the thing that worries me most is the Simplistic Estimate of Enthalpy (SEEnthalpy).   Before getting stuck into the thermodynamics, it’ll be necessary to point out some errors so please go to 20:10 (slide entitled ‘ligand efficiency lessons’) in the webinar.   

The ‘ligand efficiency lessons’ slide concedes that LE has been criticized in print but then incorrectly asserts that the criticism has been rebutted in print.  It is well-known that Mike Schultz has criticized ( 1 | 2 | 3 ) LE as being mathematically invalid and the counter to this is that LE (written as free energy of binding divided by number of non-hydrogen atoms) is a mathematically valid expression (even though the metric itself is thermodynamic nonsense). Nevertheless, Mike still identified the Achilles' heel of LE which is that a linear response of affinity to molecular size has to have zero intercept in order for that linear response to represent constant LE. It is also somewhat ironic that the formula for LE used in the rebuttal to Mike's criticism is itself mathematically invalid because it includes a logarithm of a quantity with units of concentration.  However, another criticism (article and related blog posts 1 | 2 | 3 ) has been made of LE and this has not been rebutted in print.   The ‘ligand efficiency lessons’ slide also asserts that "LE tends to decline with ligand size" and it should also be pointed out that some of the size dependency of LE is an artefact of the arbitrary choice of 1 mole per litre as the standard concentration (article and blog posts 1 | 2 ).   The Ligand Efficiency: Nice Concept, Shame About the Metrics presentation may also be helpful.
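The standard-concentration point can be checked numerically. The sketch below computes LE as -ΔG°/HA with ΔG° = RT ln(Kd/C°), so the logarithm is taken of a dimensionless ratio, and compares a small, weak compound with a large, potent one at three choices of standard concentration C°. The function name and both compounds are hypothetical, chosen only to illustrate the effect.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ligand_efficiency(kd_molar: float, heavy_atoms: int,
                      standard_conc_molar: float = 1.0,
                      temperature_k: float = 298.15) -> float:
    """LE in kcal/mol per non-hydrogen atom.

    dG = RT ln(Kd / C) where C is the standard concentration; dividing by C
    makes the argument of the logarithm dimensionless.
    """
    dg = R_KCAL * temperature_k * math.log(kd_molar / standard_conc_molar)
    return -dg / heavy_atoms


# Hypothetical compounds: (Kd in M, number of non-hydrogen atoms)
fragment = (1e-4, 10)  # 100 uM, 10 heavy atoms
lead = (1e-8, 30)      # 10 nM, 30 heavy atoms

for c0 in (1.0, 1e-3, 1e-6):
    le_frag = ligand_efficiency(*fragment, standard_conc_molar=c0)
    le_lead = ligand_efficiency(*lead, standard_conc_molar=c0)
    better = "fragment" if le_frag > le_lead else "lead"
    print(f"C0 = {c0:g} M: LE(fragment) = {le_frag:.3f}, "
          f"LE(lead) = {le_lead:.3f} -> higher LE: {better}")
```

With C° = 1 M the fragment appears more efficient; with C° = 1 mM or 1 µM the lead does. Nothing about the compounds has changed, only the arbitrary reference. Equivalently, if pKd = a + b·HA then LE is proportional to a/HA + b, which is constant only when the intercept a is zero, and changing C° simply shifts a.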
This is a good point to introduce SEEnthalpy and, were I writing a satire on drug discovery metrics, I could scarcely do better than this.  Perhaps even ‘Enthalpy – the musical’ with lyrics by Leonard Cohen (Some girls wander by mistake/ Into the mess that metrics make)?  Let’s pick this up at 22:43 in the webinar with the ‘Conventional wisdom – simplistic view’ slide which teaches us that rotatable bonds are hydrophobic interactions (you really did hear that correctly, “entropy comes from non-direct, hydrophobic interactions like rotatable bonds"). I don’t happen to agree with ‘conventional’ or ‘wisdom’ in this context although I do fully endorse ‘simplistic’. Let’s take a closer look at SEEnthalpy which is defined (24:45) from the numbers of hydrogen bond donors (#HBD), hydrogen bond acceptors (#HBA) and rotatable bonds (#RB):

SEEnthalpy = (#HBD + #HBA)/(#HBD + #HBA + #RB)
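For concreteness, here is a minimal Python transcription of the formula, assuming the three counts have already been obtained by some (unspecified) counting convention; the function name is mine. The invented counts in the example show two properties that follow directly from the formula: the metric depends only on the ratio of polar counts to total counts, and adding rotatable bonds alone lowers it.

```python
def seenthalpy(n_hbd: int, n_hba: int, n_rb: int) -> float:
    """SEEnthalpy = (#HBD + #HBA) / (#HBD + #HBA + #RB), as defined in the webinar."""
    polar = n_hbd + n_hba
    total = polar + n_rb
    if total == 0:
        raise ValueError("SEEnthalpy is undefined when all three counts are zero")
    return polar / total


# Invented counts: very different molecules can share the same value...
print(seenthalpy(1, 1, 2), seenthalpy(3, 5, 8))
# ...and adding rotatable bonds alone drives the 'enthalpy' metric down
print([round(seenthalpy(2, 3, rb), 3) for rb in range(0, 6)])
```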

As I’ve mentioned before, definitions of hydrogen bonding groups can be ambiguous (as can be definitions of rotatable bonds) but I don’t want to get sidetracked by this right now because there are more pressing issues to deal with. You can learn a lot about metrics simply by thinking about them (have a look at comments on LELP in this article) and one question that you might like to ask yourselves is what is #RB (which they’re telling us is associated with entropy) doing in an enthalpy metric?  This may be a good time to note that the contribution of an intermolecular contact (or group of contacts) to the changes in enthalpy or entropy associated with binding is not, in general, an experimental observable.  It could also be a good idea to take a look at A Medicinal Chemist's Guide to Molecular Interactions and Ligand Binding Thermodynamics in Drug Discovery: Still a Hot Tip? 

I have to admit that I found the reasoning behind SEEnthalpy to be unconvincing but, given that the metric appears to have the endorsement of all participants of the webinar, perhaps we should at least give them the chance to demonstrate that it is meaningful. If I was introducing an enthalpy metric then the first thing that I’d do would be to (try to) show that it measured what it was claimed to measure (we call something a metric because it measures something and not because it's easy to remember the formula or because we've thought up a cute acronym). As it turns out, the Binding Database is public and simply oozing with the thermodynamic data that could be used to investigate the relationship between metric and reality.  This makes how they’ve chosen to evaluate the metric seem somewhat bizarre.

The first test of SEEnthalpy was performed using a TB activity dataset.  About 1000 of the compounds in this dataset were found to be active by high throughput screening (HTS) and the remaining 100-ish had come from structure-based drug design (SBDD).   The differences between the two groups of compounds were found to be significant although it is not clear what to make of this.  One interpretation is that the HTS compounds are screening hits (possibly from phenotypic screens) and the SBDD compounds have been optimized.  If this is the case, it will not be too difficult to perceive differences between the two groups of compounds and doing so does not represent one of the more pressing prediction problems in drug discovery.  This is probably a good time to note that correlation does not equal causation.  The other point worth making is that observing that two mean values are significantly different doesn’t tell us about the size of the effect which is more relevant to prediction.  If you want to illustrate the strength of the trend (as opposed to its statistical significance) then you need to show standard deviations with the mean values rather than just standard errors.  If this is unfamiliar territory then why not take a look at this article and make it more familiar.
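The distinction between standard deviation and standard error can be made concrete with a small Python sketch. The two groups below are synthetic (a deterministic sine pattern, chosen so the example is reproducible), sized roughly like the HTS and SBDD sets but otherwise unrelated to them. The difference in means is several standard errors, which is why it registers as significant, yet it is only about a third of a standard deviation (Cohen's d), so the two distributions overlap heavily and the metric would be a poor discriminator.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard deviation, standard error of the mean)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / math.sqrt(len(values))


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(pooled_var)


# Synthetic metric values: same spread, slightly shifted means
group_a = [0.50 + 0.2 * math.sin(i) for i in range(1000)]  # "HTS-like" size
group_b = [0.55 + 0.2 * math.sin(i) for i in range(100)]   # "SBDD-like" size

mean_a, sd_a, se_a = summarize(group_a)
mean_b, sd_b, se_b = summarize(group_b)
diff = mean_b - mean_a
print(f"A: {mean_a:.3f} (SD {sd_a:.3f}, SE {se_a:.4f})")
print(f"B: {mean_b:.3f} (SD {sd_b:.3f}, SE {se_b:.4f})")
print(f"difference / combined SE = {diff / math.hypot(se_a, se_b):.1f}")
print(f"Cohen's d = {cohens_d(group_a, group_b):.2f}")
```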

The next test of SEEnthalpy was to investigate its correlation with biological activity and the chosen activity measure was %inhibition in protein kinase assays.  Typically when we model and analyze activity we use pIC50 or pKi (see this webinar, for example) and it is not at all clear why %inhibition was used instead given the vast amount of public data available in ChEMBL.  One problem with using %inhibition as a measure of activity is that it has a very limited dynamic range and there really is a reason that people waste all that time measuring a concentration response so that they can calculate pIC50 (or pKi).   Let’s pick up the webinar at 27:10 and at 28:13 a plot will appear.   This plot appears to have been generated by ordering compounds by SEEnthalpy and then plotting %inhibition against order.  If you look at the plot you’ll notice that the %inhibition values for most of the compounds are close to zero, indicating that they are essentially inactive at the chosen test concentration, which means that the correlation between %inhibition and SEEnthalpy will be very weak.  But follow the webinar to 28:32 and you will learn that, “…for this particular kinase, there was a really clear relationship between SEEnthalpy and the activity”.  I’ll leave it to you, the reader, to decide for yourself how clear that trend actually is.
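The limited dynamic range of %inhibition is easy to see if one assumes a simple concentration response. The Python sketch below assumes a Hill equation with a Hill coefficient of 1 and a single test concentration of 10 µM; both are assumptions made for illustration, since the webinar did not state the assay conditions. Compounds spanning six pIC50 units end up either near 0% or near 100%.

```python
def percent_inhibition(pic50: float, test_conc_molar: float = 1e-5,
                       hill: float = 1.0) -> float:
    """%inhibition at a single test concentration from a Hill-type response."""
    ic50 = 10 ** (-pic50)
    return 100.0 / (1.0 + (ic50 / test_conc_molar) ** hill)


for pic50 in range(3, 10):
    print(f"pIC50 = {pic50}: {percent_inhibition(pic50):6.2f}% inhibition")
```

Under these assumptions, compounds below about pIC50 4 all look inactive and those above about pIC50 7 all look fully active, so %inhibition discards most of the information that pIC50 carries.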

Let’s go to 28:42 in the webinar, at which point the question is posed as to whether SEEnthalpy correlates more strongly than other metrics with activity.   However, when we get to 28:53, we discover that it is actually statistical significance (p values) rather than strength of correlation that is the focus of the analysis.   It is really important to remember that given sufficiently large samples, even the most anemic of trends can acquire eye-wateringly impressive statistical significance and it is the correlation coefficient not the p value that tells us how strong a trend is. Once again, I am forced to steer the participants of the webinar towards this article and this time, in order to reinforce the message, I'll also link an article which illustrates the sort of tangle into which you can get if you confuse the strength of a trend with its statistical significance.
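To see how sample size decouples p values from the strength of a correlation, the Python sketch below computes an approximate two-sided p value for a Pearson correlation coefficient r from n observations using the Fisher z-transformation (z = atanh(r)·√(n - 3), compared against a standard normal). I've chosen this approximation because it needs only the standard library; the usual t-test on r would give somewhat different numbers, particularly for small n.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p value for H0: rho = 0 via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("need more than three observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail area


# An anemic trend in a large sample versus a strong trend in a small one
print(f"r = 0.10, n = 10000: p ~ {correlation_p_value(0.10, 10000):.1e}")
print(f"r = 0.50, n = 10:    p ~ {correlation_p_value(0.50, 10):.2f}")
```

The weak trend (r = 0.1, accounting for about 1% of the variance) is overwhelmingly 'significant' while the much stronger trend is not, which is why the correlation coefficient rather than the p value tells us how strong a trend is.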

I think this is a good point at which to wrap things up. I'll start by noting that attempting to educate the drug discovery community and shape the thinking of that community's members can backfire if weaknesses in one's grasp of the fundamental science become apparent.  For example, confusing the statistical significance of a trend with its strength in a webinar like this may get people asking uncouth questions about the machine learning models that were mentioned (although not discussed) during the webinar.  I'll also mention that the LE values are presented in the webinar without units and I'll finish off by posing the question of whether it is correct, especially for cell-based assays, to convert IC50 to molar energy for the purpose of calculating LE.

Monday, 13 April 2015


I will never call myself an expert for to do so would be to take the first step down a very slippery slope.  It is important to remember that each and every expert has both an applicability domain and a shelf life.  It’s not the experts themselves to which I object but what I’ll call ‘expert-driven decision-making’ and the idea that you can simply delegate your thinking to somebody else.

I’m going to take a look at an article that describes attempts to model an expert’s evaluation of chemical probes identified by NIH-funded high throughput screening.  The study covers some old ground and one issue facing authors of articles like these is how to deal with the experts in what might be termed a ‘materials and methods context’.   Experts who are also coauthors of studies like these become, to some extent, self-certified experts.   One observation that can be made about this article is that its authors appear somewhat preoccupied with the funding levels for the NIH chemical probe initiative and I couldn't help wondering about how this might have shaped or influenced the analysis.

A number of approaches were used to model the expert’s assessment of the 322 probe compounds, of which 79% were considered to be desirable.   The approaches used by the authors ranged from simple molecular property filtering to more sophisticated machine learning models.  I noted two errors (missing molar energy units; taking the logarithm of a quantity with units) in the formula for ligand efficiency and it’s a shame they didn’t see our article on ligand efficiency metrics which became available online about six weeks before they submitted their article (the three-post series starting here may be helpful).  The authors state, “PAINS is a set of filters determined by identifying compounds that were frequent hitters in numerous high throughput screens” which is pushing things a bit because the PAINS filters were actually derived from analysis of the output from six high throughput screening campaigns (this is discussed in detail in the three-post series that starts here).  Mean pKa values of 2.25 (undesirable compounds) and 3.75 (desirable compounds) were reported for basic compounds and it certainly wasn’t clear to me how compounds were deemed to be basic given that these values are well below neutral pH. In general, one needs to be very careful when averaging pKa values.  While these observations might be seen as nit-picking, using terms like ‘expert’, ‘validation’ and ‘due diligence’ in title and abstract does set the bar high.
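Two quick calculations show why averaging pKa values needs care and why mean values of 2.25 and 3.75 sit oddly with the label 'basic'. The Python sketch below uses invented pKa values for the averaging example and the simple Henderson-Hasselbalch relationship for a monoprotic base; neither is taken from the article.

```python
import math


def mean_pka(pkas):
    """Arithmetic mean of pKa values (an average of logarithms)."""
    return sum(pkas) / len(pkas)


def pka_of_mean_ka(pkas):
    """pKa corresponding to the arithmetic mean of the Ka values."""
    mean_ka = sum(10 ** (-p) for p in pkas) / len(pkas)
    return -math.log10(mean_ka)


def fraction_protonated_base(pka, ph=7.4):
    """Fraction of a monoprotic base present as BH+ (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


pkas = [2.0, 6.0]  # invented values
print(mean_pka(pkas), round(pka_of_mean_ka(pkas), 2))
# A 'base' with pKa 3.75 is essentially uncharged at pH 7.4
print(f"{fraction_protonated_base(3.75):.1e}")
```

The two averages differ by almost two log units for the same pair of values, and a base with a pKa of 3.75 is only about 0.02% protonated at pH 7.4.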

A number of machine learning models were described and compared in the article and it’s worth saying something about models like these.  A machine learning model is usually the result of an optimization process.  When we build a machine learning model, we search for a set of parameters that optimizes an objective such as fit or discrimination for the data with which we train the model.  The parameters may be simple coefficients (as in a regression model) but they might also be threshold values for rules.  The more parameters you use to build a model (machine learning or otherwise), the more highly optimized the resulting model will be and we use the term ‘degrees of freedom’ to say how many parameters we’ve used when training the model.   You have to be very careful when comparing models that have different numbers of degrees of freedom associated with them and one criticism that I would make of machine learning models is that the number of degrees of freedom is rarely (if ever) given.  Over-fitting is always a concern with models and it is customary to validate machine learning models using one or more of a variety of protocols. Once a machine learning model has been validated, the number of degrees of freedom is typically considered to be a non-issue. Clustering in data can cause validation to make optimistic assessments of model quality and the predictive chemistry community does need to pay more attention to Design of Experiments. Here’s a slide that I sometimes use in molecular design talks.
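The point about clustering can be illustrated with a deliberately simple Python sketch. The data are invented: five 'chemical series', each containing three near-identical analogs whose activity is set by the series, and the model is a one-nearest-neighbour regressor chosen purely because it is easy to follow. Leave-one-out validation rewards the model for memorizing close analogs, while leaving out a whole cluster at a time (one of several possible cluster-aware protocols, not the one used in the featured article) reveals that it has learned nothing that transfers between series.

```python
import math


def nearest_neighbour_predict(x, train):
    """1-NN regression: return the y value of the training point closest in x."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def leave_one_out_rmse(data):
    """Hold out one compound at a time; its close analogs stay in training."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors.append(nearest_neighbour_predict(x, train) - y)
    return rmse(errors)


def leave_cluster_out_rmse(data, labels):
    """Hold out a whole series at a time, so no close analog can be memorized."""
    errors = []
    for (x, y), label in zip(data, labels):
        train = [pt for pt, lab in zip(data, labels) if lab != label]
        errors.append(nearest_neighbour_predict(x, train) - y)
    return rmse(errors)


# Hypothetical clustered data: five series, three close analogs each
centres = [0.0, 10.0, 20.0, 30.0, 40.0]
series_activity = [5.0, 8.0, 4.0, 9.0, 6.0]
data, labels = [], []
for label, (c, act) in enumerate(zip(centres, series_activity)):
    for dx, dy in ((-0.1, -0.05), (0.0, 0.0), (0.1, 0.05)):
        data.append((c + dx, act + dy))
        labels.append(label)

print(f"leave-one-out RMSE:     {leave_one_out_rmse(data):.2f}")
print(f"leave-cluster-out RMSE: {leave_cluster_out_rmse(data, labels):.2f}")
```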

Let’s get back to the machine learning models in the featured article.  Comparisons were made between models (see Figure 4 and Table 5 in the article) but no mention is made of numbers of degrees of freedom for the models.   I took a look in the supplementary information to see if I could get this information by looking at the models themselves and discovered that the models had not actually been reported. In fact, the expert’s assessment of the probes had not been reported either and I don't believe that this article scores highly for either reproducibility or openness.   Had this come to me as a manuscript reviewer, the response would have been swift and decisive and you probably wouldn’t be reading this blog post. How much weight should those responsible for the NIH chemical probes initiative give to the study?  I’d say they can safely ignore it because the data set is proprietary and models trained on it are only described and not actually specified.  Had the expert's opinion on the desirability (or otherwise) of the probes been disclosed then it would have been imprudent for the NIH folk to ignore what the expert had to say. At the same time, it's worth remembering that we seek different things from probes and from drugs and one expert's generic opinion of a probe needs to be placed in the context of any specific design associated with the probe's selection.

However, there are other reasons that the NIH chemical probes folk might want to be wary of data analysis from this source and I'll say a bit more about these in the next post.


Wednesday, 1 April 2015

Ligand efficiency validated

So I have to admit that I got it wrong and it looks like Ligand Efficiency (LE) is actually thermodynamically valid after all. You’ll recall my earlier objections to LE on the grounds that the standard concentration of 1 M has been arbitrarily chosen and that our perception of compound quality depends on the units of concentration in which the relevant equilibrium constants are defined. A Russian friend recently alerted me to an article in the Siberian Journal of Thermodynamics in which she thought I might be interested and was kind enough to scan it for me because this journal is not available in the West due to the current frosty relations with Russia.   This seminal study by RR Raskolnikov demonstrates unequivocally that the 1 M standard concentration is absolutely and uniquely correct thus countering all known (and even some unknown) objections to LE as a metric of compound quality.  The math is quite formidable and the central proof is based on the convergence characteristics of the trace of the partial molar polarizability tensor.  Enigmatically, the author acknowledges the assistance of an individual named only as Porfiry Petrovich in helping him to find the truth after a long search.   Raskolnikov’s career path appears to have been rather unorthodox.  Originally from St Petersburg, he moved east because of some trouble with a student loan and for many years he was dependent on what his wife was able to earn.    

Sunday, 22 March 2015

Literature pollution for dummies


So I thought that I’d conclude this mini-series ( 1 | 2 ) of PAINS posts with some lighter fare, the style of which is intended to be a bit closer to that of a PAINS-shaming post ( 1 | 2 ) than is normal for posts here.  As observed previously, the PAINS-shaming posts are vapid and formulaic although I have to admit that it’s always a giggle when people spit feathers on topics outside their applicability domains.   I must also concede that one of the PAINS-shaming posts was even cited in a recent article in ACS Medicinal Chemistry Letters although this citation might be regarded as more a confirmation of the old adage that ‘flattery will get you anywhere’ than an indication of the scientific quality of the post.  I shouldn’t knock that post too much because it’s what goaded me into taking a more forensic look at the original PAINS article.  However, don’t worry if you get PAINS-shamed because, with acknowledgement to Denis Healey, being PAINS-shamed is “like being savaged by a dead sheep”.

I should say something about the graphic with which I’ve illustrated this blog post.  It shows a diving Junkers Ju 87 Stuka and I’ll let aviation author William Green, writing in ‘Warplanes of the Third Reich’, tell you more about this iconic aircraft:

“The Ju 87 was an evil-looking machine, with something of the predatory bird in its ugly contours – its radiator bath and fixed, spatted undercarriage resembling gaping jaws and extended talons – and the psychological effect on the recipients of its attentions appeared almost as devastating as the bombs that it delivered with such accuracy.  It was an extremely sturdy warplane, with light controls, pleasant flying characteristics and a relatively high standard of manoeuvrability. It offered crew members good visibility and it was able to hit a target in a diving attack with an accuracy of less than 30 yards. All these were highly desirable characteristics but they tended to blind Ob.d.L. to the Ju 87’s shortcomings. Its use presupposed control of the air, for it was one of the most vulnerable of combat aircraft and the natural prey of the fighter…”

I really should get back on-topic because I doubt that Stuka ace Hans-Ulrich Rudel ever had to worry about singlet oxygen while hunting T-34s on the Eastern Front. I’ve promised to show you how to get away with polluting the literature, so let’s suppose you’ve submitted a manuscript featuring PAINful structures and the reviewers have said, “Nein, es ist verboten” (“No, it is forbidden”). What should you do next? The quick answer is, “It depends”. If the reviewers don't mention the original PAINS article and simply say that you’ve not done enough experimental work to back up your claims then, basically, you’re screwed. This is probably a good time to get your ego to take a cold shower and to find an unfussy open access journal that will dispense with the tiresome ritual of peer review and quite possibly include a package of downloads and citations for no additional APC.

Let’s look at another scenario, one in which the reviewers have stated that the manuscript is unacceptable simply because the compounds match substructures described in the original PAINS article. This is the easiest situation to deal with, although if you’re using AlphaScreen to study a protein-protein interaction you should probably consider the open access suggestion outlined above. If you’re not using AlphaScreen, you can launch your blitzkrieg, although try not to make a reviewer look like a complete idiot because the editor might replace him/her with another one who is a bit more alert. You need to point out to the editor that the applicability domain (using this term will give your response a degree of authority) for the original PAINS filters is AlphaScreen used to assay protein-protein interactions and that the original PAINS filters are therefore completely irrelevant to your submission. You might also play the singlet oxygen card if you can find evidence (here’s a useful source for this type of information) for quenching/scavenging behavior by compounds that have aggrieved the reviewers on account of matching PAINS filters.

Now you might get a more diligent reviewer who looks beyond the published PAINS filters and digs up some real dirt on compounds that share a substructure with the compounds that you’ve used in your study and, when this happens, you need to put as much chemical space as you can between the two sets of compounds. Let’s use Voss et al (I think that this was what one of the PAINS-shaming posts was trying to refer to) to illustrate the approach. Voss et al describe some rhodanine-based TNF-alpha antagonists, the ‘activity’ of which turned out to be light-dependent, and I would certainly regard this sort of light-dependency as very dirty indeed. However, there are only four rhodanines described in this article (shown below); each has a heteroaromatic ring linked to the exocyclic double bond (an extended pi-system is highly relevant to photochemistry) and each is substituted with ethyl on the ring nitrogen. Furthermore, that heteroaromatic ring is linked to either an aryl or a heteroaromatic ring in each of the four compounds. So here’s how you deal with the reviewers. First, point out that the bad behavior is only observed for four rhodanines assayed against a single target protein. If your rhodanines lack the exocyclic double bond, you can deal with the reviewers without breaking a sweat because the substructural context of the rhodanine ring is so different, and you might also mention that your rhodanines can’t function as Michael acceptors. You should also be able to wriggle off the hook if your rhodanines have the exocyclic double bond but only alkyl substituents on it. Sanitizing a phenyl substituent on the exocyclic double bond is a little more difficult and you should first stress that the bad behavior was only observed for rhodanines with five-membered, electron-rich heterocycles linked to that exocyclic double bond. You’ll also be in a stronger position if your phenyl ring lacks the additional aryl or heteroaryl substituent (or ring fusion) that is conserved in the four rhodanines described by Voss et al, because this can be argued to be relevant to photochemistry.

Things will be more difficult if you’ve got a heteroaromatic ring linked to the exocyclic double bond, and this is when you’ll need to reach into the bottom drawer for appropriate countermeasures with which to neutralize those uncouth reviewers. First, take a close look at that heteroaromatic ring. If it is six-membered and/or relatively electron-poor, consider drawing the editor’s attention to the important differences between your heteroaromatic ring and those in the offending rhodanines of Voss et al. The lack of aryl or heteroaryl substituents (or ring fusions) on your heteroaromatic ring will also strengthen your case, so make sure the editor knows. Finally, consider calculating molecular similarity between your rhodanines and those in Voss et al. You want this to be as low as possible, so experiment with different combinations of fingerprints and metrics (e.g. Tanimoto coefficient) to find whatever gives the best results (i.e. the lowest similarity).
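To see how much room for manoeuvre that advice gives you, here is a minimal Python sketch. The "fingerprints" are hypothetical sets of named features standing in for real fingerprint bits (they are not derived from the Voss et al structures), and the two metrics are the standard Tanimoto and Dice coefficients on sets.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical feature sets: a coarse and a fine description of the same
# pair of compounds (query, reference). Feature names are illustrative only.
fingerprints = {
    "coarse": ({"rhodanine", "exocyclic_C=C", "aryl"},
               {"rhodanine", "exocyclic_C=C", "heteroaryl"}),
    "fine": ({"rhodanine", "exocyclic_C=C", "aryl", "N-ethyl", "biaryl", "furan"},
             {"rhodanine", "exocyclic_C=C", "heteroaryl", "N-H", "pyridine"}),
}

for fp_name, (query, reference) in fingerprints.items():
    for metric in (tanimoto, dice):
        print(f"{fp_name:6s} {metric.__name__:8s} {metric(query, reference):.2f}")
```

The same pair of "compounds" scores anywhere from about 0.22 (fine features, Tanimoto) to about 0.67 (coarse features, Dice), which is exactly why shopping around for the lowest number works.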

So this is a good place at which to conclude this post and this series of posts. I hope you’ve found it fun and have enjoyed learning how to get away with polluting the literature.

Wednesday, 18 March 2015

Is the literature polluted by singlet oxygen quenchers and scavengers?


So apparently I’m a critic of the PAINS concept and maybe it’s a good idea to state my position. Firstly, I don’t know exactly what is meant by ‘PAINS concept’ so, to be quite honest, it is difficult to know whether or not I am a critic. Secondly, I am fully aware that many compounds are observed as assay hits for any of a number of wrong reasons, and I completely agree that it is important to understand the pathological behavior of compounds in assays so that resource does not get burned unnecessarily. At the same time, we need to think more clearly about different types of behavior in assays. One behavior is that the compound does something unwholesome to a protein and, when this is the case, it is absolutely correct to say ‘bad compound’ regardless of what it does (or doesn't) do to other proteins. Another behavior is that the compound interferes with the assay but leaves the target protein untouched and, in this case, we should probably say ‘bad assay’ because the assay failed to conclude that the protein has emerged unscathed from its encounter with the compound. It is usually a sign of trouble when structurally related compounds show activity in a large number of assays, but there are potentially lessons to be learned by those prepared to look beyond hit rates. If the assays that are hit are diverse in type then we should be especially worried about the compounds. If, however, the assays that are hit are all of a single type then perhaps the specific assay type is of greater concern. Even when hit rates are low, appropriate analysis of the screening output may still reveal that something untoward is taking place. For example, a high proportion of hits in common may reflect a mechanistic feature (e.g. a catalytic cysteine) that is shared between two enzymes (e.g. a PTP and a cysteine protease).
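The distinction drawn above, between looking at hit rates and looking at which kinds of assays are hit, can be sketched in a few lines of Python. Everything here is hypothetical: the compound and assay names, the data layout (compound mapped to a list of (assay, technology) hits) and the frequent-hitter threshold of three hits are assumptions for illustration, not a published triage procedure.

```python
def triage(hits: dict, min_hits: int = 3) -> dict:
    """Say whether a frequent hitter's pattern points at the compound or the assay type."""
    verdicts = {}
    for cpd, hit_list in hits.items():
        if len(hit_list) < min_hits:
            verdicts[cpd] = "not a frequent hitter"
            continue
        technologies = {tech for _, tech in hit_list}
        if len(technologies) > 1:
            # Hits across unrelated readouts are harder to blame on the assay.
            verdicts[cpd] = "hits diverse assay types: worry about the compound"
        else:
            (only_tech,) = technologies
            verdicts[cpd] = f"hits only {only_tech} assays: worry about the assay type"
    return verdicts


def shared_hit_fraction(hits_a: set, hits_b: set) -> float:
    """Fraction of the smaller hit list that also hits the other target."""
    smaller = min(len(hits_a), len(hits_b))
    return len(hits_a & hits_b) / smaller if smaller else 0.0


# Hypothetical screening output.
screening_hits = {
    "cpd1": [("kinaseA", "AlphaScreen"), ("ppi1", "AlphaScreen"), ("ppi2", "AlphaScreen")],
    "cpd2": [("kinaseA", "FRET"), ("ppi1", "AlphaScreen"), ("protease1", "fluorescence")],
    "cpd3": [("ppi2", "AlphaScreen")],
}
for cpd, verdict in triage(screening_hits).items():
    print(cpd, verdict)

# Low hit rates, but most PTP hits also hit the cysteine protease.
ptp_hits = {"cpd4", "cpd5", "cpd6", "cpd7"}
cys_protease_hits = {"cpd5", "cpd6", "cpd7", "cpd9", "cpd10"}
print(shared_hit_fraction(ptp_hits, cys_protease_hits))
```

A high shared fraction between two targets would prompt the question of whether they have a mechanistic feature in common, as in the catalytic cysteine example.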

While I am certainly not critical of attempts to gain a greater understanding of screening output, I have criticized over-interpretation of data in print ( 1 | 2 ) and will continue to do so. In this spirit, I would challenge the assertion, made in the recent Nature PAINS article, that “Most PAINS function as reactive chemicals rather than discriminating drugs” on the grounds that no evidence is presented to support it. As noted in a previous post, the term ‘PAINS’ was introduced to describe compounds that showed frequent-hitter behavior in a panel of six AlphaScreen assays, and a panel of this size would have been considered small even two decades ago when some of my Zeneca colleagues (and presumably our opposite numbers elsewhere in Pharma) started looking at frequent-hitters. After reading the original PAINS article, I was left wondering why only six of 40+ screens were used in the analysis and exactly how these six screens had been selected. The other point worth reiterating is that including only a single type of assay in an analysis like this makes it impossible to explore the link between frequent-hitter behavior and assay type. Put another way, restricting analysis to a single assay type means that the results constitute much weaker evidence that compounds interfere with other assay types or are doing something unpleasant to target proteins.

I must stress that I’m definitely not saying that the results presented in the original PAINS article are worthless.  Knowledge of AlphaScreen frequent-hitters is certainly useful if you’re running this type of assay.  I must also stress that I’m definitely not claiming that AlphaScreen frequent hitters are benign compounds.  Many of the chemotypes flagged up as PAINS in that article look thoroughly nasty (although some, like catechols, look more ‘ADMET-nasty’ than ‘assay-nasty’).  However, the issue when analyzing screening output is not simply to be of the opinion that something looks nasty but to establish its nastiness (or otherwise) definitively in an objective manner.   

Now is a good time to say something about AlphaScreen, and there’s a helpful graphic in Figure 3 of the original PAINS article. Think of two beads held in proximity by the protein-protein interaction that you’re trying to disrupt. The donor bead functions as a singlet oxygen generator when you zap it with a laser. Some of this singlet oxygen makes its way to the acceptor bead, where its arrival is announced by the emission of light. If you disrupt the protein-protein interaction then the beads are no longer in close proximity and the (unstable) singlet oxygen doesn’t have sufficient time to find an acceptor bead before it is quenched by solvent. I realize this is a rushed explanation, but I hope that you’ll be able to see that disruption of the protein-protein interaction will lead to a loss of signal because most of the singlet oxygen gets quenched before it can find an acceptor bead.
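A back-of-envelope Python sketch shows why bead separation matters so much. The numbers are assumptions, not from the post: a diffusion coefficient of about 2e-9 m²/s for O2 in water and a singlet oxygen lifetime of roughly 4 µs in water. The survival estimate is deliberately crude (mean diffusion time to a given distance, then first-order decay over that time).

```python
import math

# Assumed order-of-magnitude values (not taken from the post).
D_O2_WATER = 2.0e-9     # diffusion coefficient of O2 in water, m^2/s
TAU_1O2_WATER = 4.0e-6  # singlet oxygen lifetime in water, s


def rms_travel_nm(tau_s: float = TAU_1O2_WATER, d: float = D_O2_WATER) -> float:
    """Root-mean-square 3D displacement sqrt(6 D tau) during one lifetime, in nm."""
    return math.sqrt(6 * d * tau_s) * 1e9


def survival_to_reach(distance_nm: float, tau_s: float = TAU_1O2_WATER,
                      d: float = D_O2_WATER) -> float:
    """Crude fraction of singlet oxygen still excited after the mean time
    needed to diffuse distance_nm, assuming first-order decay."""
    t = (distance_nm * 1e-9) ** 2 / (6 * d)
    return math.exp(-t / tau_s)


print(f"RMS travel in one lifetime: {rms_travel_nm():.0f} nm")
for dist in (100, 200, 1000):
    print(f"{dist:5d} nm: surviving fraction ~ {survival_to_reach(dist):.2g}")
```

Under these assumptions singlet oxygen travels a couple of hundred nanometres before decaying, so beads held together by the protein-protein interaction get signal while beads dispersed through the well mostly do not.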

I’ve used this term ‘quench’ and I should say a bit more about what it means.  My understanding of the term is that it describes the process by which a compound in an excited state is returned to the ground state and it can be thought of as a physical rather than chemical process, even though intermolecular contact is presumably necessary.  The possibility of assay interference by singlet oxygen quenchers is certainly discussed in the original PAINS article and it was noted that:

“In the latter capacity, we also included DABCO, a strong singlet oxygen quencher which is devoid of a chromophore, and diazobenzene itself”

An apparent IC50 of 85 micromolar was observed for DABCO in AlphaScreen, and that got me wondering what the pH of the assay buffer might have been. The singlet oxygen quenching abilities of DABCO have been observed in a number of non-aqueous solvents, which suggests that the neutral form of DABCO is capable of quenching singlet oxygen. While I don’t happen to know if protonated DABCO is also an effective quencher of singlet oxygen, I would expect (based on a pKa of 8.8) the concentration of the neutral form in an 85 micromolar solution of DABCO buffered at neutral pH to be about 1 micromolar. Could this be telling us that quenching of singlet oxygen in AlphaScreen assays might be a bigger deal than we think?
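The estimate above follows from the Henderson-Hasselbalch equation. Here is a minimal Python sketch of it, treating DABCO as a monobasic amine with the pKa of 8.8 quoted above; the assumption is that its second, much weaker protonation can be ignored at these pH values. The pH values other than 7.0 are simply illustrative.

```python
def neutral_fraction_base(pka: float, ph: float) -> float:
    """Fraction of a monobasic amine present in the neutral (unprotonated) form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


TOTAL_DABCO_UM = 85.0  # apparent IC50 from the original PAINS article
PKA_DABCO = 8.8        # pKa used in the text

for ph in (7.0, 7.4, 8.0):
    neutral_um = TOTAL_DABCO_UM * neutral_fraction_base(PKA_DABCO, ph)
    print(f"pH {ph}: neutral DABCO ~ {neutral_um:.1f} micromolar")
```

At pH 7.0 this gives roughly 1.3 micromolar of neutral DABCO. Because the neutral fraction rises about tenfold per pH unit in this regime, the exact buffer pH matters quite a lot.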

Compounds can also react with singlet oxygen and, when they do so, the process is sometimes termed ‘scavenging’. If you just observe the singlet oxygen lifetimes, you can’t tell whether the singlet oxygen is returned harmlessly to its ground state or if a chemical reaction occurs.  Now if you read enough PAINS articles or PAINS-shaming blog posts, you’ll know that there is a high likelihood that, at some point, The Great Unwashed will be castigated for failing to take adequate notice of certain articles deemed to be of great importance by The Establishment.  In this spirit, I’d like to mention that compounds with sulfur doubly bonded to carbon have been reported ( 1 | 2 | 3 | 4 | 5 ) to quench or scavenge singlet oxygen and this may be relevant to the ‘activity’ of rhodanines in AlphaScreen assays.
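Why lifetimes alone can't separate quenching from scavenging is clear from the usual kinetic description, 1/τ = 1/τ0 + (kq + kr)[Q], where kq is the rate constant for physical quenching and kr the rate constant for chemical reaction. A minimal Python sketch follows. The symbol names, the 4 µs lifetime in water and the rate constants are assumptions for illustration; the rate constants are invented.

```python
def singlet_oxygen_lifetime(tau0_s: float, kq: float, kr: float, conc_m: float) -> float:
    """Observed lifetime: 1/tau = 1/tau0 + (kq + kr)[Q]."""
    return 1.0 / (1.0 / tau0_s + (kq + kr) * conc_m)


def fraction_reacted(tau0_s: float, kq: float, kr: float, conc_m: float) -> float:
    """Share of singlet oxygen consumed by chemical reaction with the compound."""
    total_decay_rate = 1.0 / tau0_s + (kq + kr) * conc_m
    return kr * conc_m / total_decay_rate


TAU0 = 4.0e-6  # assumed singlet oxygen lifetime in water, s
CONC = 1.0e-4  # hypothetical compound concentration, M

# A pure physical quencher and a pure scavenger with the same total rate constant.
physical = singlet_oxygen_lifetime(TAU0, kq=1.0e9, kr=0.0, conc_m=CONC)
chemical = singlet_oxygen_lifetime(TAU0, kq=0.0, kr=1.0e9, conc_m=CONC)
print(physical, chemical, physical == chemical)
print(fraction_reacted(TAU0, 1.0e9, 0.0, CONC), fraction_reacted(TAU0, 0.0, 1.0e9, CONC))
```

Both compounds shorten the lifetime identically. Only the fate of the compound (how much of it gets consumed) tells them apart, so distinguishing them needs something like product analysis rather than lifetime measurements.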

The original PAINS article is a valuable compilation of chemotypes associated with frequent-hitter behavior in AlphaScreen assays, although I have questioned whether or not this behavior represents strong evidence that compounds are doing unwholesome things to the target proteins. It might be prudent to check the singlet oxygen quencher/scavenger literature a bit more carefully before invoking a high hit rate in a small panel of AlphaScreen assays in support of assertions that the literature has been polluted or that somebody’s work is crap. I’ll finish the post by asking whether tethering donor and acceptor beads covalently to each other might help identify compounds that interfere with AlphaScreen by taking out singlet oxygen. Stay tuned for the next blog post in which I’ll show you, with some help from Denis Healey and the Luftwaffe, how to pollute the literature (and get away with it).
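If a tethered-bead control of the kind suggested above existed, reading it alongside the real assay might look like the following Python sketch. The control, the function name and the 50% inhibition threshold are all hypothetical. The logic rests on one idea: in the tethered format the beads cannot separate, so any loss of signal there can't be due to disrupting the protein-protein interaction.

```python
def interpret_counter_screen(ppi_inhibition_pct: float,
                             tethered_inhibition_pct: float,
                             threshold_pct: float = 50.0) -> str:
    """Compare signal loss in the PPI AlphaScreen with a hypothetical
    tethered-bead control (threshold is an arbitrary assumption)."""
    active_ppi = ppi_inhibition_pct >= threshold_pct
    active_tethered = tethered_inhibition_pct >= threshold_pct
    if active_ppi and active_tethered:
        # Signal lost even though the beads are held together: suspect
        # singlet oxygen quenching/scavenging or other readout interference.
        return "suspect readout interference"
    if active_ppi:
        # Signal lost only when bead proximity depends on the PPI.
        return "consistent with PPI disruption (confirm orthogonally)"
    if active_tethered:
        return "anomalous: check the control"
    return "inactive"


for ppi, tethered in [(90, 85), (90, 5), (10, 5)]:
    print(ppi, tethered, interpret_counter_screen(ppi, tethered))
```

Even a clean pass in such a control would only rule out one interference mechanism; it would not by itself show that the compound behaves well towards the protein.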