Tuesday, 22 September 2015

Ligand efficiency metrics: why all the fuss?

I guess it had to happen.  Our gentle critique of ligand efficiency metrics (LEMs) has finally triggered a response in the form of a journal article (R2015) in which it is dismissed as "noisy arguments" and I have to admit, as Denis Healey might have observed, that this is like being savaged by a dead sheep.  The R2015 defense of LEMs follows on from an earlier one (M2014) in which the LEM Politburo were particularly well represented and it is instructive to compare the two articles.  The M2014 study was a response to Mike Schultz’s assertion ( 1 | 2 | 3 ) that ligand efficiency (LE) was mathematically invalid and its authors correctly argued that LE was a mathematically valid expression although they did inflict a degree of collateral damage on themselves by using a mathematically invalid formula for LE (their knowledge of the logarithm function did not extend to its inability to take an argument that has units).  In contrast, the R2015 study doesn’t actually address the specific criticisms that were made of LE and can be described crudely as an appeal to the herding instinct of drug discovery practitioners.  It actually generates a fair amount of its own noise by asserting that, "Ligand efficiency validated fragment-based design..." and fails to demonstrate any understanding of the points made in our critique.  The other difference between the two studies is that the LEM Politburo is rather less well represented in the R2015 study and one can only speculate as to whether any were invited to join this Quixotic assault on windmills that can evade lances by the simple (and valid) expedient of changing their units.  One can also speculate about what the responses to such an invitation might have been (got a lot on right now… would love to have helped… but don’t worry…go for it... you’ll be just fine... we’ll be right behind you if it gets ugly) just as we might also speculate as to whether Marshal Bosquet might have summed up the R2015 study as, "C'est magnifique, mais ce n'est pas la guerre".

It’s now time to say something about our critique of LE metrics and I’ll also refer readers to a three-part series ( 1 | 2 | 3 ) of blog posts and the ‘Ligand efficiency: nice concept, shame about the metrics’ presentation.  In a nutshell, the problem is that LE doesn’t actually measure what the LEM Politburo would like you to believe that it measures and it is not unreasonable to ask whether it actually measures anything at all. Specifically, LE provides a view of chemico-biological space that changes with the concentration units in which measures of affinity and inhibitory activity are usually expressed. A view of a system that changes with the units in which the quantities that describe the system are expressed might have prompted Pauli to exclaim that it was "not even wrong" and, given that LE is touted as a metric for decision-making, this is a serious deficiency that is worth making a fuss about.  If you want to base design decisions on such a metric (or planetary alignments for that matter) then by all means do so. I believe the relevant term is 'consenting adults'.

Now you might be thinking that it's a little unwise to express such heretical thoughts since this might cause the auto-da-fé to be kindled and I have to admit that, at times, the R2015 study does read rather like the reaction of the Vatican’s Miracle Validation Department to pointed questions about a canonization decision.  Our original criticism of LE was on thermodynamic grounds and this seemed reasonable given that LE was originally defined in thermodynamic terms. As an aside, anybody thinking of invoking Thermodynamics when proposing a new metric should consider the advice from Tom Lehrer in 'Be Prepared' that goes, "don't write naughty words on walls if you can't spell". Although the R2015 study complains that the thermodynamic arguments used in our critique were complex, the thermodynamic basis of our criticism was both simple and fundamental.  What we showed is that perception of ligand efficiency changes when we change the value of the concentration used to define the standard state for ΔG° and, given that the standard concentration is really just a unit, that is a very serious criticism indeed.  It is quite telling that the R2015 study doesn't actually say what the criticism was and to simply dismiss criticism as "noisy arguments" without counter-argument in these situations is to effectively run the white flag up the pole.  I would argue that if there were a compelling counter-argument to our criticism of LE then it's a pretty safe assumption that the R2015 study would have used it.  This is a good time to remind people that if you're going to call something a metric then you're claiming that it measures something and, if somebody calls your bluff, you'll need more than social media 'likes' to back your claims.
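For anybody who wants to see the criticism spelled out rather than merely alluded to, here is a minimal derivation (my own restatement, using the conventional definitions in which ΔG° is referenced to a standard concentration C° and LE is defined as −ΔG° divided by the number of heavy atoms):

$$ \mathrm{LE}(C^{\circ}) \;=\; -\frac{\Delta G^{\circ}}{N_{\mathrm{HA}}} \;=\; -\frac{RT\,\ln\!\left(K_{d}/C^{\circ}\right)}{N_{\mathrm{HA}}} \qquad\Longrightarrow\qquad \mathrm{LE}(C^{\circ}_{2}) - \mathrm{LE}(C^{\circ}_{1}) \;=\; \frac{RT}{N_{\mathrm{HA}}}\,\ln\!\left(\frac{C^{\circ}_{2}}{C^{\circ}_{1}}\right) $$

Because the shift depends on the number of heavy atoms, changing the standard concentration moves compounds of different sizes by different amounts and can therefore reorder them by LE, which is the whole problem in a nutshell.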

All that said, I realize that many in the drug discovery field find Thermodynamics a little scary so I'll try to present the case against LE in non-thermodynamic terms. I suggest that we use IC50, which is definitely not thermodynamic, and I'll define something called generalized ligand efficiency (GLE) to illustrate the problems with LE:

GLE = -(1/NHA) × log10(IC50/Cref)

In defining generalized ligand efficiency, I've dumped the energy units, used the number of heavy atoms (which means that we don't have to include a 'per heavy atom' when we quote values) and changed the base of the logarithm to something a bit more user-friendly.  Given that LE advocates often tout the simplicity of LE, I just can't figure out why they are so keen on the energy units and natural logarithms, especially when they so frequently discard the energy units when they quote values of LE.  One really important point to notice is that IC50 has been divided by an arbitrary concentration unit that I'll call Cref (for reference concentration). The reason for doing this is that you can't calculate a logarithm for a quantity with units and, when you say "IC50 in molar units", you are actually talking about the ratio of IC50 to a 1 M concentration.  Units are really important in science but they are also arbitrary in the sense that coming to different conclusions using different units is more likely to indicate an error in the 'not even wrong' category than penetrating insight. Put another way, if you concluded that Anne-Marie was taller than Beatrice when measured in inches, how would you react when told that Beatrice was taller than Anne-Marie when measured in centimetres?  Now before we move on, just take another look at the formula for GLE and please remember that Cref is an arbitrary unit of concentration and that a metric is no more than a fancy measuring tape.

Ligand efficiency calculations almost invariably involve selection of 1 M as the concentration unit, although those using LE (and quite possibly those who introduced the metrics) are usually unaware that this choice has been made. If you base a metric on a specific unit then you either need to demonstrate that the choice of unit is irrelevant or you need to justify your choice of unit. GLE allows us to explore the consequences of arbitrarily selecting a concentration unit. As a potential user of a metric, you should be extremely wary of any metric if those advocating its use evade their responsibility to be open about the assumptions made in defining the metric and the consequences of making those assumptions. Let's take a look at the consequences of tying LE to the 1 M concentration unit and I'll ask you to take a look at the table below, which shows GLE values calculated for a fragment hit, a lead and an optimized clinical candidate.  When we calculate GLE using 1 M as the concentration unit, all three compounds appear to be equally ligand-efficient, but that picture changes when we choose another concentration unit.  If we choose 0.1 M as the concentration unit we now discover that the optimized clinical candidate is more ligand-efficient than the fragment hit. However, if we choose a different concentration unit (10 M) we find that the fragment hit is now more ligand-efficient than the optimized clinical candidate. All we need to do now is name our fragment hit 'Anne-Marie' and our optimized clinical candidate 'Beatrice' and it will be painfully apparent that something has gone horribly awry with our perception of reality.
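If you'd like to see the arithmetic for yourself, here is a minimal Python sketch using hypothetical IC50 and heavy atom values that I've invented purely for illustration (they are not the values in the table), chosen so that the behavior described above is reproduced:

```python
from math import log10

# Hypothetical (IC50 in M, heavy atom count) values, invented for illustration
compounds = {
    "fragment hit":       (1e-3, 12),   # 1 mM, 12 heavy atoms
    "lead":               (1e-6, 24),   # 1 uM, 24 heavy atoms
    "clinical candidate": (1e-9, 36),   # 1 nM, 36 heavy atoms
}

def gle(ic50, n_heavy, c_ref):
    """Generalized ligand efficiency: -(1/NHA) * log10(IC50/Cref)."""
    return -log10(ic50 / c_ref) / n_heavy

for c_ref in (1.0, 0.1, 10.0):   # reference concentration in M
    print(f"Cref = {c_ref} M")
    for name, (ic50, n_heavy) in compounds.items():
        print(f"  {name:18s}  GLE = {gle(ic50, n_heavy, c_ref):.3f}")
```

With Cref = 1 M all three compounds come out at GLE = 0.25; with Cref = 0.1 M the clinical candidate looks the most ligand-efficient; with Cref = 10 M the fragment hit does.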
One should be very worried about basing decisions on a metric that tells you different things when you express quantities in different units because 10 nM is still 10 nM whether you call it 10000 pM, 10^-5 mM or 10^-8 M.  This is a good time to ask readers if they can remember LE being used to prioritize one compound (or even a series) over another in a project situation.  Would the decision have been any different had you expressed IC50 in different units when you calculated LE?   Was Anne-Marie really more beautiful than Beatrice?   The other point that I'd like you to think about is the extent to which improperly-defined metrics can invalidate the concept that they are intended to apply. I do believe that a valid concept of ligand efficiency can be retrieved from the wreckage of metrics and this is actually discussed both in the conclusion of our LEM critique and in my recent presentation. But enough of that because I'd like to let the R2015 study say a few words.

The headline message of the R2015 study asserts, “The best argument for ligand efficiency is simply the assertion that on average smaller drugs have lower cost of goods, tend to be more soluble and bioavailable, and have fewer metabolic liabilities”. I do agree that excessive molecular size is undesirable (we applied the term 'pharmaceutical risk factor' to molecular size and lipophilicity in our critique) but one is still left with the problem of choosing a concentration unit or, at least, justifying the arbitrary choice of 1 M.  As such, it is not accurate to claim that the undesirability of excessive molecular size is the best argument for ligand efficiency because the statement also implicitly claims justification for the arbitrary choice of 1 M as the concentration unit.  I have no problem with normalizing potency with respect to molecular size but doing so properly requires analyzing the data appropriately rather than simply pulling a concentration unit out of a hat and noisily declaring it to be the Absolute Truth in the manner in which one might assert the age of the Earth to be 6000 years.

I agree 100% with the statement, "heavy atoms are the currency we spend to achieve high-affinity binding, and all else being equal it is better to spend fewer for a given potency".  When we progress from a fragment hit to an optimized clinical candidate we spend by adding molecular size and our purchase is an increase in potency that is conveniently measured by the log of the ratio of IC50 values of the hit and optimized compound. If you divide this quantity by the difference in heavy atoms, you'll get a good measure of how good the deal was. Crucially, your perception of the deal does not depend on the concentration unit in which you express the  IC50 values and this is how group efficiency works. It's a different story if you use LE and I'll direct you back to the figure above so that you can convince yourself of this.
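For completeness, here is the algebra (written in the GLE notation introduced above) showing why this measure of the deal, which is essentially how group efficiency works, cannot depend on the choice of Cref:

$$ \mathrm{GE} \;=\; \frac{\log_{10}\!\left(\mathrm{IC}_{50}^{\mathrm{hit}}/C_{\mathrm{ref}}\right) - \log_{10}\!\left(\mathrm{IC}_{50}^{\mathrm{opt}}/C_{\mathrm{ref}}\right)}{N_{\mathrm{HA}}^{\mathrm{opt}} - N_{\mathrm{HA}}^{\mathrm{hit}}} \;=\; \frac{\log_{10}\!\left(\mathrm{IC}_{50}^{\mathrm{hit}}/\mathrm{IC}_{50}^{\mathrm{opt}}\right)}{\Delta N_{\mathrm{HA}}} $$

Cref cancels in the difference of the two logarithms, whereas it stubbornly refuses to cancel in GLE (or LE) itself.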

I noted earlier that the R2015 study fails to state what criticisms have been made of LE and simply dismisses them in generic terms as fuss and noise.  I don't know whether this reflects a failure to understand the criticisms or a 'head in the sand' reaction to a reality that is just too awful to contemplate.  Either way, the failure to address specific criticisms made of LE represents a serious deficiency in the R2015 study and it will leave some readers with the impression that the criticisms are justified even if they have not actually read our critique. The R2015 study asserts, "There is no need to become overly concerned with noisy arguments for or against ligand efficiency metrics being exchanged in the literature" and, given the analogy drawn in R2015 between LE and fuel economy, I couldn't help thinking of a second-hand car dealer trying to allay a fussy customer's concerns about a knocking noise coming from the engine compartment of a potential purchase.

So now it's time to wrap up and I hope that you have learned something from the post even if you have found it to be tedious (I do have to admit that metrics bore me shitless).  Why all the fuss?  I'll start by saying that drug discovery is really difficult and making decisions using metrics that alter your perception when different units are used is unlikely to make drug discovery less difficult. If we do bad analysis and use inappropriate metrics then those who fund drug discovery may conclude that the difficulties we face are of our own making.  Those who advocate the use of LE are rarely (if ever) open about the fact that LE metrics are almost invariably tied to a 1 M reference concentration and that changing this reference alters our perception of efficiency. We need to remember that the purpose of a metric is to measure and not to tell us what we want to hear. I don't deny that LE is widely used in drug discovery and that is precisely why we need to make a fuss about its flaws. Furthermore, there are ways to normalize activity with respect to molecular size that do not suffer from LE's deficiencies.  I'll even admit that LE is useful since the metric has proven to be a powerful propaganda tool for FBDD. So useful, in fact, that the R2015 study uses the success of FBDD as propaganda for LE. Is the tail wagging the dog or have the inmates finally taken over the asylum?

Thursday, 27 August 2015

On open access

I’ve been meaning to write something about open access (OA) for some time and I do genuinely believe that true OA is a worthwhile (and achievable) ideal. This blog post is not intended as a critique of OA although I will suggest that its advocates do need to think a bit more carefully about how they present the benefits. I'll be writing from the perspective of authors of scientific journal articles and disclose, in the interests of openness, that all but one of my publications have been in paywall journals. One general observation that I will make is that the sanctimonious, holier-than-thou attitude of some OA advocates (who may occasionally avail themselves of the advantages of publishing in paywall journals) does tend to grate at times.  Although there is actually a lot more to OA than who pays for journal articles to be published, I'll not be discussing those issues in this post. The principal theme of this post is that it may be helpful for those who opine loudly on the benefits of OA to take a more author-centric look at the relevant issues and this is especially important for those advocates whose contributions to the scientific literature have been modest.  

One of my gripes is that OA is sometimes equated with open science and I believe that the openness of science is actually a much bigger issue than who pays for scientific articles to be published in journals. Drug discovery research is my primary focus and a certain amount of what (we think that) we know in this field comes from analysis of proprietary databases.  Some of this analysis falls short of what one might feel entitled to expect from journals that would describe themselves as premier journals and you’ll find some examples in this presentation.  I have argued here and here that studies of this genre would actually pack a much heavier punch if the data was made publicly available. However, it is not only datasets that are undisclosed in drug discovery papers.  In some cases, models built from the data are not disclosed either (this does not inhibit authors from comparing their models favorably with previously published models). If the science in a data modelling study is to be accurately described as open then the data and models associated with the study have to be disclosed. This is not just an issue for paywall journals and the models described in the following OA articles do not appear to be fully disclosed:

The point that I'm making here is that the science in a paywall journal is not necessarily less open than that in an OA journal. In my view, the openness of the science presented in a journal article is determined by how much its authors disclose rather than whether or not one needs to pay to read the article. It's also worth remembering that internet access is not free.

This is a good point to say something about the economics of publishing articles in scientific journals. In economic terms, a journal article represents a service that the journal has provided for the author rather than the other way round (which many would regard as more sensible).  The author typically pays for the service by either signing over rights to a paywall journal or by paying an article processing charge (APC) to a journal (OA or paywall), which means that the article can be freely viewed online and rights are (hopefully) retained by the author.  I need to note that not all OA journals levy APCs and, in these cases, it's a good idea to check how the journal is funded and whether or not articles are peer-reviewed. Payment of an APC implies that the journal article is technically an advertisement and this has interesting (and potentially even troubling) implications if the journal also publishes conventional advertisements. All articles in a paywall journal are usually considered (e.g. by download cost) to be of equal value and this is one of the bigger fictions in the scholarly publishing business.  The situation becomes more complicated with OA journals because the APC may vary with geographical region or be waived completely for some individuals. In the interest of openness and accountability, shouldn't open access journals disclose the actual APC paid for each article published?

There are compelling arguments that can be made in support of OA and, like many, I believe that the results of publicly-funded research should be fully visible to the public. This is probably a good point at which to raise the question of who should own intellectual property generated by publicly-funded research. OA broadens participation (by citizen scientists and emerging scientific communities in developing countries) and increases the likelihood of a study being read (and used) by the people with the most interest in it. OA greatly facilitates mining of the ever-expanding body of published literature which adds to its value and, unsurprisingly, some of the loudest voices calling for scientific literature to be made public property are those of informaticians and data scientists.  One question that OA advocates need to address is whether the results of mining and analysis of freely available literature should also be considered to be public property. While a strong and coherent case can be made for OA, it might not be such a great idea to invoke morality (can get a bit subjective) or fundamental human rights (people under siege in Syria might consider it less than tasteful that their suffering is being equated with their inability to access the latest articles in Nature).

There are a few hurdles that will need to be overcome if the OA vision is to translate into OA reality. Advocates for OA typically demonize the paywall journals, although the cynic might note that easily-recognized enemies are needed by the leaders of mass movements in order to galvanize the proletariat. Some declare that they will only review manuscripts for open access journals while others assert that they are boycotting Nature and Science (I scarcely dare to think of the mirth that would result from my announcing to colleagues that I will no longer be submitting manuscripts to Nature). While demonizing paywall journals may make people feel good (and righteous), it misses the essential point that the OA movement needs to win hearts and minds so that authors (the creators of content) come to see OA journals as the most attractive option for publishing their studies.  Put more bluntly, authors are at least as much to 'blame' as the journals for their articles 'disappearing' behind paywalls. It's also worth remembering that journal editors are usually authors too, as are the people on academic hiring committees who might use journal impact factor to decide that the research output of one applicant is worth more than that of another applicant.

One nettle that the OA movement does need to grasp is that journal publishing is frequently a business and, from the perspective of the 'customer', the terms 'paywall' and 'open access' simply describe alternative business models.  I'm certainly not saying that it has to be that way and there are other areas, such as the provision of health care, education and religion, where the case could be made for the exclusion of commercial activity.  However, the issue remains and, like it or not, the OA movement is effectively lobbying in favor of one set of commercial interests over another set of commercial interests. This means that OA advocates need to be clear (and open) about the sources of any funding that should come their way.  The other nettle that the OA movement needs to grasp is that it should be very careful about whom it allows to speak on its behalf.  OA propaganda sometimes takes the form 'open access good, paywall bad' and the message loses some of its bite if it turns out that Napoleon has been publishing in Nature while denouncing Beall's List as reactionary and counter-revolutionary.

So let's take a look at OA from the perspective of a scientist who is trying to decide where to send a manuscript.  Typically the decision will be made by the principal investigator (PI), who will be well aware that funding and, if relevant, tenure decisions will be influenced by the journals in which the PI's articles have been published.  In some areas (e.g. biomedical), academic research is particularly competitive and PIs are well aware that an article published in an elite journal is visible to the PI's peers and all those who will be making decisions about the PI's future.  As such, publishing in an OA journal can represent a risk without compensating benefits (in competitive academic science, knowing that everybody on the planet can see your article fits into the 'nice to have' category). Another factor that inhibits authors from publishing in OA journals is the paucity of options in some disciplines (e.g. chemistry). I don't believe that there is currently a credible OA equivalent to Journal of Medicinal Chemistry and I would be concerned about the ability of a general-purpose OA journal to identify and recruit specialist reviewers. I'm not a computational biologist but, if I were, I'd definitely be thinking twice about submitting an article to a computational biology journal that published Ten Simple Rules of Live Tweeting at Scientific Conferences. Finally, there is the issue of cost because publishing in OA journals (or paying for OA in a paywall journal) does not usually come cheap and, in some cases, funding needs to be specifically sourced (as opposed to simply paying out of one's grant). Those scientists for whom the APC represents a significant expense will also be aware that they are effectively subsidizing other scientists for whom the APC has been waived. Something that may be going through the mind of more than one scientist is the direction in which APCs might head should all scientific publishing be forced to adopt an OA business model.

I think this is a good point at which to wrap things up and I should thank you for staying with me.  If you're an OA opponent and had been hoping to see OA skewered then please accept my apologies for having disappointed you. If you're an OA supporter, you may well disagree with many (or all) of the points that I've raised. If, however, something in this post has prompted you to think about things from a different angle then it will have achieved its purpose.

Sunday, 12 July 2015

Molecular recognition, controlled experiments and discrete chemical space

A recent Curious Wavefunction post (The fundamental philosophical dilemma of chemistry) got me thinking a bit.  Ash notes,

“For me the greatest philosophical dilemma in chemistry is the following: It is the near impossibility of doing controlled experiments on the molecular level. Other fields also suffer from this problem, but I am constantly struck by how directly one encounters it in chemistry”.  

I have a subtly different take on this although I do think that Ash is most definitely pointing the searchlight in the right direction and his post was discussed recently by members of the LinkedIn Computational Chemistry Group.

Ash illustrates the problem from a perspective that will be extremely familiar to anybody with experience in pharmaceutical molecular design,

“Let’s say I am interested in knowing how important a particular hydrogen bond with an amide in the small molecule is. What I could do would be to replace the amide with a non hydrogen-bonding group and then look at the affinity, either computationally or experimentally. Unfortunately this change also impacts other properties of the molecules; its molecular weight, its hydrophobicity, its steric interactions with other molecules. Thus, changing a hydrogen bonding interaction also changes other interactions, so how can we then be sure that any change in the binding affinity came only from the loss of the hydrogen bond? The matter gets worse when we realize that we can’t even do this experimentally”

The “gets worse” acknowledges that the contribution of an interaction to affinity is not an experimental observable.  As we noted in a recent critique of ligand efficiency metrics,

“In general, the contribution of a particular intermolecular contact (or group of contacts) to affinity (or the changes in enthalpy, entropy, heat capacity or volume associated with binding) cannot be measured experimentally”

It’s easy to see why this should be the case for interaction of a drug with its target because this happens in an aqueous environment and the binding event is coupled to changes in solvent structure that are non-local with respect to the protein-ligand interactions.  As former colleagues at AZ observe in their critique of enthalpy optimization,

“Both ΔH° and ΔS° are typically sums over an enormous amount of degrees of freedom, many contributions are opposing each other and comparable in amplitude, and in the end sums up to a comparably smaller number, ΔG°”

Water certainly complicates interpretation of binding thermodynamics in terms of structure but, even in the gas phase, the contributions of intermolecular contacts are still no more observable experimentally than atomic charges.

Let’s think a bit about what would constitute a controlled experiment in the context of generating a more quantitative understanding of the importance of individual interactions.  Suppose that it’s possible to turn a hydrogen bond acceptor into (apolar) hydrocarbon without changing the remainder of the molecule. We could measure the difference in affinity or potency (pKd or pIC50) between the compound of interest and its analog in which the hydrogen bond acceptor has been transformed into hydrocarbon. The Figure below shows how this can be done and the pIC50 values (taken from this article) can be interpreted as evidence that hydrogen bonds to one of the pyridazine nitrogens are more beneficial for potency against cathepsins S and L2 than against cathepsin L.

Although analysis like this can provide useful insight for design, it still doesn’t tell us what absolute contributions these hydrogen bonds make to potency because our perception depends on our choice of reference. Things get more complicated if we are trying to assess the importance of the hydrogen bond donor and acceptor characteristics of a secondary amide in determining activity. We might replace the carbonyl oxygen atom with a methylene group to assess the importance of hydrogen bond acceptor ability... er... maybe not.  The amide NH could be replaced with a methylene group but this will reduce the hydrogen bond acceptor strength of the carbonyl oxygen atom as well as changing the torsional preferences of the amide bond.   This illustrates the difficulty, highlighted by Ash, of performing controlled experiments when trying to dissect the influences of different molecular properties on the behavior of compounds.

The above examples raise another issue that rarely, if ever, gets discussed.  Although chemical space is vast, it is still discrete at the molecular level and that may prove to be an even bigger dilemma than not being able to perform controlled experiments at the molecular level.  As former colleagues and I suggested in our FBDD screening library article, fragment-based approaches may enable chemical space to be sampled at a more controllable resolution.  Could it be that fragments may have advantages in dealing with the discrete nature of chemical space?

Wednesday, 1 July 2015

Into the mess that metrics make

So it looks like the post publication peer review of the article featured in the Expertitis blog post didn't go down too well but as Derek said recently, “…open, post-publication peer review in science is here, and it's here to stay. We'd better get used to it”.   In this post, I’m going to take a look at the webinar that drew me to that article and I hope that participants in that webinar will find the feedback to be both constructive and educational.

Metrics feature prominently in this webinar and the thing that worries me most is the Simplistic Estimate of Enthalpy (SEEnthalpy).   Before getting stuck into the thermodynamics, it’ll be necessary to point out some errors so please go to 20:10 (slide entitled ‘ligand efficiency lessons’) in the webinar.   

The ‘ligand efficiency lessons’ slide concedes that LE has been criticized in print but then incorrectly asserts that the criticism has been rebutted in print.  It is well-known that Mike Schultz has criticized ( 1 | 2 | 3 ) LE as being mathematically invalid and the counter to this is that LE (written as free energy of binding divided by number of non-hydrogen atoms) is a mathematically valid expression (even though the metric itself is thermodynamic nonsense). Nevertheless, Mike still identified the Achilles' heel of LE which is that a linear response of affinity to molecular size has to have zero intercept in order for that linear response to represent constant LE. It is also somewhat ironic that the formula for LE used in the rebuttal to Mike's criticism is itself mathematically invalid because it includes a logarithm of a quantity with units of concentration.  However, another criticism (article and related blog posts 1 | 2 | 3 ) has been made of LE and this has not been rebutted in print.   The ‘ligand efficiency lessons’ slide also asserts that "LE tends to decline with ligand size" and it should also be pointed out that some of the size dependency of LE is an artefact of the arbitrary choice of 1 mole per litre as the standard concentration (article and blog posts 1 | 2 ).   The Ligand Efficiency: Nice Concept, Shame About the Metrics presentation may also be helpful.
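To see the intercept point without any thermodynamic machinery (a sketch of my own, using pIC50 rather than ΔG° for simplicity): if affinity responds linearly to molecular size,

$$ \mathrm{pIC}_{50} \;=\; a + b\,N_{\mathrm{HA}} \qquad\Longrightarrow\qquad \frac{\mathrm{pIC}_{50}}{N_{\mathrm{HA}}} \;=\; b + \frac{a}{N_{\mathrm{HA}}} $$

then scaling by size only gives a constant when the intercept a is zero. Changing the unit (standard concentration) in which IC50 is expressed simply adds a constant to pIC50, which is to say it changes a, and that is why some of the apparent size dependence of LE is an artefact of the arbitrary choice of 1 M.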
This is a good point to introduce SEEnthalpy and, were I writing a satire on drug discovery metrics, I could scarcely do better than this.  Perhaps even ‘Enthalpy – the musical’ with lyrics by Leonard Cohen (Some girls wander by mistake/ Into the mess that metrics make)?  Let’s pick this up at 22:43 in the webinar with the ‘Conventional wisdom – simplistic view’ slide which teaches us that rotatable bonds are hydrophobic interactions (you really did hear that correctly, “entropy comes from non-direct, hydrophobic interactions like rotatable bonds"). I don’t happen to agree with ‘conventional’ or ‘wisdom’ in this context although I do fully endorse ‘simplistic’. Let’s take a closer look at SEEnthalpy which is defined (24:45) from the numbers of hydrogen bond donors (#HBD), hydrogen bond acceptors (#HBA) and rotatable bonds (#RB):

SEEnthalpy = (#HBD + #HBA)/(#HBD + #HBA + #RB)

As I’ve mentioned before, definitions of hydrogen bonding groups can be ambiguous (as can be definitions of rotatable bonds) but I don’t want to get sidetracked by this right now because there are more pressing issues to deal with. You can learn a lot about metrics simply by thinking about them (have a look at comments on LELP in this article) and one question that you might like to ask yourselves is what is #RB (which they’re telling us is associated with entropy) doing in an enthalpy metric?  This may be a good time to note that the contribution of an intermolecular contact (or group of contacts) to the changes in enthalpy or entropy associated with binding is not, in general, an experimental observable.  Could also be a good idea to take a look at A Medicinal Chemist's Guide to Molecular Interactions and Ligand Binding Thermodynamics in Drug Discovery: Still a Hot Tip? 
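If you want to play with the formula yourself, here is a minimal Python sketch (entirely my own, with made-up counts); whatever else it does, it makes the role of #RB rather obvious:

```python
def see_enthalpy(n_hbd, n_hba, n_rb):
    """SEEnthalpy = (#HBD + #HBA) / (#HBD + #HBA + #RB).
    The value depends entirely on how donors, acceptors and rotatable
    bonds are defined and counted, which is part of the ambiguity noted above."""
    polar = n_hbd + n_hba
    return polar / (polar + n_rb)

# Two hypothetical compounds with identical donor/acceptor counts: only the
# rotatable bond term (the one we're told is associated with entropy) moves
# this 'enthalpy' metric.
print(see_enthalpy(n_hbd=2, n_hba=4, n_rb=2))   # 0.75
print(see_enthalpy(n_hbd=2, n_hba=4, n_rb=8))   # ~0.43
```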

I have to admit that I found the reasoning behind SEEnthalpy to be unconvincing but, given that the metric appears to have the endorsement of all participants of the webinar, perhaps we should at least give them the chance to demonstrate that it is meaningful. If I was introducing an enthalpy metric then the first thing that I’d do would be to (try to) show that it measured what it was claimed to measure (we call something a metric because it measures something and not because it's easy to remember the formula or because we've thought up a cute acronym). As it turns out, the Binding Database is public and simply oozing with the thermodynamic data that could be used to investigate the relationship between metric and reality.  This makes how they’ve chosen to evaluate the metric seem somewhat bizarre.

The first test of SEEnthalpy was performed using a TB activity dataset.  About 1000 of the compounds in this dataset were found to be active by high throughput screening (HTS) and the remaining 100-ish had come from structure-based drug design (SBDD).   The differences between the two groups of compounds were found to be significant although it is not clear what to make of this.  One interpretation is that the HTS compounds are screening hits (possibly from phenotypic screens) and the SBDD compounds have been optimized.  If this is the case, it will not be too difficult to perceive differences between the two groups of compounds and doing so does not represent one of the more pressing prediction problems in drug discovery.  This is probably a good time to note that correlation does not equal causation.  The other point worth making is that observing that two mean values are significantly different doesn’t tell us about the size of the effect which is more relevant to prediction.  If you want to illustrate the strength of the trend (as opposed to its statistical significance) then you need to show standard deviations with the mean values rather than just standard errors.  If this is unfamiliar territory then why not take a look at this article and make it more familiar.
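If you'd like to see why the distinction matters, here is a small simulation (synthetic numbers, nothing to do with the TB dataset): with a thousand or so compounds per group, a difference in means can be impressively 'significant' even though the two distributions overlap almost completely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated metric values for two large groups whose means differ only slightly
hts  = rng.normal(loc=0.50, scale=0.20, size=1000)
sbdd = rng.normal(loc=0.55, scale=0.20, size=1000)

t_stat, p_value = stats.ttest_ind(hts, sbdd)
for name, x in (("HTS", hts), ("SBDD", sbdd)):
    sem = x.std(ddof=1) / np.sqrt(len(x))
    print(f"{name:4s} mean = {x.mean():.3f}   SD = {x.std(ddof=1):.3f}   SEM = {sem:.4f}")
print(f"two-sample t-test p = {p_value:.1e}")
# The SEMs make the means look exquisitely well separated; the SDs tell you
# that the two distributions sit almost on top of each other.
```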

The next test of SEEnthalpy was to investigate its correlation with biological activity and the chosen activity measure was %inhibition in protein kinase assays.  Typically when we model and analyze activity we use pIC50 or pKi (see this webinar, for example) and it is not at all clear why %inhibition was used instead given the vast amount of public data available in ChEMBL.  One problem with using %inhibition as a measure of activity is that it has a very limited dynamic range and there really is a reason that people waste all that time measuring a concentration response so that they can calculate pIC50 (or pKi).   Let’s pick up the webinar at 27:10 and at 28:13 a plot will appear.   This plot appears to have been generated by ordering compounds by SEEnthalpy and then plotting %inhibition against order.  If you look at the plot you’ll notice that the %inhibition values for most of the compounds are close to zero, indicating that they are essentially inactive at the chosen test concentration, which means that the correlation between %inhibition and SEEnthalpy will be very weak.  But follow the webinar to 28:32 and you will learn that, “…for this particular kinase, there was a really clear relationship between SEEnthalpy and the activity”.  I’ll leave it to you, the reader, to decide for yourself how clear that trend actually is.

Let’s go to 28:42 in the webinar, at which point the question is posed as to whether SEEnthalpy correlates more strongly than other metrics with activity.   However, when we get to 28:53, we discover that it is actually statistical significance (p values) rather than strength of correlation that is the focus of the analysis.   It is really important to remember that given sufficiently large samples, even the most anemic of trends can acquire eye-wateringly impressive statistical significance and it is the correlation coefficient not the p value that tells us how strong a trend is. Once again, I am forced to steer the participants of the webinar towards this article and this time, in order to reinforce the message, I'll also link an article which illustrates the sort of tangle into which you can get if you confuse the strength of a trend with its statistical significance.
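The point about sample size is easy to demonstrate for yourself with a couple of lines of Python (a simulation of my own, unrelated to the webinar's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)   # an anemic trend buried in noise

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}   r^2 = {r * r:.4f}   p = {p:.1e}")
# r^2 of well under one percent, yet the p value is vanishingly small:
# statistical significance tells you nothing about how strong the trend is.
```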

I think this is a good point at which to wrap things up. I'll start by noting that attempting to educate the drug discovery community and shape the thinking of that community's members can backfire if weaknesses in one's grasp of the fundamental science become apparent.  For example, confusing the statistical significance of a trend with its strength in a webinar like this may get people asking uncouth questions about the machine learning models that were mentioned (although not discussed) during the webinar.  I'll also mention that the LE values are presented in the webinar without units and I'll finish off by posing the question of whether it is correct, especially for cell-based assays, to convert IC50 to molar energy for the purpose of calculating LE.

Monday, 13 April 2015

Expertitis

I will never call myself an expert for to do so would be to take the first step down a very slippery slope.  It is important to remember that each and every expert has both an applicability domain and a shelf life.  It’s not the experts themselves to which I object but what I’ll call ‘expert-driven decision-making’ and the idea that you can simply delegate your thinking to somebody else.

I’m going to take a look at an article that describes attempts to model an expert’s evaluation of chemical probes identified by NIH-funded high throughput screening.  The study covers some old ground and one issue facing authors of articles like these is how to deal with the experts in what might be termed a ‘materials and methods context’.   Experts who are also coauthors of studies like these become, to some extent, self-certified experts.   One observation that can be made about this article is that its authors appear somewhat preoccupied with the funding levels for the NIH chemical probe initiative and I couldn't help wondering about how this might have shaped or influenced the analysis.

A number of approaches were used to model the expert’s assessment of the 322 probe compounds, of which 79% were considered to be desirable.   The approaches used by the authors ranged from simple molecular property filtering to more sophisticated machine learning models.  I noted two errors (missing molar energy units; taking the logarithm of a quantity with units) in the formula for ligand efficiency and it’s a shame they didn’t see our article on ligand efficiency metrics, which became available online about six weeks before they submitted their article (the three-post series starting here may be helpful).  The authors state, “PAINS is a set of filters determined by identifying compounds that were frequent hitters in numerous high throughput screens”, which is pushing things a bit because the PAINS filters were actually derived from analysis of the output from six high throughput screening campaigns (this is discussed in detail in the three-post series that starts here).  Mean pKa values of 2.25 (undesirable compounds) and 3.75 (desirable compounds) were reported for basic compounds and it certainly wasn’t clear to me how compounds were deemed to be basic given that these values are well below neutral pH. In general, one needs to be very careful when averaging pKa values.  While these observations might be seen as nit-picking, using terms like ‘expert’, ‘validation’ and ‘due diligence’ in the title and abstract does set the bar high.

A number of machine learning models were described and compared in the article and it’s worth saying something about models like these.  A machine learning model is usually the result of an optimization process.  When we build a machine learning model, we search for a set of parameters that optimizes an objective such as fit or discrimination for the data with which we train the model.  The parameters may be simple coefficients (as in a regression model) but they might also be threshold values for rules.  The more parameters you use to build a model (machine learning or otherwise), the more highly optimized the resulting model will be and we use the term ‘degrees of freedom’ to say how many parameters we’ve used when training the model.   You have to be very careful when comparing models that have different numbers of degrees of freedom associated with them and one criticism that I would make of machine learning models is that the number of degrees of freedom is rarely (if ever) given.  Over-fitting is always a concern with models and it is customary to validate machine learning models using one or more of a variety of protocols. Once a machine learning model has been validated, the number of degrees of freedom is typically considered to be a non-issue. Clustering in data can cause validation to make optimistic assessments of model quality and the predictive chemistry community does need to pay more attention to Design of Experiments. Here’s a slide that I sometimes use in molecular design talks.
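The clustering point is easily illustrated with a toy example (a sketch using synthetic data and scikit-learn, not anything from the featured article): when near-analogues of the training compounds turn up in the test folds, random cross-validation flatters the model, whereas holding out whole clusters (series) gives a rather more sober assessment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)

# 40 'series' of 10 near-duplicate analogues; activity is a property of the
# series (plus noise) rather than anything the descriptors can generalize from.
n_series, n_per = 40, 10
centres = rng.normal(size=(n_series, 5))
series_activity = rng.normal(size=n_series)

X = np.repeat(centres, n_per, axis=0) + 0.05 * rng.normal(size=(n_series * n_per, 5))
y = np.repeat(series_activity, n_per) + 0.3 * rng.normal(size=n_series * n_per)
groups = np.repeat(np.arange(n_series), n_per)

model = RandomForestRegressor(n_estimators=200, random_state=0)

random_cv = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
series_cv = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=groups)

print(f"random-split CV R2:  {random_cv.mean():.2f}")   # flattering
print(f"series-split CV R2:  {series_cv.mean():.2f}")   # closer to reality
```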

Let’s get back to the machine learning models in the featured article.  Comparisons were made between models (see Figure 4 and Table 5 in the article) but no mention is made of numbers of degrees of freedom for the models.   I took a look in the supplementary information to see if I could get this information by looking at the models themselves and discovered that the models had not actually been reported. In fact, the expert’s assessment of the probes had not been reported either and I don't believe that this article scores highly for either reproducibility or openness.   Had this come to me as a manuscript reviewer, the response would have been swift and decisive and you probably wouldn’t be reading this blog post. How much weight should those responsible for the NIH chemical probes initiative give to the study?  I’d say they can safely ignore it because the data set is proprietary and models trained on it are only described and not actually specified.  Had the expert's opinion on the desirability (or otherwise) of the probes been disclosed then it would have been imprudent for the NIH folk to ignore what the expert had to say. At the same time, it's worth remembering that we seek different things from probes and from drugs and one expert's generic opinion of a probe needs to be placed in the context of any specific design associated with the probe's selection.

However, there are other reasons that the NIH chemical probes folk might want to be wary of data analysis from this source and I'll say a bit more about these in the next post.


Wednesday, 1 April 2015

Ligand efficiency validated

So I have to admit that I got it wrong and it looks like Ligand Efficiency (LE) is actually thermodynamically valid after all. You’ll recall my earlier objections to LE on the grounds that the standard concentration of 1 M has been arbitrarily chosen and that our perception of compound quality depends on the units of concentration in which the relevant equilibrium constants are defined. A Russian friend recently alerted me to an article in the Siberian Journal of Thermodynamics in which she thought I might be interested and was kind enough to scan it for me because this journal is not available in the West due to the current frosty relations with Russia.   This seminal study by RR Raskolnikov demonstrates unequivocally that the 1 M standard concentration is absolutely and uniquely correct thus countering all known (and even some unknown) objections to LE as a metric of compound quality.  The math is quite formidable and the central proof is based on the convergence characteristics of the trace of the partial molar polarizability tensor.  Enigmatically, the author acknowledges the assistance of an individual named only as Porfiry Petrovich in helping him to find the truth after a long search.   Raskolnikov’s career path appears to have been rather unorthodox.  Originally from St Petersburg, he moved east because of some trouble with a student loan and for many years he was dependent on what his wife was able to earn.    

Sunday, 22 March 2015

Literature pollution for dummies


So I thought that I’d conclude this mini-series ( 1 | 2 ) of PAINS posts with some lighter fare, the style of which is intended to be a bit closer to that of a PAINS-shaming post ( 1 | 2 ) than is normal for posts here.  As observed previously, the PAINS-shaming posts are vapid and formulaic although I have to admit that it’s always a giggle when people spit feathers on topics outside their applicability domains.   I must also concede that one of the PAINS-shaming posts was even cited in a recent article in ACS Medicinal Chemistry Letters, although this citation might be regarded as more confirmation of the old adage that ‘flattery will get you anywhere’ than an indication of the scientific quality of the post.  I shouldn’t knock that post too much because it’s what goaded me into taking a more forensic look at the original PAINS article.  However, don’t worry if you get PAINS-shamed because, with acknowledgement to Denis Healey, being PAINS-shamed is “like being savaged by a dead sheep”.

I should say something about the graphic with which I’ve illustrated this blog post.  It shows a diving Junkers Ju 87 Stuka and I’ll let aviation author William Green, writing in ‘Warplanes of the Third Reich’, tell you more about this iconic aircraft:

“The Ju 87 was an evil-looking machine, with something of the predatory bird in its ugly contours – its radiator bath and fixed, spatted undercarriage resembling gaping jaws and extended talons – and the psychological effect on the recipients of its attentions appeared almost as devastating as the bombs that it delivered with such accuracy.  It was an extremely sturdy warplane, with light controls, pleasant flying characteristics and a relatively high standard of manoeuvrability. It offered crew members good visibility and it was able to hit a target in a diving attack with an accuracy of less than 30 yards. All these were highly desirable characteristics but they tended to blind Ob.d.L. to the Ju 87’s shortcomings. Its use presupposed control of the air, for it was one of the most vulnerable of combat aircraft and the natural prey of the fighter…”

I really should get back on-topic because I doubt that Rudel ever had to worry about singlet oxygen while hunting T-34s on the Eastern Front.  I’ve promised to show you how to get away with polluting the literature so let’s suppose you’ve submitted a manuscript featuring PAINful structures and the reviewers have said, “Nein, es ist verboten”.  What should you do next? The quick answer is, “It depends”.  If the reviewers don't mention the original PAINS article and simply say that you’ve just not done enough experimental work to back up your claims then, basically, you’re screwed. This is probably a good time to get your ego to take a cold shower and to find an unfussy open access journal that will dispense with the tiresome ritual of peer review and quite possibly include a package of downloads and citations for no additional APC.

Let’s look at another scenario, one in which the reviewers have stated that the manuscript is unacceptable simply because the compounds match substructures described in the original PAINS article.  This is the easiest situation to deal with, although if you’re using AlphaScreen to study a protein-protein interaction you should probably consider the open access suggestion outlined above.  If not using AlphaScreen, you can launch your blitzkrieg, although try not to make a reviewer look like a complete idiot because the editor might replace him/her with another one who is a bit more alert.  You need to point out to the editor that the applicability domain (using this term will give your response a degree of authority) for the original PAINS filters is AlphaScreen used to assay protein-protein interactions and therefore the original PAINS filters are completely irrelevant to your submission.  You might also play the singlet oxygen card if you can find evidence (here’s a useful source for this type of information) for quenching/scavenging behavior by compounds that have aggrieved the reviewers on account of matching PAINS filters.

Now you might get a more diligent reviewer who looks beyond the published PAINS filters and digs up some real dirt on compounds that share a substructure with the compounds that you’ve used in your study and, when this happens, you need to put as much chemical space as you can between the two sets of compounds.  Let’s use Voss et al (I think that this was what one of the PAINS-shaming posts was trying to refer to) to illustrate the approach.  Voss et al describe some rhodanine-based TNF-alpha antagonists, the ‘activity’ of which turned out to be light-dependent and I would certainly regard this sort of light-dependency as very dirty indeed.  However, there are only four rhodanines described in this article (shown below) and each has a heteroaromatic ring linked to the exocyclic double bond (the extended pi-system is highly relevant to photochemistry) and each is substituted with ethyl on the ring nitrogen.  Furthermore, that heteroaromatic ring is linked to either an aryl or heteroaromatic ring in each of the four compounds.  So here’s how you deal with the reviewers.  First point out that the bad behavior is only observed for four rhodanines assayed against a single target protein.  If your rhodanines lack the exocyclic double bond, you can deal with the reviewers without breaking sweat because the substructural context of the rhodanine ring is so different and you might also mention that your rhodanines can’t function as Michael acceptors.  You should also be able to wriggle off the hook if your rhodanines have the exocyclic double bond but only alkyl substituents on it. Sanitizing a phenyl substituent on the exocyclic double bond is a little more difficult and you should first stress that the bad behavior was only observed for rhodanines with five-membered, electron-rich heterocycles linked to that exocyclic double bond.  You’ll also be in a stronger position if your phenyl ring lacks the additional aryl or heteroaryl substituent (or ring fusion) that is conserved in the four rhodanines described by Voss et al because this can be argued to be relevant to photochemistry.

Things will be more difficult if you’ve got a heteroaromatic ring linked to the exocyclic double bond and this is when you’ll need to reach into the bottom drawer for appropriate counter-measures with which to neutralize those uncouth reviewers.  First take a close look at that heteroaromatic ring.  If it is six-membered and/or relatively electron-poor, consider drawing the editor’s attention to the important differences between your heteroaromatic ring and those in the offending rhodanines of Voss et al.  The lack of aryl or heteroaryl substituents (or ring fusions) on your heteroaromatic ring will also strengthen your case, so make sure the editor knows.  Finally, consider calculating molecular similarity between your rhodanines and those in Voss et al. You want this to be as low as possible so experiment with different combinations of fingerprints and metrics (e.g. Tanimoto coefficient) to find whatever gives the best results (i.e. the lowest similarity).
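To show just how much room for 'experimentation' there is, here is a hedged little sketch using RDKit and two made-up rhodanine-like structures (neither of them taken from Voss et al); the similarity you report depends rather conveniently on the fingerprint and metric you pick.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, MACCSkeys

# Hypothetical structures for illustration only: a bare rhodanine and a
# 5-benzylidene rhodanine.
mol_a = Chem.MolFromSmiles("O=C1CSC(=S)N1")
mol_b = Chem.MolFromSmiles("O=C1C(=Cc2ccccc2)SC(=S)N1")

fingerprints = {
    "Morgan (r=2)": lambda m: AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048),
    "RDKit path":   Chem.RDKFingerprint,
    "MACCS keys":   MACCSkeys.GenMACCSKeys,
}

for name, fp_fn in fingerprints.items():
    fp_a, fp_b = fp_fn(mol_a), fp_fn(mol_b)
    tanimoto = DataStructs.TanimotoSimilarity(fp_a, fp_b)
    dice = DataStructs.DiceSimilarity(fp_a, fp_b)
    print(f"{name:12s}  Tanimoto = {tanimoto:.2f}   Dice = {dice:.2f}")
```

Needless to say, a diligent reviewer should be asking which of those numbers, if any, actually bears on the photochemistry.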

So this is a good place at which to conclude this post and this series of posts.  I hope you’ve found it fun and have enjoyed learning how to get away with polluting the literature.