Thursday 10 March 2016

Ligand efficiency beyond the rule of 5


One recurring theme in this blog is that the link between physicochemical properties and undesirable behavior of compounds in vivo may not be as strong as property-based design 'experts' would have us believe. To be credible, guidelines for drug discovery need to reflect trends observed in relevant, measured data, and the strength of these trends tells you how much weight you should give to the guidelines. Drug discovery guidelines are often specified in terms of metrics, such as Ligand Efficiency (LE) or property forecast index (PFI), and it is important to be aware that every metric encodes assumptions (although these are rarely articulated).

The most famous set of guidelines for drug discovery is known as the rule of 5 (Ro5), which is essentially a statement of physicochemical property distributions for compounds that had progressed at least as far as Phase II at some point before the Ro5 article was published in 1997. It is important to remember (some 'experts' have short memories) that Ro5 was originally presented as a set of guidelines for oral absorption. Personally, I have never regarded Ro5 as particularly helpful in practical lead optimization since it provides no guidance as to how suboptimal ADMET characteristics of compliant compounds can be improved. Furthermore, Ro5 is not particularly enlightening with respect to the consequences of straying out of the allowed region and into 'die verbotene Zone'.
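For concreteness, here is a minimal sketch of a Ro5 check in its commonly quoted form (MW no more than 500 Da, ClogP no more than 5, no more than 5 H-bond donors and 10 H-bond acceptors, with compounds flagged only when two or more criteria are violated). The function names are mine and the property values are assumed to have been calculated elsewhere (e.g. with RDKit):

def ro5_violations(mw, clogp, hbd, hba):
    # Count how many of the four commonly quoted criteria are violated.
    return sum([mw > 500, clogp > 5, hbd > 5, hba > 10])

def passes_ro5(mw, clogp, hbd, hba):
    # Compounds are usually flagged only when two or more criteria are violated.
    return ro5_violations(mw, clogp, hbd, hba) < 2

print(passes_ro5(mw=480.0, clogp=3.2, hbd=2, hba=7))  # True for this hypothetical compound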

Nobody reading this blog needs to be reminded that drug discovery is an activity that has been under the cosh for some time and a number of publications ( 1 | 2 | 3 | 4 ) examine potential opportunities outside the chemical space 'enclosed' by Ro5. Given that drug-likeness is not the secure concept that those who claim to be leading our thoughts would have us believe, I do think that we really need to be a bit more open minded in our views as to the regions of chemical space in which we are prepared to work. That said, you cannot afford to perform shaky analysis when proposing that people might consider doing things differently because that will only hand a heavy cudgel to the roundheads for them to beat you with.

The article that I'll be discussing has already been Pipelined and this post has a much narrower focus than Derek's post. The featured study defines three regions of chemical space: Ro5 (rule of 5), eRo5 (extended rule of 5) and bRo5 (beyond rule of 5). The authors note that "eRo5 space may be thought of as a buffer zone between Ro5 and bRo5 space". I would challenge this point because there is a region (MW less than 500 Da and ClogP between 5 and 7.5) between Ro5 and bRo5 spaces that is not covered by the eRo5 specifications. As such, it is not meaningful to compare properties of eRo5 compounds with properties of Ro5 or bRo5 compounds. The authors of the featured article really do need to fix this problem if they're planning to carve out a niche in this area of study because failing to do so will make it easier for conservative drug-likeness 'experts' to challenge their findings. Problems like this are particularly insidious because the activation barriers for fixing them just keep getting higher the longer you ignore them.
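To make the gap concrete, here is a toy classifier in (MW, ClogP) only. The boundaries below are hypothetical and chosen purely to illustrate the point made above; they are not the featured article's definitions, which also involve other properties:

def classify(mw, clogp):
    # Hypothetical region boundaries, for illustration only.
    if mw <= 500 and clogp <= 5:
        return "Ro5"
    if 500 < mw <= 700 and clogp <= 7.5:
        return "eRo5"
    if mw > 700 or clogp > 7.5:
        return "bRo5"
    return "unassigned"  # e.g. MW < 500 Da with ClogP between 5 and 7.5

print(classify(mw=450.0, clogp=6.0))  # 'unassigned'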

But enough of Bermuda Triangles in the space between Ro5 and bRo5 because the focus of this post is ligand efficiency and specifically its relevance (or otherwise) to bRo5 compounds. I'll write a formula for generalized LE in a way that makes it clear that ΔG° is a function of temperature, pressure and the standard concentration: 


LEgen = −ΔG°(T, p, C°)/HA

When LE is calculated it is usually assumed that C° is 1 M, although there is nothing in the original definition of LE that says this has to be so, and few, if any, users of the metric are even aware that they are making this assumption. When analyzing data it is important to be aware of all assumptions that you're making and the effects that making these assumptions may have on the inferences drawn from the analysis.
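Here's a minimal numerical sketch of this point, assuming ΔG° = RT ln(KD/C°) and T = 298 K (the function and variable names are mine):

from math import log

R = 0.0019872  # gas constant in kcal/(mol.K)

def le_gen(kd, heavy_atoms, temperature=298.0, c_standard=1.0):
    # Generalized ligand efficiency: -deltaG/HA with deltaG = RT*ln(Kd/C).
    delta_g = R * temperature * log(kd / c_standard)  # kcal/mol
    return -delta_g / heavy_atoms

# A 10 nM ligand with 25 heavy atoms:
print(round(le_gen(1e-8, 25), 2))                    # 0.44 with C = 1 M
print(round(le_gen(1e-8, 25, c_standard=1e-6), 2))   # 0.11 with C = 1 micromolar

Whether 0.44 or 0.11 is the 'right' LE for this ligand depends entirely on a choice of standard concentration that users of the metric rarely acknowledge making.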

Sometimes LE is used to specify design guidelines. For example, we might assert that acceptable fragment hits must have LE above a particular cutoff. It's important to remember that setting a cutoff for LE is equivalent to imposing an affinity cutoff that depends on molecular size. I don't see any problem with allowing the affinity cutoff to increase with molecular size (or indeed lipophilicity), although the response of the cutoff to molecular size should reflect analysis of measured data (rather than noisy sermons of self-appointed thought-leaders). When you set a cutoff for LE, you're assuming (whether or not you are aware of it) that the affinity cutoff is a line that intersects the affinity axis at a point corresponding to a KD of 1 M. Before heading back to bRo5, I'd like you to consider a question. If you're not comfortable setting an affinity cutoff as a function of molecular size, would you be comfortable setting a cutoff for LE?
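To see the equivalence, here is a sketch of the affinity cutoff implied by an LE cutoff, assuming T = 298 K and C° = 1 M so that −ΔG° = 2.303RT·pKD; the cutoff value of 0.3 kcal/(mol·HA) is just an illustrative number, not a recommendation:

R = 0.0019872  # gas constant in kcal/(mol.K)

def pkd_cutoff(le_cutoff, heavy_atoms, temperature=298.0):
    # The pKd cutoff implied by an LE cutoff, assuming -deltaG = 2.303*R*T*pKd.
    return le_cutoff * heavy_atoms / (2.303 * R * temperature)

for ha in (10, 20, 30, 40):
    print(ha, round(pkd_cutoff(0.3, ha), 1))
# 10 -> 2.2, 20 -> 4.4, 30 -> 6.6, 40 -> 8.8
# The line passes through pKd = 0 (i.e. Kd = 1 M) at zero heavy atoms,
# which is exactly the assumption described above.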

So let's take a look at what the featured article has to say about affinity: 


"Affinity data were consistent with those previously reported [44] for a large dataset of drugs and drugs in Ro5, eRo5 and bRo5 space had similar means and distributions of affinities (Figure 6a)"

So the article is saying that, on average, bRo5 compounds don't need to be of higher affinity than Ro5 compounds and that's actually useful information. One might hypothesize that unbound concentrations of bRo5 compounds tend to be lower than for Ro5 compounds because the former are less drug-like and precisely the abominations that MAMO (Mothers Against Molecular Obesity) have been trying to warn honest, god-fearing folk about for years. If you look at Figure 6a in the featured article, you'll see that the mean affinity does not differ significantly between the three categories of compound. Regular readers of this blog will be well aware that categorizing continuous data in this manner tends to exaggerate trends in data. Given that the authors are saying that there isn't a trend, correlation inflation is not an issue here.

Now look at Figure 6b. The authors note:


"As the drugs in eRo5 and bRo5 space are significantly bigger than Ro5 drugs, i.e., they have higher molecular weights and more heavy atoms, their LE is significantly lower"

If you're thinking about using these results in your own work, you really need to be asking whether or not the results provide any real insight (i.e. something beyond the trivial result that 1/HA gets smaller when HA gets larger). This would also be a good time to think very carefully about all the assumptions you're going to make in your analysis. The featured article states:

"Ligand efficiency metrics have found widespread use;[45however, they also have some limitations associated with their application, particularly outside traditional Ro5 drug space. [46We nonetheless believe it is useful to characterize the ligand efficiency (LE) and lipophilic ligand efficiency (LLE) distributions observed in eRo5 and bRo5 space to provide guides for those who wish to use them in drug development"

Given that I have asserted that LE is not even wrong and have equated it with homeopathy, I'm not sure that I agree with sweeping LE's problems under the carpet by making a vague reference to "some limitations". Let's not worry too much about trivial details because declaring a ligand efficiency metric to be useful is a recognized validation tool (even for LELP, which appears to have jumped straight from the pages of a Mary Shelley novel). There is a rough analogy with New Math where "the important thing is to understand what you're doing rather than to get the right answer", although that analogy shouldn't be taken too far because it's far from clear whether or not LE advocates actually understand what they are doing. As an aside, New Math is what inspired "the rule of 3 is just like the rule of 5 if you're missing two fingers" that I have occasionally used when delivering harangues on fragment screening library design.

So let's see what happens when one tries to set an LE threshold for bRo5 compounds. The featured article states:

"Instead, the size and flexibility of the ligand and the shape of the target binding site should be taken into account, allowing progression of compounds that may give candidate drugs with ligand efficiencies of ≥0.12 kcal/(mol·HAC), a guideline that captures 90% of current oral drugs and clinical candidates in bRo5 space"

So let's see how this recommended LE threshold of 0.12 kcal/(mol·HA) translates to affinity thresholds for compounds with molecular weights of 700 Da and 3000 Da. I'll assume a temperature of 298 K and a C° of 1 M when calculating ΔG° and will use 14 Da/HA to convert molecular weight to heavy atoms. I'll conclude the post by asking you to consider the following two questions (a short numerical check of the arithmetic follows them):


  • The recommended LE threshold transforms to a pKD threshold of 4.4 at 700 Da. When considering progression of compounds that may give candidate drugs, would you consider a recommendation that KD should be less than 40 µM to be useful?



  • The recommended LE threshold transforms to a pKD threshold of 19 at 3000 Da. How easy do you think it would be to measure a pKD value of 19? When considering progression of compounds that may give candidate drugs, would you consider a recommendation that pKD be greater than 19 to be useful?
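For those who want to check the arithmetic behind these two questions, here is the conversion under the stated assumptions (LE threshold of 0.12 kcal/(mol·HA), T = 298 K, C° = 1 M and 14 Da per heavy atom); the function names are mine:

R = 0.0019872  # gas constant in kcal/(mol.K)

def pkd_threshold(le_threshold, mw, temperature=298.0, da_per_ha=14.0):
    # Convert an LE threshold into the implied pKd threshold at a given MW.
    heavy_atoms = mw / da_per_ha
    return le_threshold * heavy_atoms / (2.303 * R * temperature)

for mw in (700.0, 3000.0):
    pkd = pkd_threshold(0.12, mw)
    print(f"MW {mw:.0f} Da: pKd threshold {pkd:.1f} (Kd about {10**-pkd:.0e} M)")
# MW 700 Da: pKd threshold 4.4 (Kd about 4e-05 M, i.e. roughly 40 micromolar)
# MW 3000 Da: pKd threshold 18.9 (Kd about 1e-19 M)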

      

Monday 7 March 2016

On Sci-Hub

Many readers will have heard of Sci-Hub which makes almost 50 million copyrighted journal articles freely available. Derek has blogged about Sci-Hub and has also suggested that it might not matter as much as some think that it does. Readers might also want to take a look at some other posts ( 1 | 2 | 3 ) on the topic. I'll focus more on some of the fallout that might result from Sci-Hub's activities but won't be expressing an opinion as to who is right and who is wrong. Briefly, one side says that knowledge should be free, the other side says that laws have been broken. I'll leave it to readers to decide for themselves which side they wish to take because nothing I write is likely to change people's views on this subject. 

Sci-Hub and its creatrix are based in Russia and, given the current frosty relations between Russia and the countries which host the aggrieved journal publishers, it is safe to assume that Sci-Hub will be able to thumb its nose at those publishers for the foreseeable future. Sci-Hub relies on helpers to provide it with access to the copyrighted material and these helpers presumably do this by making their institutional subscription credentials available to Sci-Hub. It's worth noting that one usually accesses copyrighted material through a connection that is recognized by the publisher and only a very small number of people at an institution actually know the access keys/passwords. One important question is whether or not publishers can trace the PDFs supplied by Sci-Hub. I certainly recall seeing PDFs from certain sources being marked with the name of the institution and date of download, so I don't think that one can safely assume that no PDF is traceable. If a publisher can link a PDF supplied by Sci-Hub to a particular institution then presumably the publisher could sue the institution because providing third parties with access is specifically verboten by most (all?) subscription contracts. An institution facing a legal challenge from a publisher would be under some pressure to identify the leaks and publishers would be keen for some scalps pour encourager les autres.

While it would be an understatement to say that the publishers are pissed off that Sci-Hub has managed to 'liberate' almost 50 million copyrighted journal articles, it is not clear how much lasting damage has been done. The fee for downloading an article to which one does not have subscription access is typically in the range $20 to $50, but my guess is that only a tiny proportion of publishers' revenues comes from these downloads. I actually think the publishers set the download fees to provide institutions with the incentive to purchase subscriptions rather than to generate revenue from pay-per-view. If this is the case, Sci-Hub will only do real damage to the publishers if, by continuing to operate, it causes institutions to stop subscribing or helps them to negotiate cheaper subscriptions.

There is not a lot that the publishers can do about the material that Sci-Hub already has in its possession but there are a number of tactics that they might employ in order to prevent further 'liberation' of copyrighted material. I don't know if it is possible to engineer a finite lifetime into PDF files but they can be protected with passwords, and publishers may try to allow only a small number of individuals at each institution to access the copyrighted material directly as PDF files. Alternatively the publishers might require that individual users create accounts and change passwords regularly in order to make it more difficult (and dangerous) for Sci-Hub's helpers to share their access. Countermeasures put in place by publishers to protect content are likely to add complexity to the process of accessing that content. This in turn would make it more difficult to mine content and the existence (and scale) of Sci-Hub could even be invoked as a counter to arguments that the right to read is the right to mine.

Given that almost 50 million articles are freely available on Sci-Hub, one might consider potential implications for Open Access (OA). There is a lot of heated debate about OA, although the issues are perhaps not as clear cut as OA advocates would have you believe, and this theme was explored in a post from this blog last year. Although there is currently a lot of pressure to reduce the costs of subscriptions, it is difficult to predict how far Sci-Hub will push subscription-based journal publishers towards a purely OA business model. For example, we may see scientific publication moving towards a 'third way' in the form of pre-publication servers with post-publication peer review. I wouldn't be surprised if, twenty years from now, 'direct to internet' has usurped both the subscription-based and OA scholarly publishing models. That, however, is going off on a tangent and, to get things back on track, I'd like you to think of Sci-Hub from the perspective of an author who has paid a subscription-based journal $2000 to make an article OA. Would it be reasonable for this author to ask for a refund?