Molecular Design: 2011

Thursday, 3 November 2011

Rule of 3 takes some flak

You might remember my rantlet about the Rule of Three (Ro3) at the beginning of the year. Well it seems that others consider Ro3 to be over-restrictive and this JMC article will be as welcome in some parts of Cambridge as a staffel of Ju87 Stukas would have been back in 1940.

The authors of this work describe screening of a library of 364 fragments against the aspartyl protease endothiapepsin and crystal structures of 11 hits bound to the target protein. The library was designed “without strictly applying the rule of 3” and, as it turns out, “only 4 of the 11 fragments are consistent with the rule of 3”. Not exactly a ringing endorsement for Ro3 or a compelling incentive to buy a Ro3-compliant fragment library.

Hopefully one point that you’ll have taken away from my earlier post is that those who gave us Ro3 don’t say a whole lot about how they define hydrogen bond donors and acceptors so it can be difficult to say whether some fragments are Ro3-compliant or not. I’m guessing that Ro3’s proposers may not be using the same definitions that are used to apply the Rule of 5 (Ro5). However, I really don’t know and don’t really care.

Now you can see the problem. Do any of the 7 fragment hits (I refuse to call them frits since that term is more usually associated with ceramics than Drug Discovery and an association with ceramics is something that you’ll want to avoid for fragments) from the endothiapepsin screen that are reported to be inconsistent with Ro3 actually fail to comply with the rule? Let’s take a look at the structures of two fragments for which binding to target was observed crystallographically. You’ll notice that I’ve retained the structure numbering from the article.

There are two nitrogens and three oxygens in 041 so with Ro5 hydrogen bonding definitions this fragment would not be Ro3-compliant. Personally, I wouldn’t count the amide nitrogen as an acceptor but that’s my choice. The cyclic ether oxygens in 041 will be very weak hydrogen bond acceptors and even three or four of these together will pack less of a punch than the typical amide carbonyl oxygen. One can make similar observations about 291 and I’d actually be more worried about the reactivity of the aniline portion of this molecule although this is not the place for that discussion. Fragment 255 has 3 nitrogens and one oxygen which translates into 4 hydrogen bond acceptors if you use Ro5 definitions. However, I would not count either the amide nitrogen or the bridging nitrogen of the fused heteroaromatic ring as acceptors so with my definitions the fragment would be Ro3-compliant.

The assay and crystallisation were carried out at a pH of 4.6. This means that the heteroaromatic nitrogens of 255 and 291 are likely to be protonated to a significant extent under experimental conditions. It’s interesting that the solubility measurements were run at a pH of 7.4 because basic fragments such as 041, 255 and 291 should be even more soluble in assay and crystallisation buffers. It would be a different story for acids but I didn’t see any of those so I guess no harm done.
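The protonation argument can be made semi-quantitative with the Henderson-Hasselbalch relationship. The sketch below estimates the fraction of a monoprotic base that is protonated at the assay pH (4.6) and at the pH used for the solubility measurements (7.4). The pKa used here is a hypothetical placeholder chosen purely for illustration; it is not a measured value for 041, 255 or 291.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a monoprotic base present as its conjugate acid (BH+).

    Follows from Henderson-Hasselbalch: [B]/[BH+] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical pKa of the conjugate acid, for illustration only.
ASSUMED_PKA = 6.0

for ph in (4.6, 7.4):
    fraction = fraction_protonated(ASSUMED_PKA, ph)
    print(f"pH {ph}: fraction protonated = {fraction:.3f}")
```

With a pKa in this region the base is mostly protonated at pH 4.6 and mostly neutral at pH 7.4, which is why solubility measured at the higher pH is likely to understate solubility in the assay buffer.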

It’s good to see the output of a fragment screen being published in this manner and the crystal structures for a number of fragments bound to this target represent a welcome addition to the protein knowledge base. Given that I’ve never been a fan of Ro3, I do like to see others questioning the rule although reading this paper gives the impression that Ro3 has never before been questioned. I also believe that they could have addressed the issue of hydrogen bonding definitions rather than simply jumping to the conclusion that Ro5 definitions (all nitrogens and oxygens are acceptors) were being used by those who gave us Ro3.

On a final note, you might wonder why I keep banging on about Ro3 when it’s something that I’ve never used to select fragments. It’s a good question and this is a good place to answer it. My own view is that the way many researchers have blindly adopted the rule is merely a symptom of a much bigger malaise in Drug Discovery research. Pharma appears to be dans la merde but the response of its leaders is typically to increase the frequency with which the Titanic’s complement of deck chairs is shuffled. Is this really a good time for those who represent the best chance of a future for the industry to be switching off their critical thinking skills?

Literature cited

Köster et al, A Small Nonrule of 3 Compatible Fragment Library Provides High Hit Rate of Endothiapepsin Crystal Structures with Various Fragment Chemotypes. J. Med. Chem. 2011, in press. DOI

Monday, 26 September 2011

Dans la merde

I’ve been meaning to write something on the state of Pharma ever since my good friend Anthony Nicholls posted What Is Really Killing Pharma back in April. Ant sees an industry that is rapidly abandoning its science base and he is less than complimentary about Pharma management:

‘One consequence of this shift from science to business in the pharma industry has been less and less appreciation for the realities—as opposed to the hype and hope—of drug discovery. This is reflected both in the quixotic choices made by pharma as to what to pursue and in the stunningly bad management of the core talent in drug discovery.’

Now I don’t happen to fully agree with Ant on the diagnosis and it is possible that the underwhelming management of Pharma is more a symptom of the underlying disease than a disease in its own right. However, I will make some comments on Pharma management before moving onto some of what I see to be the industry's real woes. One brutal assessment of the situation is that, if society has decided that discovery of new medicines is to be a commercial activity, we (as members of society) should not complain when pharmaceutical businesses behave like businesses. I don’t believe that it is actually necessary for a Pharma CEO to understand the science although it’s a bonus if they do. Given that the time to bring a drug from hypothesis to market is longer than the tenure of many CEOs, it’s much more important that they be prepared to take a long term view and understand how the different parts of the business fit (and function) together. The shareholders of the company need to find mechanisms to persuade their CEO to take a long term view. The CEO needs to seek advice from people who are prepared to tell the truth and not sideline them when they do. Those managing the drug hunt must avoid becoming panacea-centric in their thinking and remember that, while technology is a good servant, it is a poor master.

We need to take a closer look at the industry to better understand Pharma’s woes. As many people know, bringing a drug to market takes a long time and is also very expensive. The industry is highly regulated and the cost to the regulators of accepting something that they shouldn’t have greatly exceeds that of rejecting something that they should have accepted. Since Pharma companies don’t usually see themselves as in the Generics business, they need a steady stream of new products in order to remain viable. Unfortunately, this stream has slowed to a trickle and it is perfectly reasonable to question whether Drug Discovery is still a commercially viable activity.

It is easy to blame management for the current state of the industry and I’ll be the first to admit that many CEOs appear to be poor value for their employers (the shareholders). However, the current state of pipelines also reflects unmet scientific challenges and one can argue that the frequently bizarre behavior of Pharma leaders reflects increasing desperation in their search for other solutions.

Generally, a Drug Discovery project starts with a hypothesis. Typically, this will take the form that interfering (drugs are usually inhibitory) with the action of a target or system of targets (e.g. a pathway) will result in a therapeutically beneficial effect. Testing these hypotheses is termed Target Validation (TV) and usually one will try to develop an animal model of the human disease before taking a drug into development. Let’s think about why a drug may fail to show efficacy in a Phase 2 clinical trial. One explanation is that there is no link between target and disease (another way of saying that the TV hypothesis is incorrect). However, it could also be that the target is still valid but the animal model is simply not predictive of the human disease. Needless to say, TV is challenging and even reproducing claims made in the scientific literature can be difficult (readers who are on LinkedIn may also wish to check out the discussion in the Society of Laboratory Automation and Screening group entitled "Reliability of 'new drug target' claims called into question").

However, there is yet another reason that a drug can fail to show efficacy in Phase 2 and that’s poor pharmacokinetics (PK). Now many of you are probably thinking that I’m talking from the wrong end of my alimentary canal when I say this because everybody knows that Phase 1 is where PK failures happen. You run a Phase 1 clinical trial to check that levels of the drug will be sufficiently high to engage the target and this is most relevant when the target ‘sees’ the blood stream. However, when the target is intracellular or on the far side of the blood brain barrier, we know a lot less about the free (unbound) concentration of drug in the vicinity of the target. Now you’ll see the problem and I’ll leave it to you to decide whether we’re dealing with known unknowns or unknown unknowns. The blood levels look great but we have little idea about what’s happening where it really matters. For intracellular and CNS targets it can be argued that the Phase 1 trial is less complete than for targets such as cell surface receptors that are exposed to the drug circulating in plasma. How much less complete is anybody’s guess because measuring free concentrations of an arbitrary drug in cells is just not something that we can currently do, even in laboratory animals.

This is probably a good time to bring up the subject of toxicity and it’s worth mentioning that the point made about free concentration is also relevant to toxicity (and ‘polypharmacology’). Pretty much the worst thing that can happen to a drug is that idiosyncratic toxicity reveals itself when the drug is already on the market. Rare toxicity is fiendishly difficult to predict and its rareness means that you have to dose a large number of patients (who may also be taking other medication) in order to even observe it. The rareness of the toxicity means that the enrichment studies that are so popular with the virtual screening and QSAR communities are unlikely to shed much light on the toxicity. Choking in Phase 3 is certainly bad but you can always console yourself with the knowledge that it could have been even worse.
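To put a rough number on "a large number of patients", the sketch below computes how many patients would need to be dosed to have a given chance of seeing at least one case of a rare adverse event. It assumes that patients are independent and share the same per-patient incidence, which is a simplification, and the incidence of 1 in 10,000 is a hypothetical figure chosen for illustration.

```python
import math


def prob_at_least_one(incidence: float, n_patients: int) -> float:
    """Probability of observing one or more events among n independent patients."""
    return 1.0 - (1.0 - incidence) ** n_patients


def patients_needed(incidence: float, confidence: float = 0.95) -> int:
    """Smallest number of patients giving P(at least one event) >= confidence."""
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - incidence))


# Hypothetical incidence, for illustration only.
ASSUMED_INCIDENCE = 1.0 / 10_000
n = patients_needed(ASSUMED_INCIDENCE)
print(f"{n} patients for a 95% chance of seeing at least one case")
```

For small incidences the answer is close to 3 divided by the incidence, so an event affecting 1 patient in 10,000 needs roughly 30,000 patients just to be seen once with reasonable confidence, let alone characterised.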

So what’s really killing Pharma? There’s no shortage of gutless and witless managers in Pharma and there would be huge benefits in ensuring that undiluted Darwinian principles applied freely to the Leadership Function (surely a strong candidate for oxymoron of the month) of the industry. Would this be enough to save Pharma from a dearth of well-validated targets? Or one bust too many in the Phase 3 Casino? What do you think?

Literature cited

Prinz, Schlange & Asadullah, Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 2011, 10, 712-713. DOI

Thursday, 15 September 2011

A SMARTS way to do things?

A couple of months ago I returned from a visit to OpenEye in Santa Fe, New Mexico. I’d been helping out with tautomers and ionisation and it really was great to be back in one of my favourite States of the Union catching up with some old friends while making some new ones. However, it’s neither tautomers nor ionisation that I’ll be discussing in this post because I really want to talk about SMARTS. This is a line notation for defining substructural queries and a SMARTS parser with full capability is one of the most powerful weapons in the molecular design arsenal. One of the things that I did in Santa Fe was to learn a bit about using the OpenEye SMARTS parser. I like to think of SMARTS as empowering in that a SMARTS parser allows me to impose my will on a database of chemical structures. This really brings out my latent megalomaniac and makes me want to gaze at large wall-mounted maps of the world…

SMARTS notation is actually very simple but at the same time is highly expressive. It’s best illustrated using some examples. Let’s start with a simple definition for a neutral carboxylic acid and I’ve kept things simple by not requiring a connection between the carbon and another carbon atom.

[OH]C=O

When dealing with commercially available collections of compounds, the carboxylic acids may be registered both in neutral and anionic (salt) forms. Although people in Pharma may whinge about this, one has to remember that a compound vendor needs to distinguish benzoic acid from sodium benzoate and I have no time for lily-livered whingers. As Marie Antoinette might have said, “Let them eat SMARTS”. Here are a couple of SMARTS queries that will match either neutral or anionic forms of carboxylic acids. [O;H,-] specifies an oxygen atom that either has a single hydrogen or a negative charge while [OD1] specifies an oxygen atom with a single non-hydrogen connection.

[O;H,-]C=O

[OD1]C=O

A SMARTS parser with full capability will not only match the substructural pattern but will also map individual atoms. This is really useful for atom typing and remember that you can get a lot of information (e.g. ionisation, interaction potential) about an atom from its connectivity. In a pharmacophore search I would want to treat both oxygen atoms of the carboxylic acid as anionic and might do this using recursive SMARTS as follows.

[$([OH]C=O),$(O=C[OH])]

One of my favourite features of SMARTS is the vector binding which associates a SMARTS pattern with a label and allows you to create patterns that are much more human-readable. This is really important when creating a view of chemistry that is to be imposed on chemical databases. I’ll show how you can build a simple definition of aliphatic amines (remember that these usually protonate under normal physiological conditions) using vector bindings. First let’s define a carbon with four connections.

Csp3 [CX4]

Now we’ll use this to define primary, secondary and tertiary amines which we’ll then combine into a single all aliphatic amine definition. Notice how I ‘over-specify’ the nitrogen connectivity in order to prevent matching against amine oxides, protonated amines and quaternary ammonium.

PriAmin [N;H2;X3][$Csp3]

SecAmin [N;H;X3]([$Csp3])[$Csp3]

TerAmin [NX3]([$Csp3])([$Csp3])[$Csp3]

AllAmin [$PriAmin,$SecAmin,$TerAmin]
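Not every SMARTS parser supports vector bindings, so it can be useful to see what the definitions above look like once the labels are expanded. The sketch below is a minimal, assumed approach that treats each `$Name` reference as shorthand for the recursive SMARTS `$(...)` of its definition and substitutes it textually. It is not how the Daylight or OpenEye toolkits implement vector bindings, and the function and variable names are hypothetical.

```python
import re

# A vector binding reference is '$' followed by a name. Recursive SMARTS
# starts with '$(' so the two cannot be confused by this pattern.
BINDING_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand(pattern: str, bindings: dict) -> str:
    """Replace every $Name in pattern with $(definition), recursively.

    Assumes the definitions contain no cycles; a cyclic definition would
    recurse without end.
    """
    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"undefined vector binding: {name}")
        return "$(" + expand(bindings[name], bindings) + ")"

    return BINDING_REF.sub(substitute, pattern)


# The definitions from the post, keyed by label.
BINDINGS = {
    "Csp3": "[CX4]",
    "PriAmin": "[N;H2;X3][$Csp3]",
    "SecAmin": "[N;H;X3]([$Csp3])[$Csp3]",
    "TerAmin": "[NX3]([$Csp3])([$Csp3])[$Csp3]",
    "AllAmin": "[$PriAmin,$SecAmin,$TerAmin]",
}

print(expand(BINDINGS["AllAmin"], BINDINGS))
```

The expanded AllAmin pattern is a single long string of nested recursive SMARTS, which makes the case for vector bindings rather well: the labelled version is the one a human can actually read and maintain.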

So that finishes our quick introduction to SMARTS notation. In my own work, I’ve used SMARTS not only to locate structural features in molecules but also to modify the molecules, for example to set ionisation states in a database of structures to be docked into the binding site of a protein. Being able to modify structures automatically and in a controlled manner also makes it possible to do cool stuff like identify matched molecular pairs ( mmp1 | mmp2 ). I should mention that there is a SMARTS-like notation called SMIRKS for modifying structures although I’m not going to say anything about it right now.

There’s plenty of information about SMARTS out there, including a Wikipedia page and the Daylight SMARTS Theory Manual, Tutorial and Examples. The Daylight and OpenEye SMARTS parsers are provided as tool kits (so you can build your own software) and both support recursive SMARTS and vector bindings (not all SMARTS parsers do this so check with your software vendor). I started with the Daylight product back in 1995 and taught myself some C in order to use it. However, the OpenEye SMARTS parser can also be used with 3D structures and I’m looking forward to doing lots more with it.

I’ll finish with some comments on terminology. A substructural definition written in SMARTS notation can be called a SMARTS pattern, a SMARTS string or even a SMARTS. Whatever you do, don’t call it a SMART (you wouldn’t talk about a specie in relation to living organisms) because that will make you look half-witted (and make me cringe). Also to talk about a SMILE or a SMIRK would be equally crass so don’t say I didn’t warn you.

Literature cited

Kenny & Sadowski, Structure Modification in Chemical Databases. Methods and Principles of Medicinal Chemistry 2005, 23, 271-285 | DOI

Leach et al, Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 2006, 49, 6672–6682 | DOI

Birch et al, Matched molecular pair analysis of activity and properties of glycogen phosphorylase inhibitors. Bioorg Med Chem Lett 2009, 19, 850-853 | DOI

Wednesday, 29 June 2011

Lipophilicity teaser

This post was prompted by one from Dan at Practical Fragments and I'm going to ask you to first take a look at that and at the comments. Now I'd like you to look at some measured octanol/water logP values that I pulled from the Sangster Research Laboratories logPow database. The question I'd like to put to you is whether you think that these measured logP values truly reflect the energetic costs of moving the different isomeric methylimidazoles from water to a truly non-polar environment like a hydrophobic binding pocket in a protein.

Let's take a look at these figures. The least lipophilic compound of the set is N-methylimidazole in which the hydrogen bond donor of imidazole has been capped, although the partition coefficients for the three compounds are all very similar. It seems that the octanol/water partitioning system just doesn't 'see' the hydrogen bond donors of the 2-methyl and 4/5-methyl isomers.

Octanol has a hydroxyl group and, in the context of a shake-flask logP determination, gets pretty wet (~2M), making it an unconvincing model for the hydrophobic core of a lipid bilayer or a hydrophobic binding pocket. In contrast, alkanes lack hydrogen bonding capability which also means that they dissolve less water. The catch is that alkane/water partition coefficients are more difficult to measure than their octanol/water equivalents since polar solutes are poorly soluble in alkane solvents.

The difference between octanol/water and alkane/water logP values for a compound (often termed ΔlogP) is one measure of the polarity of the compound. The octanol/water logP of phenol is 1.5 and it would be reasonable to describe it as lipophilic. However in the alkane/water system the situation is reversed and the logP of -0.6 would lead to phenol being described as hydrophilic.
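The phenol numbers quoted above make for a quick worked example. The sketch below computes ΔlogP from the two values given in the text and converts it into a ratio of partition coefficients; the function name is a hypothetical one chosen for illustration.

```python
def delta_logp(logp_octanol: float, logp_alkane: float) -> float:
    """Difference between octanol/water and alkane/water logP."""
    return logp_octanol - logp_alkane


# Phenol values as quoted in the text.
phenol_delta = delta_logp(1.5, -0.6)

# Both logP values are base-10 logarithms, so 10 ** delta is the factor by
# which the octanol/water partition coefficient exceeds the alkane/water one.
ratio = 10.0 ** phenol_delta

print(f"delta logP = {phenol_delta:.1f}, ratio of partition coefficients = {ratio:.0f}")
```

A ΔlogP of about 2 means that phenol partitions into wet octanol over a hundred times more readily than it does into an alkane, which gives some feel for how much the hydroxyl group of octanol flatters a hydrogen bond donor.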

I'll leave things here for now because this post is really just a teaser and I will be returning to the theme in more depth in the future. If you're interested in finding out more take a look at my harangue from the March 2011 PhysChem Forum at Syngenta and the article that goes with it. I'd also recommend reading this review by Wolfenden if you're interested in the relevance of alkane/water logP values to protein structure and function.

Literature cited

Toulmin, Kenny & Wood, Toward prediction of alkane/water partition coefficients. J. Med. Chem. 2008, 51, 3720-3730. DOI

Wolfenden, Experimental Measures of Amino Acid Hydrophobicity and the Topology of Transmembrane and Globular Proteins. J. Gen. Physiol. 2007, 129, 357-362. DOI

Saturday, 25 June 2011

FBDD & Molecular Design

The FBDD Literature blog is getting a bit of a makeover. One of the reasons for doing this is that since I escaped from Big Pharma my access to literature has been erratic, making it difficult to maintain the required awareness of the current literature. However, a bigger reason for the changes was to broaden the focus of the blog to include Molecular Design, which is my primary scientific interest.

There is of course a lot of molecular design in FBDD, which I like to think of as little more than a smart way to do structure-based design. Molecular design may be defined as control of properties of compounds and materials through manipulation of molecular properties. Although computational chemistry tools are very useful in molecular design, the essence of design is thinking about molecules and I don’t want people without a CompChem background to be put off by the blog having Molecular Design in its title.

There will still be plenty of fragment-based material in the blog since I will be continuing the series on screening library design which came to a halt on Easter Island a year and a half ago. However, I’m also planning some posts on physicochemical properties such as logP and logD which are important in FBDD but have a much broader relevance in Drug Discovery.

Sunday, 10 April 2011

A short rant about journal editors

In the previous post I had an indirect dig at journal editors. In this post the dig will be a lot more direct. Recently I accepted an ‘invitation’ to review a manuscript for a journal that, out of tact, I will not name. It always amuses me that these requests for what is effectively free consultancy are presented as ‘invitations’ as if the journal is doing me a huge favour. Nevertheless I do go through with the charade on occasion (although never to the extent of unctuously thanking the editor for his or her magnanimity) since I do regard reviewing manuscripts as the duty of anyone who publishes in journals. The review was duly completed and, given that I was recommending that the manuscript be put out of its misery as quickly and humanely as possible, I’d been thorough, devoting four or five hours to the assignment.

I’d typed the review as a Word document, planning to paste it into the relevant form in the editorial system. When I logged in the assignment was no longer there so I emailed the Editor and Support assuming that there was a problem with the system. I got a reply from Support explaining what had happened. The Editor had already made the decision and therefore didn’t need my input any more so the assignment had been deleted. Support noted that this was unfortunate and hoped that they could utilize my services again as a reviewer and I’m still waiting for the Editor’s apology. Not wanting to deprive them of feedback, I suggested that they were being overly optimistic if they thought that I would even consider reviewing another manuscript for them. And that’s where things stand. Humph!

So that was the teaser. What I really wanted to talk about was an editorial entitled ‘Science Blogs and Caveat Emptor’ that appeared in another journal late last year. ChemBark was onto it in a flash and soon it had been Pipelined as well. More recently the editorial was reviewed by Michelle Francl (blogs: 1 | 2 ) in her Nature Chemistry column and to be honest there’s not a lot that I can add to what these commentators have already said. Reading the editorial I couldn’t help thinking that it looked like it could have been pulled right out of a blog and Michelle is right on target when she says, “... I had to admire Murray for his ability to raise so many key questions about science writing in a concise and provocative 619 words. He has real potential as a ‘blogger’”. Except that most blogs allow you to post comments.

Provided that their journals score highly enough, Impact Factor becomes a Maginot Line behind which editors can hide and I was not surprised to see it paraded in the first paragraph of the editorial. One statement that I couldn’t quite get my head round was, “By extension, editors and reviewers reinforce the meaningfulness of Impact Factors by explicit attention to the reliability of submitted articles; if the Scientific Method has not been adequately followed, then there should be a downwardly adjusted evaluation of impact”. I’d always thought that impact factor was determined by numbers of citations and that a citation made the same contribution regardless of whether an author was heaping praise on his previous study or drawing the attention of readers to an odor of something other than roses emanating from a rival's article.

One of the more bizarre assertions made in the editorial is, “Bloggers are entrepreneurs who sell “news” (more properly, opinion) to mass media: internet, radio, TV, and to some extent print news”. Having never received payment for any of my bloggings, I do find this statement a little rich coming as it does from somebody whose journal invites me to purchase content ($35 for 48 hours of access) when I try to look at it. Furthermore, some journals are actually devoted to publishing Opinions and these journals certainly don’t let you see their content for free.

I think what the author of the editorial really doesn’t like about scientific bloggers is their ability to do post-publication review of the journal’s articles in a very public manner. The illusion of the infallibility of Peer Review is often the first casualty when bloggers (and their commentators) discuss specific scientific articles. But can you blame the Editors when all they can see is that the Heretics have taken over the Auto-da-Fé?

Literature cited

R Murray, Science Blogs and Caveat Emptor. Anal. Chem., 2010, 82, 8755 DOI

M Francl, Blogging on the sidelines. Nat. Chem. 2011, 3, 183-184 DOI

Monday, 14 February 2011

FBLD versus DOS

The relative merits of Fragment Based Ligand Design (FBLD) and Diversity-Oriented Synthesis (DOS) were recently debated in a Nature Forum. This debate has already been reviewed both in Practical Fragments and In The Pipeline.

I believe by setting up the debate like this, the editorial staff of the journal show a poor understanding of Lead Discovery (LD). In essence a comparison is being made between apples and oranges. FBLD (also known as FBLG with the LG for lead generation) is an integrated LD framework and a comparison with conventional high throughput screening (HTS) and associated Hit-to-Lead (H2L) chemistry would have made more sense. DOS is essentially an approach to extending the chemical space covered by screening collections and filling ‘holes’ in the existing chemical space. A DOS approach could be easily used to enhance existing fragment libraries (especially if using molecular shape to quantify similarity) while the output of a fragment screen could be used as input to design of DOS libraries. The ‘Core and Layer’ approach that I’ve used in design of generic fragment libraries (and even one library for cell screening) can accurately be described as diversity-oriented.

The case for FBLD is made by Philip Hajduk who makes the important points that a relatively small number of fragments can be used to cover a relatively large chemical space and that synthetic resource is always directed towards the target of interest. I like to say that leads from FBLD are assembled from proven molecular recognition elements and would add that fragments allow you to search chemical space with a better-controlled resolution than do more elaborated molecules. I don’t happen to agree with his assertion that “there is ample evidence that larger molecules are more likely than smaller ones to succeed as drugs in clinical trials” but this does not weaken the first two points that he makes. It's worth remembering that you usually need protein crystal structures in FBLD both for the target (at the outset of screening) and for complexes with weakly-bound fragments. If you don't obtain these quickly you're going to be working on Project Passchendaele.

DOS is championed by Warren Galloway and David Spring. They note that there are situations in which FBLD is not currently applicable, for example in phenotypic screens (see Derek Lowe's comments In The Pipeline) or for probing certain protein-protein interactions. I agree with this point and believe that we’ll always need a variety of assays for successful LD, especially as Drug Discovery is expected to get even more challenging in the future. If you’re trying to enhance the ability of screening libraries to hit targets then it makes sense to use molecular diversity criteria to extend coverage in a more systematic manner. I don’t see why the term DOS should only apply when molecular size exceeds an arbitrary cut off and believe the real issue is more about how than whether DOS should be used to enhance screening libraries.

The advocates of DOS need to take a close look at how they define diversity. If the conserved core of a DOS library cannot be accommodated in a binding site then, barring nuclear fusion, none of the compounds in the library will fit either. From the point of view of this target the library has no diversity regardless of the number of compounds in it.

I was disappointed that molecular complexity (check this link for an alternative view) was not raised by either party in this debate since it’s a unifying concept that brings together different strategies for compound library design. Very complex molecules leave the H2L chemists with little or no room to manoeuvre. This is less of a problem if the screening hit nails the target with nanomolar potency and has jaw-dropping bioavailability. However, reality is more likely to be micromolar with one or more ADMET issues needing to be addressed. Advocates of DOS really do need to start thinking a bit more about molecular complexity in the context of screening compounds for biological activity. I always encourage folk designing a DOS library to make a relatively large sample of the library prototype so that it can be included in the fragment screening collection.

So what’s the verdict? I believe that FBLG is here to stay although it is not yet clear how widely applicable the approach is. I also believe that some form of DOS can be used to enhance any screening collection provided that:

(1) Diversity is seen in the context of the existing collection
(2) The importance of hit exploitability is recognised

I’d be interested to hear what other people think about this topic so feel free to comment. I’ll also set this up as a discussion for the FBDD LinkedIn group since commenting there is a bit easier. Also don’t forget that the journal allows you to comment on the article directly.

Literature cited

Hajduk, Galloway & Spring, A question of library design (Forum Drug Discovery). Nature 2011, 470, 42-43 | DOI

Nicholls et al, Molecular Shape and Medicinal Chemistry: A Perspective. J. Med. Chem. 2010, 53, 3862-3886 | DOI

Hann, Leach & Harper, Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci., 2001, 41, 856–864 | DOI

Tuesday, 11 January 2011

Rule of Three considered harmful?

I should start this post by saying that I’ve never actually used the Rule of Three for fragment selection. Part of the reason for this is simply a matter of timing since I’d been designing fragment libraries before the Rule of Three came along. However, I believe that there are reasons that you need to take a very close look at the Rule of Three if you’re planning to build a fragment library strategy around it. The rule was introduced in late 2003:

“We carried out an analysis of a diverse set of fragment hits that were identified against a range of targets. The study indicated that such hits seem to obey, on average, a ‘Rule of Three’, in which molecular weight is < 300, the number of hydrogen bond donors is ≤3, the number of hydrogen bond acceptors is ≤3 and ClogP is ≤3. In addition, the results suggested NROT (≤3) and PSA (≤60) might also be useful criteria for fragment selection. These data imply that a ‘Rule of Three’ could be useful when constructing fragment libraries for efficient lead discovery.”
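Written as a filter, the quoted rule looks something like the sketch below. The cut offs are taken directly from the quotation (note that the molecular weight limit is a strict inequality while the others are not), but the class, the function and the example property values are all hypothetical and chosen for illustration. The code takes the donor and acceptor counts as given, which is exactly where the trouble starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentProperties:
    mol_weight: float  # molecular weight
    hbd: int           # hydrogen bond donors, however you choose to count them
    hba: int           # hydrogen bond acceptors, however you choose to count them
    clogp: float       # calculated logP
    nrot: int          # rotatable bonds
    psa: float         # polar surface area


def ro3_violations(props: FragmentProperties, include_optional: bool = False) -> list:
    """Return the names of the Rule of Three criteria that a fragment fails."""
    checks = [
        ("mol_weight", props.mol_weight < 300),
        ("hbd", props.hbd <= 3),
        ("hba", props.hba <= 3),
        ("clogp", props.clogp <= 3),
    ]
    if include_optional:
        # The quotation describes these two only as possibly useful criteria.
        checks += [("nrot", props.nrot <= 3), ("psa", props.psa <= 60)]
    return [name for name, passed in checks if not passed]


# Hypothetical fragment with made-up property values.
example = FragmentProperties(mol_weight=250.0, hbd=1, hba=4, clogp=1.2, nrot=2, psa=70.0)
print(ro3_violations(example))
print(ro3_violations(example, include_optional=True))
```

Returning the list of failed criteria rather than a bare pass or fail makes it easier to see why a compound was rejected, which matters when a single ambiguous count decides the outcome.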

My first criticism of the Rule of Three is that the authors do not say how they define hydrogen bond acceptors. I’ll illustrate this point with reference to the phenylhydantoin below which along with the accompanying properties was retrieved from eMolecules. As far as I’m concerned, this compound would have been perfectly acceptable for inclusion in a fragment library before the Rule of Three was published and the publication of the rule would not make me change my mind. If, however, you asked me whether the compound complied with the Rule of Three, I’d have to admit that I simply don’t know. The number of hydrogen bond donors is not an issue because there is only one of these in the molecule. The number of acceptors is more problematic. I would only count the oxygen atoms in this molecule as acceptors and, since there are two of these, the molecule would be compliant with the Rule of Three. However the well-known Rule of Five treats all nitrogen and oxygen atoms as acceptors so if you use those criteria you’ll count a total of four acceptors and conclude that the compound is not compliant with the Rule of Three. This is not a problem for me because I don't use the Rule of Three but spare a thought for the person assembling a commercial fragment library.
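The two ways of counting can be compared with a few lines of code. The sketch below uses a deliberately crude regular expression to pull element symbols out of a SMILES string; it is not a SMILES parser and would not cope with everything the notation allows. The SMILES is an N-phenylhydantoin written from scratch as an assumption (the figure from the original post is not reproduced here, so the exact isomer shown is not known), and the oxygen-only count mirrors the choice made for this particular molecule rather than a general definition of an acceptor.

```python
import re

# Crude atom tokeniser: bracket atoms first, then two-letter halogens,
# then the single-letter organic subset (aliphatic and aromatic).
ATOM = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|[a-z]{1,2})[^\]]*\]"
    r"|(?P<plain>Cl|Br|[BCNOPSFI]|[bcnops])"
)


def element_symbols(smiles: str) -> list:
    """Return element symbols in order of appearance, aromatic atoms capitalised."""
    symbols = []
    for match in ATOM.finditer(smiles):
        symbol = match.group("bracket") or match.group("plain")
        symbols.append(symbol.capitalize())
    return symbols


def acceptors_ro5_style(smiles: str) -> int:
    """Count every nitrogen and oxygen atom, as the Rule of Five does."""
    return sum(symbol in ("N", "O") for symbol in element_symbols(smiles))


def acceptors_oxygen_only(smiles: str) -> int:
    """Count oxygen atoms only."""
    return sum(symbol == "O" for symbol in element_symbols(smiles))


# Assumed structure: an N-phenylhydantoin with a single NH.
PHENYLHYDANTOIN = "O=C1CN(c2ccccc2)C(=O)N1"

print("Ro5-style acceptors:", acceptors_ro5_style(PHENYLHYDANTOIN))
print("Oxygen-only acceptors:", acceptors_oxygen_only(PHENYLHYDANTOIN))
```

The same molecule gives four acceptors under one definition and two under the other, so it sits on either side of the cut off of three depending on a choice that the rule's authors did not specify.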

My second criticism of the Rule of Three concerns how it was actually derived. The authors describe performing “an analysis of a diverse set of fragment hits” without actually saying anything about what this analysis entailed. If they were analysing hits from their own fragment screens then the characteristics of the hits will reflect the criteria by which compounds were selected for fragment screening. If they were sampling from a more extensive database of screening hits, I’d still want to know how the fragment hits were distinguished from the other hits.

My third criticism is as much about how cut offs get used as it is of the Rule of Three. There’s a diagram of a funnel that you often see in virtual screening reviews. We also use funnels (or filters as we prefer to call them) in screening library design and in fact this activity is not a whole lot different from working up a virtual screen. Typically we apply filters and sample (e.g. using molecular diversity criteria) from what makes it through. Note that I say ‘filters’ rather than ‘a filter’. The Core and Layer (CaL) approach to library design has been described both in this blog and in a journal article. In CaL the filters used to prioritise compounds get less restrictive as more compounds are added to the library. The reason for doing this is that it gives better control of chemical space coverage since it forces the selection of the smallest and least complex molecules first. A molecular diversity maximiser such as BigPicker will tend to pick larger, more complex molecules because these tend to be more dissimilar to each other.
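
The idea of filters that loosen as the library grows can be sketched roughly as below. This is not the published CaL implementation: the `Compound` and `Layer` types, the `select_layers` function, the descriptors chosen and every cutoff and quota are hypothetical, and taking the first few compounds that pass stands in for a proper diversity-based selection.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    name: str
    heavy_atoms: int
    clogp: float


class Layer(NamedTuple):
    max_heavy_atoms: int
    max_clogp: float
    quota: int  # how many compounds to add in this layer


def select_layers(pool, layers):
    """Fill the library layer by layer, loosening the filter each time."""
    selected = []
    chosen = set()
    for layer in layers:
        passing = [
            c for c in pool
            if c.name not in chosen
            and c.heavy_atoms <= layer.max_heavy_atoms
            and c.clogp <= layer.max_clogp
        ]
        # Stand-in for a diversity-based pick: simply take the first `quota`.
        for c in passing[: layer.quota]:
            selected.append(c)
            chosen.add(c.name)
    return selected


# Hypothetical pool and cutoffs, for illustration only.
pool = [
    Compound("big_1", 20, 3.5),
    Compound("small_1", 10, 1.0),
    Compound("mid_1", 15, 2.5),
    Compound("small_2", 11, 0.5),
    Compound("big_2", 21, 2.0),
]
layers = [Layer(12, 2.0, 2), Layer(16, 3.0, 1), Layer(22, 4.0, 1)]

library = select_layers(pool, layers)
print([c.name for c in library])
```

Because the tightest filter is applied first, the smallest compounds are in the library before any larger ones are considered, whatever order they appear in the pool.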

I am also prepared to accept compounds that have measured/calculated logP values in excess of 3 provided that the appropriate precautions (select ionisable compounds and/or use measured solubility values) have been taken to minimise the risk of poor solubility. You don’t want a whole library of compounds with logP values in excess of 4 but having some will increase the range of targets that you can nail. I am more concerned about the distribution of logP and molecular size in a library than I am with their maximum values and believe using multiple cut offs allows better control of these distributions.

You'll find plenty of material on the internet that deals with the Rule of Three although inconsistencies can be observed. It is not clear whether or not the Rule of Three includes the restrictions on NROT and PSA. As I read it in the original article, I don't think it does but I'm not sure and think it could have been made clearer. This webpage (accessed 11-Jan-2011) appears to suggest that the Maybridge FBDD team think that the NROT and PSA criteria are included in the Rule of Three. However, another webpage (accessed 11-Jan-2011) seems to suggest that the FBDD team at Chembridge think otherwise. Cambridge Medchem Consulting (accessed 11-Jan-2011; I expect that this page will get updated once the error is discovered) appear to share the Chembridge view that the NROT and PSA criteria are not included in the Rule of Three although they use < instead of ≤ when stating the Rule, which makes a big difference when the number in question is 3. Yet another variation on the Rule of Three can be found in the BioScreening.net glossary (accessed 12-Jan-2011) in which the hydrogen bond criteria are stated as "number of H-bond donors and acceptors less than, or equal to 3", which could be taken to imply that the sum of donors and acceptors cannot exceed 3.
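
A quick way to see how much these variations matter is to run one fragment through each reading of the hydrogen bond criteria. The sketch below is illustrative only: `hbond_ok` and its `strict` and `summed` flags are invented names, and the donor and acceptor counts are hypothetical.

```python
import operator


def hbond_ok(donors, acceptors, strict=False, summed=False):
    """Check the hydrogen bond criteria under one reading of the rule.

    strict : use '<' rather than '<=' against the cutoff of 3.
    summed : apply the cutoff to donors + acceptors rather than to each count.
    """
    compare = operator.lt if strict else operator.le
    if summed:
        return compare(donors + acceptors, 3)
    return compare(donors, 3) and compare(acceptors, 3)


# Hypothetical fragment with 1 donor and 3 acceptors.
donors, acceptors = 1, 3
print(hbond_ok(donors, acceptors))               # separate counts, <=
print(hbond_ok(donors, acceptors, strict=True))  # separate counts, <
print(hbond_ok(donors, acceptors, summed=True))  # summed counts, <=
```

Only the first reading accepts this fragment, so three statements of what is nominally the same rule give two different answers.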

I should of course let you know where the title of this post comes from since I borrowed most of it from a computer science paper that is over forty years old. I can’t even claim originality for adapting the title of the earlier paper because my friends at OpenEye have beaten me to that as well.

I hope that this post will at least make people ask a few questions when presented with rules like these in the future. I'll also set up a discussion in the LinkedIn Medicinal Chemistry group which will facilitate posting of comments.

Literature cited

Congreve, Carr, Murray & Jhoti, A ‘Rule of Three’ for fragment-based lead discovery? Drug Discov. Today 2003, 8, 876-877 | DOI

Lipinski, Lombardo, Dominy & Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3-25 | DOI

Blomberg, Cosgrove, Kenny & Kolmodin, Design of compound libraries for fragment screening. JCAMD, 2009, 23, 513-525 | DOI

Dijkstra, Go to statement considered harmful. Communications of the ACM, 1968, 11, 147-148 | DOI