Today I'll take a look at a JMC Perspective on design principles for fragment libraries that is intended to provide advice for academics. When selecting compounds to be assayed the general process typically consists of two steps. First, you identify regions of chemical space that you hope will be relevant and then you sample these regions. This applies whether you're designing a fragment library, performing a virtual screen or selecting analogs of active compounds with which to develop structure-activity relationships (SAR). Design of compound libraries for fragment screening has actually been discussed extensively in the literature and the following selection of articles, some of which are devoted to the topic, may be useful: Fejzo (1999), Baurin (2004), Mercier (2005), Schuffenhauer (2005), Albert (2007), Blomberg (2009), Chen (2009), Law (2009), Lau (2011), Schulz (2011), Morley (2013). This series of blog posts ( 1 | 2 | 3 | 4 ) on fragment screening library design may also be helpful.
"Rules are for the obedience of fools and the guidance of wise men"
Harry Day, Royal Air Force (1898-1977)
It isn't exactly clear what the authors are getting at here since there appears to be no provision for wise women. Also it is not clear how the authors would view rules that required darker complexioned individuals to sit at the backs of buses (or that swarthy economists should not solve differential equations on planes). That said, the quote hands me a legitimate excuse to link Malan's Ten Rules for Air Fighting and I will demonstrate that the authors of this Perspective can learn much from the wise teachings of 'Sailor' Malan.
My first criticism of this Perspective is that the authors devote an inordinate amount of space to topics that are irrelevant from the viewpoint of selecting compounds for fragment screening. Whatever your views on the value of ligand efficiency metrics and thermodynamic signatures, these are things that you think about once you've got the screening results. The authors assert, "As a result, fragment hits form high-quality interactions with the target, usually a protein, despite being weak in potency" and some readers might consider the 'concept' of high-quality interactions to be pseudoscientific psychobabble on par with homeopathy, chemical-free food and the wrong type of snow. That said, discussion of some of these peripheral topics would have been more acceptable if the authors had articulated the library design problem clearly and discussed the most relevant literature early on. By straying from their stated objective, the authors have broken the second of Malan's rules ("Whilst shooting think of nothing else, brace the whole of your body: have both hands on the stick: concentrate on your ring sight").
The section on design principles for fragment libraries opens with a slightly gushing account of the Rule of 3 (Ro3). This is unfortunate because this would have been the best place for the authors to define the fragment library design problem and review the extensive literature on the subject. Ro3 was originally stated in a short communication and the analysis that forms its basis is not shared. As an aside, you need to be wary of rules like these because the cutoffs and thresholds may have been imposed arbitrarily by those analyzing the data. For example, the GSK 4/400 rule actually reflects the scheme used to categorize continuous data and it could just as easily have been the GSK 3.75/412 rule if the data had been pre-processed differently. I have written a couple ( 1 | 2 ) of blog posts on Ro3 but I'll comment here so as to keep this post as self-contained as possible. In my view, Ro3 is a crude attempt to appeal to the herding instinct of drug discovery scientists by milking a sacred cow (Ro5). The uncertainties in hydrogen bond acceptor definitions and logP prediction algorithms mean that nobody knows exactly how others have applied Ro3. It is also somewhat ironic that the first article referenced by this Perspective actually states Ro3 incorrectly. If we assume that Ro5 hydrogen bond acceptor definitions are being used then Ro3 would appear to be an excellent way to ensure that potentially interesting acidic species such as tetrazoles and acylsulfonamides are excluded from fragment screening libraries. While this might not be too much of an issue if identification of adenine mimics is your principal raison d'etre, some researchers may wish to take a broader view of the scope of FBDD. It is even possible that rigid adherence to Ro3 may have led to the fragment starting points for this project being discovered in Gothenburg rather than Cambridge.
Although it is difficult to make an objective assessment of the impact of Ro3 on industrial FBDD, its publication did prove to be manna from heaven for vendors of compounds who could now flog milligram quantities of samples that had previously been gathering dust in stock rooms.
This is a good point to see what 'Sailor' Malan might have made of this article. While dropping Ro3 propaganda leaflets, you broke rule 7 (Never fly straight and level for more than 30 seconds in the combat area) and provided an easy opportunity for an opponent to validate rule 10 (Go in quickly - Punch hard - Get out). Faster than you can say "thought leader" you've been bounced by an Me 109 flying out of the sun. A short, accurate (and ligand-efficient) burst leaves you pondering the lipophilicity of the mixture of glycol and oil that now obscures your windscreen. The good news is that you have been bettered by a top ace whose h index is quite a bit higher than yours. The bad news is that your cockpit canopy is stuck. "Spring chicken to shitehawk in one easy lesson."
Of course, there's a lot more to fragment screening library design than counting hydrogen bonding groups and setting cutoffs for molecular weight and predicted logP. Molecular complexity is one of the most important considerations when selecting compounds (fragments or otherwise) and anybody even contemplating compound library design needs to understand the model introduced by Hann and colleagues. This molecular complexity model is conceptually very important but it is not really a practical tool for selecting compounds. However, there are other ways to define molecular complexity in ways that allow the general concept to be distilled into usable compound selection criteria. For example, I've used restriction of extent of substitution (as detailed in this article) to control complexity and this can be achieved using SMARTS notation to impose substructural requirements. The thinking here is actually very close to the philosophy behind 'needle screening' which was first described in 2000 by researchers at Roche although they didn't actually use the term 'molecular complexity'.
As one would expect, the purging of unwholesome compounds such as PAINS is discussed. The PAINS field suffers from ambiguity, extrapolation and convolution of fact with opinion. This series ( 1 | 2 | 3 | 4) of blog posts will give you a better idea of my concerns. I say "ambiguity" because it's really difficult to know whether the basis for labeling a compound as a PAIN (or should that be a PAINS) is experimental observation, model-based prediction or opinion. I say "extrapolation" because the original PAINS study equates PAIN with frequent-hitter behavior in a panel of six AlphaScreen assays and this is extrapolated to pan-assay (which many would take to mean different types of assays) interference. There also seems to be a tendency to extrapolate the frequent-hitter behavior in the AlphaScreen panel to reactivity with protein although I am not aware that any of the compounds identified as PAINS in the original study were shown to react with any of the proteins in the AlphaScreen panel used in that study. This is a good point to include a graphic to break the text up a bit and, given an underlying theme of this post, I'll use this picture of a diving Stuka.
One view of the fragment screening mission is that we are trying to present diverse molecular recognition elements to targets of interest. In the context of screening library design, we tend to think of molecular recognition in terms of pharmacophores, shapes and scaffolds. Although you do need to keep lipophilicity and molecular size under tight control, the case can be made for including compounds that would usually be considered to be beyond norms of molecular good taste. In a fragment screening situation I would typically want to be in a position to present molecular recognition elements like naphthalene, biphenyl, adamantane and (especially after my time at CSIRO) cubane to target proteins. Keeping an eye on both molecular complexity and aqueous solubility, I'd select compounds with a single (probably cationic) substituent and I'd not let rules get in the way of molecular recognition criteria. In some ways compound selections like those above can be seen as compliance with Rule 8 (When diving to attack always leave a proportion of your formation above to act as top guard). However, I need to say something about sampling chemical space in order to make that connection a bit clearer.
This is a good point for another graphic and it's fair to say that the Stuka and the B-52 differed somewhat in their approaches to target engagement. The B-52 below is not in the best state of repair and, given that I took the photo in Hanoi, this is perhaps not totally surprising. The key to library design is coverage and former bombardier Joseph Heller makes an insightful comment on this topic. One wonders what First Lieutenant Minderbinder would have made of the licensing deals and mergers that make the pharma/biotech industry such an exciting place to work.
The following graphic, pulled from an old post, illustrates coverage (and diversity) from the perspective of somebody designing a screening library. Although I've shown the compounds in a 2 dimensional space, sampling is often done using molecular similarity which we can think of as inversely related to distance. A high degree of molecular similarity between two compounds indicates that their molecular structures are nearby in chemical space. This is a distance-geometric view of chemical space in which we know the relative positions of molecular structures but not where they are. When we describe a selection of molecular structures as diverse, we're saying that the two most similar ones are relatively distant from each other. The primary objective of screening library design is to cover relevant chemical space as effectively as possible and the devil is in details like 'relevant' and 'effectively'. The stars in the graphic below show molecular structures that have been selected to cover the chemical space shown. When representing a number of molecular structures by a single molecular structure it is important, as it is in politics, that what is representative not be too distant from what is being represented. You might ask, "how far is acceptable?" and my response would be, as it often is in Brazil, "boa pergunta". One problem is that scaffolds differ in their 'contributions' to molecular similarity and activity cliffs usually provide a welcome antidote to the hubris of the library designer.
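To make the ideas of diversity and coverage a little more concrete, here is a minimal sketch in plain Python. It treats distance as 1 minus Tanimoto similarity, picks a diverse subset with a greedy MaxMin procedure and then reports how far the worst-represented structure is from its nearest representative. The MaxMin picker is my choice for illustration and is not something taken from the Perspective. The fingerprints are invented sets of integer feature labels rather than real molecular fingerprints, and the names `maxmin_select` and `coverage_radius` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two feature sets (1.0 for identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def distance(a, b):
    """Treat distance as the complement of similarity."""
    return 1.0 - tanimoto(a, b)


def maxmin_select(fingerprints, n_pick, seed_index=0):
    """Greedy MaxMin: repeatedly add the structure whose nearest
    already-selected neighbour is farthest away."""
    selected = [seed_index]
    # Distance from every structure to its nearest selected structure.
    nearest = [distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(selected) < min(n_pick, len(fingerprints)):
        best = max(range(len(fingerprints)), key=lambda i: nearest[i])
        if nearest[best] == 0.0:
            break  # everything left duplicates a selected structure
        selected.append(best)
        for i, fp in enumerate(fingerprints):
            nearest[i] = min(nearest[i], distance(fp, fingerprints[best]))
    return selected


def coverage_radius(fingerprints, selected):
    """Largest distance from any structure to its closest representative."""
    return max(
        min(distance(fp, fingerprints[s]) for s in selected)
        for fp in fingerprints
    )


# Invented feature sets: three loose clusters (0-2, 3-4 and 5 on its own).
fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 6},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]

picked = maxmin_select(fingerprints, n_pick=3)
print("selected indices:", picked)
print("coverage radius:", coverage_radius(fingerprints, picked))
```

The coverage radius is one way of putting a number on "how far is acceptable?" although it does nothing to answer the question, and it inherits every weakness of the similarity measure used to compute it.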
I would argue that property distributions are more important than cutoff values for properties and it is during the sampling phase of library design that these distributions are shaped. One way of controlling distributions is to first define regions of chemical space using progressively less restrictive selection criteria and then sample these in order, starting with the most restrictively defined region. However, this is not the only way to sample and one might also try to weight fragment selection using desirability functions. Obviously, I'm not going to provide a comprehensive review of chemical space sampling in a couple of paragraphs of a blog post but I hope to have shown that the sampling of chemical space is an important aspect of fragment screening library design. I also hope to have shown that failing to address the issue of sampling relevant chemical space represents a serious deficiency of the featured Perspective.
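The tiered approach can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tier names, the property ranges, the compound records and the function `tiered_sample` are all invented for the example, and within each tier the sketch simply takes compounds in file order where a real selection would more likely sample for diversity.

```python
def tiered_sample(compounds, tiers, quota):
    """Fill `quota` slots by visiting tiers from most to least restrictive.

    Each tier is a (name, predicate) pair. A compound is assigned to the
    first tier that accepts it and is never selected twice.
    """
    chosen, seen = [], set()
    for name, accept in tiers:
        for cpd in compounds:
            if len(chosen) == quota:
                return chosen
            if cpd["id"] not in seen and accept(cpd):
                chosen.append((cpd["id"], name))
                seen.add(cpd["id"])
    return chosen


# Hypothetical tiers, ordered from most to least restrictive.
tiers = [
    ("core", lambda c: 9 <= c["heavy_atoms"] <= 14 and 0.0 <= c["logp"] <= 2.0),
    ("extended", lambda c: c["heavy_atoms"] <= 16 and -1.0 <= c["logp"] <= 3.0),
    ("outer", lambda c: c["heavy_atoms"] <= 18 and c["logp"] <= 3.5),
]

# Invented records; the property values are not measured or predicted data.
compounds = [
    {"id": "f1", "heavy_atoms": 12, "logp": 1.1},
    {"id": "f2", "heavy_atoms": 16, "logp": 2.6},
    {"id": "f3", "heavy_atoms": 10, "logp": 0.4},
    {"id": "f4", "heavy_atoms": 18, "logp": 3.2},
    {"id": "f5", "heavy_atoms": 15, "logp": -0.5},
    {"id": "f6", "heavy_atoms": 22, "logp": 4.0},
]

selection = tiered_sample(compounds, tiers, quota=4)
print(selection)
```

Because the most restrictive tier is exhausted first, the property distributions of the final selection are weighted toward the core region without any compound in the outer regions being excluded outright.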
The Perspective concludes with a number of recommendations and I'll conclude the post with comments on some of these. I wouldn't have too much of a problem with the proposed 9 - 16 heavy atom range as a guideline although I would consider a requirement that predicted octanol/water logP be in the range 0.0 - 2.0 to be overly restrictive. It would have been useful for the authors to say how they arrived at these figures and I invite all of them to think very carefully about exactly what they mean by "cLogP" and "freely rotatable bonds" so we don't have a repeat of the Ro3 farce. There are many devils in the details of the statement: "avoid compounds/functional groups known to be associated with high reactivity, aggregation in solution, or false positives". My response to "known" is that it is not always easy to distinguish knowledge from opinion and "associated" (like correlated) is not a simple yes/no thing. It is not clear how "synthetically accessible vectors for fragment growth" should be defined since there is also a conformational stability issue if bonds to hydrogen are regarded as growth vectors.
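The point about needing to say exactly what is meant by "cLogP" can be illustrated with a small sketch in plain Python. It applies the 9 - 16 heavy atom range and the 0.0 - 2.0 logP range to a handful of records that each carry logP values from two unnamed prediction methods, and lists the compounds for which the two methods disagree about compliance. Everything here is hypothetical: the records, the values, the keys `logp_method_1` and `logp_method_2`, and the function names are invented for illustration and do not correspond to real compounds or real software.

```python
def passes(record, logp_key, ha_range=(9, 16), logp_range=(0.0, 2.0)):
    """True if the record lies inside both ranges (bounds inclusive)."""
    ha_ok = ha_range[0] <= record["heavy_atoms"] <= ha_range[1]
    logp_ok = logp_range[0] <= record[logp_key] <= logp_range[1]
    return ha_ok and logp_ok


def disagreements(records, key_a, key_b):
    """IDs of records whose pass/fail status depends on the logP method."""
    return [r["id"] for r in records if passes(r, key_a) != passes(r, key_b)]


# Invented values: two logP predictions per compound, differing slightly.
records = [
    {"id": "frag_A", "heavy_atoms": 11, "logp_method_1": 1.4, "logp_method_2": 1.7},
    {"id": "frag_B", "heavy_atoms": 13, "logp_method_1": 1.9, "logp_method_2": 2.3},
    {"id": "frag_C", "heavy_atoms": 10, "logp_method_1": -0.2, "logp_method_2": 0.3},
    {"id": "frag_D", "heavy_atoms": 17, "logp_method_1": 1.0, "logp_method_2": 1.1},
]

ambiguous = disagreements(records, "logp_method_1", "logp_method_2")
print("compliance depends on logP method for:", ambiguous)
```

Two of the four invented compounds change status on modest differences between predictions, which is why a recommended logP range means little unless the prediction method is specified alongside it.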
This is a good point at which to wrap things up and I'd like to share some more of Sailor Malan's wisdom before I go. The first rule (Wait until you see the whites of his eyes. Fire short bursts of 1 to 2 seconds and only when your sights are definitely 'ON') is my personal favorite and it provides excellent, practical advice for anybody reviewing the scientific literature. I'll leave you with a short video in which a pre-Jackal Edward Fox displays marksmanship and escaping skills that would have served him well in the later movie. At the start of the video, the chemists and biologists have been bickering (of course, this never really happens in real life) and the VP for biophysics is trying to get them to toe the line. Then one of the biologists asks the VP for biophysics if they can do some phenotypic screening and you'll need to watch the video to see what happens next...