Saturday, 4 February 2012
JCAMD 25th Anniversary Issue
The editors of the Journal of Computer-Aided Molecular Design commissioned a number of Perspectives on the state and future of the field to commemorate the journal's 25th anniversary. They have made this content open access for a limited period (I believe 3 months) so go check it out while the access is still open.
Friday, 3 February 2012
Fragment lead identification by SPR
FBDD is a maturing field and one sign of this maturation is the publication of a volume of Methods in Enzymology devoted to the subject. The article in this collection that most interested me was the review by Anthony Giannetti on the use of Surface Plasmon Resonance (SPR) in Fragment Lead Generation. The review is described as a ‘comprehensive walk-through’ and in-depth treatment of topics such as target immobilization and buffer/compound preparation justifies this description. I’m still working my way through some of the data analysis sections...
The target is tethered to a surface in SPR and this is usually referred to as ‘immobilization’, which is an unfortunate term, albeit the one that is most commonly used in the literature. Vendors of competing assay technologies (who would naturally prefer you to use their technology instead) often present this as a weakness of SPR. One concern is that tethering will compromise the ability of the target to bind ligands and the review does cite a couple of articles which compare affinities measured with SPR to those measured using methods such as isothermal titration calorimetry.
The system in an SPR assay is heterogeneous, which is another way of saying that the concentration of protein is not uniform, particularly in the direction perpendicular to the surface to which it is tethered, and this creates some interesting possibilities. Tight binding occurs when the value of the ligand Kd is lower than the concentration of the protein to which it binds. We typically configure assays for measuring affinity and potency so that ligand concentration is significantly greater than protein concentration. This means that ligand binding does not affect the concentration of unbound ligand and the math is a whole lot easier if you can make this assumption. If, however, the protein concentration in your assay is 1nM and you want to measure the potency for a compound with an IC50 of 0.01nM you’re going to have a problem because you’ll need the compound at a concentration of at least 0.5nM in order to occupy half the binding sites. In enzyme inhibition assays, the concentration of the enzyme sets an upper limit on the potency that you can measure and this may be an issue for attempts to estimate the maximum potency of ligands.
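To put some numbers on the tight binding problem, here is a minimal Python sketch. It assumes simple 1:1 binding in a homogeneous solution and treats the 0.01nM potency from the example as a Kd, which is my simplification rather than anything taken from the review. The function names are made up for illustration. The exact expression is the standard quadratic solution for the concentration of complex when ligand depletion cannot be ignored.

```python
import math


def fraction_bound_exact(p_total, l_total, kd):
    """Fraction of protein sites occupied for 1:1 binding, allowing for ligand depletion.

    All three arguments must be in the same concentration units (nM here).
    """
    s = p_total + l_total + kd
    # Physically meaningful root of [PL]^2 - s*[PL] + p_total*l_total = 0
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


def fraction_bound_no_depletion(l_total, kd):
    """The usual approximation: free ligand is taken to equal total ligand."""
    return l_total / (l_total + kd)


def total_ligand_for_half_occupancy(p_total, kd):
    """Total ligand needed to occupy half the sites: Kd free plus half the protein bound."""
    return kd + p_total / 2.0


# Numbers from the example in the text: 1 nM protein, 0.01 nM affinity
P_TOTAL_NM = 1.0
KD_NM = 0.01

l_half = total_ligand_for_half_occupancy(P_TOTAL_NM, KD_NM)
print("Total ligand for 50% occupancy (nM):", l_half)
print("Exact occupancy at that ligand level:",
      fraction_bound_exact(P_TOTAL_NM, l_half, KD_NM))
print("Occupancy predicted if depletion is ignored:",
      fraction_bound_no_depletion(l_half, KD_NM))
```

The point to take away is that, under these assumptions, the ligand concentration at half occupancy is dominated by the protein concentration rather than by Kd, so the assay reports on how much protein you added rather than on how tightly the compound binds.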
In a heterogeneous system, things are not quite as simple because concentration is less clearly defined and you need to think in terms of quantities (in molar terms of course) of protein and ligand. Localising a small amount of protein on the chip surface rather than having a larger amount of protein distributed evenly throughout the sample volume means less depletion of the reservoir of unbound ligand when 50% of binding sites become occupied. Also in the SPR assay, the solution of ligand flows over the chip, making depletion of unbound ligand even less of a problem.
Tight binding is not usually a problem when screening fragments and the main reason for bringing up the subject was to get you thinking a bit about assays. There are a number of technologies for detecting the binding of fragments and quantifying the affinity with which they bind. This raises a couple of questions. Firstly, to what extent do we need new screening technologies for FBDD? Secondly, which weaknesses in the current methodology should be addressed with the highest priority?
Literature cited
Giannetti, From experimental design to validated hits: A comprehensive walk-through of fragment lead identification using surface plasmon resonance. Methods Enzymol. 2012, 493, 169-218. DOI
Saturday, 28 January 2012
Anyone for tennis?

So I'm back in Melbourne, one of my favorite cities, and am currently amusing myself by looking for transition states which some of you will know are merely first order saddle points on potential energy surfaces. I actually find reactivity a lot more interesting than binding so it's great that my hosts are interested in these problems. Of course there's a lot more to do in Melbourne than searching for imaginary forces and negative frequencies and last week I went to watch the Australian Open.
Before I get to the tennis, I'll mention the Research Works Act which can be seen as protectionism (of academic publishers) and readers of this blog will know that I have some issues with journal editors. Well it turns out that protectionism is everywhere, even in what you would think would be the fully competitive environment of the Australian Open. When you attend the Open, any bag you take in is (quite rightly) inspected and they'll be looking carefully for particularly dangerous items. Like lenses with focal lengths greater than 200mm. Why are these considered so dangerous, you might ask? Well the danger posed by such contraband is that it might allow its owner to take a really good picture. While the organisers of the Open want you to enjoy the tennis and buy lots of food and drink, the organisers of the Open do not want you to take good pictures. Anyway here's a picture of the official photographers whom the organisers of the Open are 'protecting' from me and my Pentax K200.

The photos are from the match between Maria Sharapova and Angelique Kerber and I'd hoped that it would turn into an epic struggle of Kurskian proportions. As it transpired, Kerber's Panzerfausts were no match for Sharapova's Katyusha batteries. I couldn't help wondering if Maria had modelled her famous 'vocals' on the unique music of Stalin's Organ.

The match was not without its lighter moments and Maria aborted one serve to entertain us with a dance move from Swan Lake that would not have disgraced the Bolshoi. However, it was soon clear that poor Angelique was not having a good day in the office and, after she bashed yet another ball into the net, I was not going to contradict her. On the bright side, at least she didn't try to blame the Romanians or start raving about Steiner's divisions.

As had always seemed probable, the match concluded in two sets and at the post-match interview, Maria demonstrated her familiarity with the principles and practice of High-Throughput Screening.
Thursday, 3 November 2011
Rule of 3 takes some flak
You might remember my rantlet about the Rule of Three (RO3) at the beginning of the year. Well it seems that others consider RO3 to be over-restrictive and this JMC article will be as welcome in some parts of Cambridge as a staffel of Ju87s.
The authors of this work describe screening of a library of 364 fragments against the aspartyl protease endothiapepsin and crystal structures of 11 hits bound to the target protein. The library was designed “without strictly applying the rule of 3” and, as it turns out, “only 4 of the 11 fragments are consistent with the rule of 3”. Not exactly a ringing endorsement for RO3 or a compelling incentive to buy a RO3-compliant fragment library.
Hopefully one point that you’ll have taken away from my earlier post is that those who gave us RO3 don’t say a whole lot about how they define hydrogen bond donors and acceptors so it can be difficult to say whether some fragments are RO3-compliant or not. I’m guessing that RO3’s proposers may not be using the same definitions that are used to apply the Rule of 5 (RO5). However, I really don’t know and don’t really care.
Now you can see the problem. Do any of the 7 fragment hits (I refuse to call them frits since that term is more usually associated with ceramics than Drug Discovery and an association with ceramics is something that you’ll want to avoid for fragments) from the endothiapepsin screen that are reported to be inconsistent with RO3 actually fail to comply with the rule? Let’s take a look at the structures of two fragments for which binding to target was observed crystallographically. You’ll notice that I’ve retained the structure numbering from the article.

There are 2 nitrogens and 3 oxygens in 041 so with RO5 hydrogen bonding definitions this fragment would not be RO3-compliant. Personally, I wouldn’t count the amide nitrogen as an acceptor but that’s my choice. The cyclic ether oxygens in 041 will be very weak hydrogen bond acceptors and even three or four of these together will pack less of a punch than the typical amide carbonyl oxygen. I’d actually be much more worried about the reactivity of the aniline portion of this molecule but this is not the place for that discussion. Fragment 255 has 3 nitrogens and one oxygen which translates into 4 hydrogen bond acceptors if you use RO5 definitions. However, I would not count either the amide nitrogen or the bridging nitrogen of the fused heteroaromatic ring as acceptors so with my definitions the fragment would be RO3-compliant.
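The bookkeeping above is simple enough to sketch in a few lines of Python. This is only an illustration of how the verdict flips with the acceptor definition, not a real RO3 filter: the atom counts are the ones quoted in this post, the limit of three acceptors is the RO3 threshold as it is usually stated, the function names are hypothetical, and none of the other RO3 criteria are checked.

```python
RO3_MAX_ACCEPTORS = 3  # acceptor limit as usually quoted for the Rule of Three


def hba_count(n_nitrogen, n_oxygen, excluded=0):
    """Count acceptors as all N and O atoms (RO5-style), minus any atoms
    that you have decided not to treat as acceptors."""
    return n_nitrogen + n_oxygen - excluded


def passes_ro3_acceptors(n_nitrogen, n_oxygen, excluded=0):
    """Check only the acceptor criterion; other RO3 criteria are ignored here."""
    return hba_count(n_nitrogen, n_oxygen, excluded) <= RO3_MAX_ACCEPTORS


# Fragment 041: 2 nitrogens and 3 oxygens, RO5-style definitions
print("041, RO5 definitions:", hba_count(2, 3), passes_ro3_acceptors(2, 3))

# Fragment 255: 3 nitrogens and 1 oxygen, RO5-style definitions
print("255, RO5 definitions:", hba_count(3, 1), passes_ro3_acceptors(3, 1))

# Fragment 255 again, not counting the amide N or the bridging N
print("255, my definitions:", hba_count(3, 1, excluded=2),
      passes_ro3_acceptors(3, 1, excluded=2))
```

Same molecule, same rule, different answer, which is why a rule that doesn't state its definitions is difficult to apply.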
The assay and crystallisation were carried out at a pH of 4.6. This means that the heteroaromatic nitrogens of 255 and 291 are likely to be protonated to a significant extent under experimental conditions. It’s interesting that the solubility measurements were run at a pH of 7.4 because basic fragments such as 041, 255 and 291 should be even more soluble in assay and crystallisation buffers. It would be a different story for acids but I didn’t see any of those so I guess no harm done.
It’s good to see the output of a fragment screen being published in this manner and the crystal structures for a number of fragments bound to this target represent a welcome addition to the protein knowledge base. Given that I’ve never been a fan of RO3, I do like to see others questioning the rule although reading this paper gives the impression that RO3 has never before been questioned. I also believe that they could have addressed the issue of hydrogen bonding definitions rather than simply jumping to the conclusion that RO5 definitions (all nitrogens and oxygens are acceptors) were being used by those who gave us RO3.
On a final note, you might wonder why I keep banging on about RO3 when it’s something that I’ve never used to select fragments. It’s a good question and this is a good place to answer it. My own view is that the way many researchers have blindly adopted the rule is merely a symptom of a much bigger malaise in Drug Discovery research. Pharma appears to be dans la merde but the response of its leaders is typically to increase the frequency with which the Titanic’s complement of deck chairs is shuffled. Is this really a good time for those who represent the best chance of a future for the industry to be switching off their critical thinking skills?
Literature cited
Köster et al, A Small Nonrule of 3 Compatible Fragment Library Provides High Hit Rate of Endothiapepsin Crystal Structures with Various Fragment Chemotypes. J. Med. Chem. 2011, in press. DOI
Monday, 26 September 2011
Dans la merde
I’ve been meaning to write something on the state of Pharma ever since my good friend Anthony Nicholls posted What Is Really Killing Pharma back in April. Ant sees an industry that is rapidly abandoning its science base and he is less than complimentary about Pharma management:
‘One consequence of this shift from science to business in the pharma industry has been less and less appreciation for the realities—as opposed to the hype and hope—of drug discovery. This is reflected both in the quixotic choices made by pharma as to what to pursue and in the stunningly bad management of the core talent in drug discovery.’
Now I don’t happen to fully agree with Ant on the diagnosis and it is possible that the underwhelming management of Pharma is more a symptom of the underlying disease than a disease in its own right. However, I will make some comments on Pharma management before moving onto some of what I see to be the industry's real woes. One brutal assessment of the situation is that, if society has decided that discovery of new medicines is to be a commercial activity, we (as members of society) should not complain when pharmaceutical businesses behave like businesses. I don’t believe that it is actually necessary for a Pharma CEO to understand the science although it’s a bonus if they do. Given that the time to bring a drug from hypothesis to market is longer than the tenure of many CEOs, it’s much more important that they be prepared to take a long term view and understand how the different parts of the business fit (and function) together. The shareholders of the company need to find mechanisms to persuade their CEO to take a long term view. The CEO needs to seek advice from people who are prepared to tell the truth and not sideline them when they do. Those managing the drug hunt must avoid becoming panacea-centric in their thinking and remember that, while technology is a good servant, it is a poor master.
We need to take a closer look at the industry to better understand Pharma’s woes. As many people know, bringing a drug to market takes a long time and is also very expensive. The industry is highly regulated and the cost to the regulators of accepting something that they shouldn’t have greatly exceeds that of rejecting something that they should have accepted. Since Pharma companies don’t usually see themselves as in the Generics business, they need a steady stream of new products in order to remain viable. Unfortunately, this stream has slowed to a trickle and it is perfectly reasonable to question whether Drug Discovery is still a commercially viable activity.
It is easy to blame management for the current state of the industry and I’ll be the first to admit that many CEOs appear to be poor value for their employers (the shareholders). However, the current state of pipelines also reflects unmet scientific challenges and one can argue that the frequently bizarre behavior of Pharma leaders reflects increasing desperation in their search for other solutions.
Generally, a Drug Discovery project starts with a hypothesis. Typically, this will take the form that interfering (drugs are usually inhibitory) with the action of a target or system of targets (e.g. a pathway) will result in a therapeutically beneficial effect. Testing these hypotheses is termed Target Validation (TV) and usually one will try to develop an animal model of the human disease before taking a drug into development. Let’s think about why a drug may fail to show efficacy in a Phase 2 clinical trial. One explanation is that there is no link between target and disease (another way of saying that the TV hypothesis is incorrect). However, it could also be that the target is still valid but the animal model is simply not predictive of the human disease. Needless to say, TV is challenging and even reproducing claims made in the scientific literature can be difficult (readers who are on LinkedIn may also wish to check out the discussion in the Society of Laboratory Automation and Screening group entitled "Reliability of 'new drug target' claims called into question").
However, there is yet another reason that a drug can fail to show efficacy in Phase 2 and that’s poor pharmacokinetics (PK). Now many of you are probably thinking that I’m talking from the wrong end of my alimentary canal when I say this because everybody knows that Phase 1 is where PK failures happen. You run a Phase 1 clinical trial to check that levels of the drug will be sufficiently high to engage the target and this is most relevant when the target ‘sees’ the blood stream. However, when the target is intracellular or on the far side of the blood brain barrier, we know a lot less about the free (unbound) concentration of drug in the vicinity of the target. Now you’ll see the problem and I’ll leave it to you to decide whether we’re dealing with known unknowns or unknown unknowns. The blood levels look great but we have little idea about what’s happening where it really matters. For intracellular and CNS targets it can be argued that the Phase 1 trial is less complete than for targets such as cell surface receptors that are exposed to the drug circulating in plasma. How much less complete is anybody’s guess because measuring free concentrations of an arbitrary drug in cells is just not something that we can currently do, even in laboratory animals.
This is probably a good time to bring up the subject of toxicity and it’s worth mentioning that the point made about free concentration is also relevant to toxicity (and ‘polypharmacology’). Pretty much the worst thing that can happen to a drug is that idiosyncratic toxicity reveals itself when the drug is already on the market. Rare toxicity is fiendishly difficult to predict and its rareness means that you have to dose a large number of patients (who may also be taking other medication) in order to even observe it. The rareness of the toxicity means that the enrichment studies that are so popular with the virtual screening and QSAR communities are unlikely to shed much light on the toxicity. Choking in Phase 3 is certainly bad but you can always console yourself with the knowledge that it could have been even worse.
So what’s really killing Pharma? There’s no shortage of gutless and witless managers in Pharma and there would be huge benefits in ensuring that undiluted Darwinian principles applied freely to the Leadership Function (surely a strong candidate for oxymoron of the month) of the industry. Would this be enough to save Pharma from a dearth of well-validated targets? Or one bust too many in the Phase 3 Casino? What do you think?
Literature cited
Prinz, Schlange & Asadullah, Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 2011, 10, 712-713. DOI
Thursday, 15 September 2011
A SMARTS way to do things?
A couple of months ago I returned from a visit to OpenEye in Santa Fe, New Mexico. I’d been helping out with tautomers and ionisation and it really was great to be back in one of my favourite States of the Union catching up with some old friends while making some new ones. However, it’s neither tautomers nor ionisation that I’ll be discussing in this post because I really want to talk about SMARTS. This is a line notation for defining substructural queries and a SMARTS parser with full capability is one of the most powerful weapons in the molecular design arsenal. One of the things that I did in Santa Fe was to learn a bit about using the OpenEye SMARTS parser. I like to think of SMARTS as empowering in that a SMARTS parser allows me to impose my will on a database of chemical structures. This really brings out my latent megalomaniac and makes me want to gaze at large wall-mounted maps of the world…
SMARTS notation is actually very simple but at the same time is highly expressive. It’s best illustrated using some examples. Let’s start with a simple definition for a neutral carboxylic acid and I’ve kept things simple by not requiring a connection between the carbon and another carbon atom.
[OH]C=O
When dealing with commercially available collections of compounds, the carboxylic acids may be registered both in neutral and anionic (salt) forms. Although people in Pharma may whinge about this, one has to remember that a compound vendor needs to distinguish benzoic acid from sodium benzoate and I have no time for lily-livered whingers. As Marie Antoinette might have said, “Let them eat SMARTS”. Here are a couple of SMARTS queries that will match either neutral or anionic forms of carboxylic acids. [O;H,-] specifies an oxygen atom that either has a single hydrogen or a negative charge while [OD1] specifies an oxygen atom with a single non-hydrogen connection.
[O;H,-]C=O
[OD1]C=O
A SMARTS parser with full capability will not only match the substructural pattern but will also map individual atoms. This is really useful for atom typing and remember that you can get a lot of information (e.g. ionisation, interaction potential) about an atom from its connectivity. In a pharmacophore search I would want to treat both oxygen atoms of the carboxylic acid as anionic and might do this using recursive SMARTS as follows.
[$([OH]C=O),$(O=C[OH])]
One of my favourite features of SMARTS is the vector binding which associates a SMARTS pattern with a label and allows you to create patterns that are much more human-readable. This is really important when creating a view of chemistry that is to be imposed on chemical databases. I’ll show how you can build a simple definition of aliphatic amines (remember that these usually protonate under normal physiological conditions) using vector bindings. First let’s define a carbon with four connections.
Csp3 [CX4]
Now we’ll use this to define primary, secondary and tertiary amines which we’ll then combine into a single all aliphatic amine definition. Notice how I ‘over-specify’ the nitrogen connectivity in order to prevent matching against amine oxides, protonated amines and quaternary ammonium.
PriAmin [N;H2;X3][$Csp3]
SecAmin [N;H;X3]([$Csp3])[$Csp3]
TerAmin [NX3]([$Csp3])([$Csp3])[$Csp3]
AllAmin [$PriAmin,$SecAmin,$TerAmin]
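Not every SMARTS parser supports vector bindings, so it can help to see what they amount to. The sketch below assumes that a `$Label` reference is equivalent to wrapping the bound pattern in recursive SMARTS, `$(...)`, and expands the definitions above by plain string substitution. The function name `expand_bindings` and the requirement that labels be defined before use are my assumptions for illustration, and this is not how the Daylight or OpenEye toolkits implement bindings internally.

```python
import re

# Vector bindings from the text: (label, SMARTS that may reference earlier labels).
BINDINGS = [
    ("Csp3", "[CX4]"),
    ("PriAmin", "[N;H2;X3][$Csp3]"),
    ("SecAmin", "[N;H;X3]([$Csp3])[$Csp3]"),
    ("TerAmin", "[NX3]([$Csp3])([$Csp3])[$Csp3]"),
    ("AllAmin", "[$PriAmin,$SecAmin,$TerAmin]"),
]

# '$' followed by a name is a label reference; '$(' is ordinary recursive
# SMARTS and is left alone because '(' cannot start a name.
LABEL_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_bindings(bindings):
    """Return {label: SMARTS} with every $Label replaced by $(expanded pattern)."""
    expanded = {}
    for label, pattern in bindings:
        def substitute(match):
            name = match.group(1)
            if name not in expanded:
                raise KeyError(f"label '{name}' used before it was defined")
            return "$(" + expanded[name] + ")"

        expanded[label] = LABEL_REF.sub(substitute, pattern)
    return expanded


if __name__ == "__main__":
    for label, smarts in expand_bindings(BINDINGS).items():
        print(label, smarts)
```

The expanded AllAmin pattern is a single (rather ugly) recursive SMARTS string, which is a good illustration of why the labelled form is so much easier to read and maintain.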
So that finishes our quick introduction to SMARTS notation. In my own work, I’ve used SMARTS not only to locate structural features in molecules but also to modify the molecules, for example to set ionisation states in a database of structures to be docked into the binding site of a protein. Being able to modify structures automatically and in a controlled manner also makes it possible to do cool stuff like identify matched molecular pairs ( mmp1 | mmp2 ). I should mention that there is a SMARTS-like notation called SMIRKS for modifying structures although I’m not going to say anything about it right now.
There’s plenty of information about SMARTS out there, including a Wikipedia page and the Daylight SMARTS Theory Manual, Tutorial and Examples. The Daylight and OpenEye SMARTS parsers are provided as tool kits (so you can build your own software) and both support recursive SMARTS and vector bindings (not all SMARTS parsers do this so check with your software vendor). I started with the Daylight product back in 1995 and taught myself some C in order to use it. However, the OpenEye SMARTS parser can also be used with 3D structures and I’m looking forward to doing lots more with it.
I’ll finish with some comments on terminology. A substructural definition written in SMARTS notation can be called a SMARTS pattern, a SMARTS string or even a SMARTS. Whatever you do, don’t call it a SMART (you wouldn’t talk about a specie in relation to living organisms) because that will make you look half-witted (and make me cringe). Also to talk about a SMILE or a SMIRK would be equally crass so don’t say I didn’t warn you.
Literature cited
Kenny & Sadowski Structure Modification in Chemical Databases, Methods and Principles of Medicinal Chemistry 2005, 23, 271-285 | DOI
Leach et al Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure J. Med. Chem. 2006, 49, 6672–6682 | DOI
Birch et al, Matched molecular pair analysis of activity and properties of glycogen phosphorylase inhibitors. Bioorg Med Chem Lett 2009, 19, 850-853 | DOI
Wednesday, 29 June 2011
Lipophilicity teaser
This post was prompted by one from Dan at Practical Fragments and I'm going to ask you to first take a look at that and at the comments. Now I'd like you to look at some measured octanol/water logP values that I pulled from the Sangster Research Laboratories logPow database. The question I'd like to put to you is whether you think that these measured logP values truly reflect the energetic costs of moving the different isomeric methylimidazoles from water to a truly non-polar environment like a hydrophobic binding pocket in a protein.

Let's take a look at these figures. The least lipophilic compound of the set is N-methylimidazole, in which the hydrogen bond donor of imidazole has been capped, although the partition coefficients for the three compounds are all very similar. The octanol/water partitioning system just doesn't seem to 'see' the hydrogen bond donors of the 2-methyl and 4/5-methyl isomers.
Octanol has a hydroxyl group and, in the context of a shake-flask logP determination, gets pretty wet (~2M), making it an unconvincing model for the hydrophobic core of a lipid bilayer or a hydrophobic binding pocket. In contrast, alkanes lack hydrogen bonding capability, which also means that they dissolve less water. The catch is that alkane/water partition coefficients are more difficult to measure than their octanol/water equivalents since polar solutes are poorly soluble in alkane solvents.
The difference between octanol/water and alkane/water logP values for a compound (often termed ΔlogP) is one measure of the polarity of the compound. The octanol/water logP of phenol is 1.5 and it would be reasonable to describe it as lipophilic. However in the alkane/water system the situation is reversed and the logP of -0.6 would lead to phenol being described as hydrophilic.
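As a worked version of the phenol example, the sketch below computes ΔlogP from the two values quoted above and converts each logP into a water-to-solvent transfer free energy using ΔG = -RT ln(10) logP. The temperature of 298.15 K and the function names are my assumptions for illustration.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def delta_logp(logp_octanol, logp_alkane):
    """ΔlogP: octanol/water logP minus alkane/water logP."""
    return logp_octanol - logp_alkane


def transfer_free_energy_kj(log_p, temperature_k=298.15):
    """Free energy (kJ/mol) of moving a solute from water into the organic phase."""
    return -R * temperature_k * math.log(10) * log_p / 1000.0


if __name__ == "__main__":
    phenol_octanol, phenol_alkane = 1.5, -0.6  # values quoted in the text
    print(f"ΔlogP = {delta_logp(phenol_octanol, phenol_alkane):.1f}")
    print(f"water to octanol: {transfer_free_energy_kj(phenol_octanol):+.1f} kJ/mol")
    print(f"water to alkane:  {transfer_free_energy_kj(phenol_alkane):+.1f} kJ/mol")
```

For phenol the ΔlogP of 2.1 corresponds to a difference of about 12 kJ/mol between the two solvent systems at room temperature. The sign also flips: transfer from water to octanol is favourable while transfer to alkane is not.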
I'll leave things here for now because this post is really just a teaser and I will be returning to the theme in more depth in the future. If you're interested in finding out more take a look at my harangue from the March 2011 PhysChem Forum at Syngenta and the article that goes with it. I'd also recommend reading this review by Wolfenden if you're interested in the relevance of alkane/water logP values to protein structure and function.
Literature cited
Toulmin, Kenny & Wood, Toward prediction of alkane/water partition coefficients. J. Med. Chem. 2008, 51, 3720-3730. DOI
Wolfenden, Experimental Measures of Amino Acid Hydrophobicity and the Topology of Transmembrane and Globular Proteins. J. Gen. Physiol. 2007, 129, 357-362. DOI
