Friday, 26 April 2013

Thermodynamics and molecular interactions

So it’s #RealTimeChem week on twitter and I thought I’d get into the spirit with a blog post.  The article that I’ve selected for review focuses on non-additivity of functional group contributions to affinity.  The protein in question is Thrombin and ligand binding was characterised using protein crystallography and isothermal titration calorimetry.  

Before reviewing the article, it’s probably a good idea to articulate my position on the thermodynamics of ligand-protein binding.  Firstly, G, H and S are three state functions, each of which can be written in terms of the other two, but only one of which is directly relevant to the binding of ligands to proteins.  Kd is no less thermodynamic than ΔH° or ΔS° and the contribution of a particular intermolecular contact to ΔG° (or ΔH° for that matter) is not in general an experimental observable.  Thermodynamics with state functions is like accountancy in that if you over-pay one interaction, the other interactions will lose out.

Now back to the featured article.  One observation presented as evidence for non-additivity is that the slopes of plots of ΔG° against hydrophobic contact area differ according to whether X (see figure below) is H or NH2.  Inhibitors with the amino group bind with greater affinity than the corresponding inhibitors lacking the amino group which interacts (in its cationic form) with the protein.   However, the difference is not constant and actually increases with hydrophobic contact area.   I would certainly agree that something interesting is going on and if we’re ever going to understand ligand-protein then combining affinity measurements for structurally-related ligands with structural information will be very useful.   
 
About enthalpy measurements, I am a lot less sure.  Isothermal titration calorimetry (ITC) is a direct, label-free method for measuring affinity but the fact that you get ΔH° from the experiment at no extra cost doesn’t by itself make ΔH° useful.     If measured values of ΔH° lead to improved predictions of ΔG° or provide clear insight into the nature of the interactions between ligand and protein then I certainly agree that we should make use of ΔH°.  However, there is still the problem that there is no unique way to distribute ΔG°  (or, for that matter, ΔH°) over intermolecular contacts and biomolecular recognition also takes place in aqueous media.    The cohesiveness of liquid water that drives hydrophobic association in aqueous media is a consequence of strong, cooperative hydrogen bonds between water molecules and I like to think of the hydrophobic force as a non-local, indirect, electrostatic interaction.  This non-local nature of hydrophobic interactions complicates interpretation of affinity measurements in structural terms.

Let's see what the authors have to say:

"Analysis of the individual crystal structures and factorizing the free energy into enthalpy and entropy demonstrates that the binding affinity of the ligands results from a mixture of enthalpic contributions from hydrogen bonding and hydrophobic contacts, and entropic considerations involving an increasing loss of residual mobility of the bound ligands."

I'm going to put my cards on the table and say that I believe this statement represents an exercise in arm waving.    See what you think and ask yourself the question as to whether this statement would help you design a higher affinity Thrombin inhibitor?  It's also worth thinking carefully about the relationship between mobility and entropy.   One way of looking at entropy is as the degree to which systems are constrained and a more highly constrained system will be less mobile.    The authors state:

"The present study shows, by use of crystal structure analysis and isothermal titration calorimetry for a congeneric series of thrombin inhibitors, that extensive cooperative effects between hydrophobic contacts and hydrogen bond formation are intimately coupled via dynamic properties of the formed complexes."

I think that one needs to be very careful when talking about 'dynamic properties' in the context of equilibrium thermodynamics.  Ultimately entropy is determined by the characteristics of potential energy surfaces and Statistical Mechanics tells us that entropy (and other thermodynamic properties) can be calculated from the partition function.  It may be instructive to think how one would use the partition function to put these 'dynamic properties' on a quantitative basis.

I'm now going to change direction because there is one factor that the authors appear not to have considered  and I think that it could be quite important.  The ligand amino group (see figure above; it's protonated under assay conditions) interacts with the protein but it can also affect affinity in another way.  Have a look a this figure (showing the stricture of complex with 3e) from the article which shows how one of the inhibitors lacking the amino group binds.  Now I'd like you to look at one of the dihedral angles in the ligand structure and you can see how it is defined by looking at the substructures inset in the histograms below.  
 
The histograms show the distributions of (the absolute value of) this dihedral angle observed for the instances of substructure in the CSD. The histogram on the left suggests that the dihedral angle will tend to be 180° when the carbon next to the carbonyl carbon has two attached hydrogens. The histogram on the right shows how a substituent (X ¹ H) on that carbon shifts the distribution of dihedral angles.  Now let's go back to the structure of complex 3e, which can be downloaded from the PDB as refcode 2ZIQ, and we can see that the relevant dihedral angle is 117°.  Comparing the two histograms tells us is that a substituent (such as an amino group) on the carbon next to the carbonyl group will tend to stabilise the bound conformations of these inhibitors in addition to making direct interactions with the protein.  Do the ITC results tell us this?       
Literature cited
Baum, Muley, Smolinski, Heine, Hangauer and Klebe (2010) Non-additivity of functional group contributions by protein-ligand binding: A comprehensive study by crystallography and isothermal titration calorimetry. J Mol Biol 397:1042-1054  doi

Tuesday, 9 April 2013

Unknown knowns and binding kinetics

I hope that you all enjoyed the April Fool post but now it’s time to get back to business.  Before returning to Thermodynamics, I’d like to point you towards a couple of articles that you might find to be of interest.  In the first of these, Ben Davis and Practical Fragments’ Dan Erlanson take a look at some of what can go wrong when you screen fragments and it is noteworthy that they actively sought material from the fragment community before writing the review.  If you’re thinking about running a fragment screen, you do need to read this article which has already been featured in other blog posts ( 1 | 2 | 3).    

The article by Göran Dahl and Tomas Åkerud focuses on binding kinetics, specifically in the context of pharmacokinetics. In multistep processes, timescale determines relevance.  My take is that slow binding is equivalent to slow distribution and, at some point when the topic is raised, I will usually get round to asking whether slow distribution is something that you would want to design into a drug. The authors conclude that ‘the drug-target residence time appears to be limited in scope for prolonging duration of effect’ and the article provides a timely reminder that drugs act within a pharmacokinetic framework.
Literature cited
Davis & Erlanson (2013) Learning from our mistakes: The ‘unknown knowns’ in fragment screening. Bioorg Med Chem Lett In press DOI

Dahl & Åkerud (2013) Pharmacokinetics and the drug-target residence time concept. Drug Discov Today In press DOI

Monday, 1 April 2013

Isotopic enrichment by recrystallisation from water

Isotopic enrichment doesn’t normally fall under the molecular design remit.  However a radically new approach has been recently discovered and predictive modelling was very important in showing that the method would indeed be feasible.  Isotopic enrichment is typically carried out in gas phase and if the elemental form is involatile it is necessary to prepare a compound which can be more readily persuaded into the gas phase. So if you want to enrich uranium you first make the relatively volatile hexafluoride (fluorine is good because it’s only got one natural isotope) and then stick it in a centrifuge and away you go.

The isotopic enrichment industry has been watching developments in aqueous solubility prediction with great interest.  As you’ll know aqueous solubility is a very important physicochemical property and a key determinant of the efficiency with which a drug is absorbed from the gastrointestinal tract.  By focussing on aqueous solubility you can get away from all the technological demands of working in gas phase.  You can do this in your garage.

Most pharmaceutical researchers are familiar with the idea that aqueous solublility decreases with molecular weight and there is large literature of many seminal articles on the subject.  One consequence of this anti-correlation is that the average solubility for compounds above a particular molecular weight cut off will always be lower than the average solubility for the compounds with molecular weights below the cut off.  The selection of the optimal cut off is a challenging problem and it was believed that the only option available was to design compounds such that the masses of two isotopic forms straddled the value of the molecular weight cut off.  Only recently has the seminal 1940 study of Bronshtein and Mercader, the result of a short collaboration in Mexico City, come to light.  In a nutshell, their study unequivocally demonstrated that the cut off could be set to the centroid of the isotopic masses for any compound.  The math in the article is truly formidable but the essence of the method is that diagonalisation of the reduced mass tensor is NKVD-complete.  Thus the cut off can be matched to compounds which is much easier than having to design compounds to the bracket the cut off. 

Literature cited
Bronshtein and Mercader Мисс Скарлетт с ледорубом в исследовании.  Dokl Akad Nauk 1940, 7, 432-456 doi

Thursday, 28 March 2013

Efficient Voodoo Thermodynamics

So you’ve got a little data analysis problem.  You have some compounds with a range of IC50 values and you’d like to explore the extent that molecular size contributes to potency/affinity.  Here’s one suggestion for starters.  Plot pIC50 against your favourite measure of  molecular size which could be number of number of non-hydrogen atoms, molecular weight and look at the residuals which tell you how much each compound beats the trend (or is beaten by it).  Let’s start by defining  pIC50 which you can calculate from:

 pIC50 = -log(IC50/M)                                                                              1
You might be asking why I’m not suggesting that you use the standard Gibbs free energy of binding which is defined by:

ΔG° = RTln(Kd/C°)                                                                                 2
The main reason for not doing this is that we can’t.  You’ll notice that ΔG° is calculated from Kd and not  IC50 and the two are not the same thing even though they both have units of concentration.  Those of you who have worked on kinase projects may have even used IC50 values measured at different ATP concentrations to get a better idea how much kick your inhibitors will have at physiological ATP concentration.  Put another way, you can measure the concentration of sugar in your coffee and plug this into equation 2 but that does not make what you calculate a standard Gibbs free energy of binding.   If you’ve measured IC50 then you really should use  pIC50 in this analysis.  Converting pIC50 to ΔG° is technically incorrect and arguably pretentious since the converted pIC50 can give the impression that it is somehow more thermodynamic than that from which it was calculated.  Converting pIC50  to ΔG° also introduces additional units (of energy/mole) and there is always a degree of irony when these units, which may have been introduced to just make biological data look more physical, get lost when the results are presented.

So let’s get back to the data analysis problem.  Suppose that we’ve plotted  pIC50 against number of heavy (i.e. non-hydrogen) atoms and the next step is to fit the data.  Best way to start is to fit a straight line although you could also fit a curve if the data justifies this.  Let’s assume that we’re fitting the straight line:
  pIC50 = A + B×NHA                                                                            3

I realise that using an intercept term (A) will cause a few eyebrows to become raised.  Surely the line of fit should go through the origin?   There is a problem with this line of thinking and it’s helpful now to talk instead in terms of affinity and ΔG° to develop the point a bit more.  You might be thinking that in the limit of zero molecular size a compound should have zero free energy of binding.  However, a zero free energy of binding corresponds to Kd being equal to the standard concentration and you’ll remember that the choice of standard state is arbitrary.  If you must derive insights from thermodynamic measurements then the very least that you can do is to ensure that any insights you derive are invariant with respect to the value of the standard concentration.  

When you use ligand efficiency (-ΔG°/NHA) you’re effectively assuming that the value of ΔG° should be directly proportional to the number of heavy atoms in the ligand molecule.  One consequence of defining ligand efficiency in this manner is that relative values of ligand efficiency for compounds with different numbers of heavy atoms will change if you change (as thermodynamics tells us that we are allowed to do) the standard concentration used to define ΔG°.  I've droned on enough though and it's time to check out.  I will however, leave you with the question of whether it makes sense to try to correct ligand efficiency for the effects of molecular size. 

Sunday, 17 March 2013

The wrong kind of free energy

It has been some time since I last blogged.   There has been the distraction of a little drug-likeness project and there is still much to do before I leave Brasil at the end of May.  I have, however, set up a twitter account (@pwk2013) which I use in futile attempts to engage minor celebrities (like God and Richard Dawkins) in conversation.   I’ll be taking a look at some aspects of thermodynamics in this post and I’ll start by writing the relationship between dissociation constant and the standard free energy of binding:

ΔG° = RTln(Kd/C°)

I realise that this might look a bit different to what you’re used to so I’ll try to explain why its been written like this.  First of all Kd has units of concentration and logarithms are only defined for numbers (i.e. unitless quantities) so you might want to consider the possibility that what you normally write is wrong (apologies for hideous pun).   The other thing that you need to remember is that the standard free energy of binding depends on the choice of standard state and writing the equation like this makes this connection explicit.   ΔG° is negative for a 10nM compound when the standard concentration is 1M.  However, if we were to define the standard concentration to be 1nM then   ΔG° would be positive.  If you consider the idea of a 1nM standard concentration to be offensive, think of a 1M solution of your favourite protein...
One of the misconceptions that drug discovery researchers tend to have is that you can’t call it thermodynamics unless you measure both enthalpy and entropy.  One reason for this is the use of isothermal titration calorimetry (ITC) to measure affinity.  ITC is a direct, label-free method for measuring affinity and you get the enthalpy as part of the package.  One of the important things to remember about thermodynamics is that you need to use the most appropriate thermodynamic quantity to describe the phenomenon that you’re interested in and for binding at constant pressure this is the Gibbs free energy.  In contrast, the engineer scaling up a synthetic reaction for manufacturing a drug will usually be more interested in the heat and volume (i.e. gaseous bi-products) generated by the reaction because failure to control these can result in the Big Kaboom.

One idea that has emerged as ITC has become more accessible in drug discovery is that enthalpy-driven binding is somehow better than entropy-driven binding.  I remain sceptical and ask the rhetorical question of how an isothermal system such as a human taking a drug senses the degree to which engagement of the drug’s target(s) is driven by enthalpy changes.   If measuring enthalpy of binding in addition to affinity helps us to predict affinity for compounds yet to be synthesised or gives us clear (i.e. keeping arms stationary) insights into the nature of binding then it makes sense to do it.  However, I’m not seeing this being done convincingly and sometimes wonder whether enthalpy optimisation is just a case of the wrong kind of snow...

We tend to interpret affinity in terms of contacts between protein and ligand although one must always remember that the contribution of a particular contact to affinity is not in general an experimental observable.   Proteins and their ligands associate in an aqueous environment and the non-local nature of the hydrophobic effect further complicates attempts to use structure to rationalise affinity.  I often hear hydrophobic interactions being described as non-directional and in my view this is complete bollocks since interactions are forces and forces are vectors which by definition have direction.  On this subject, I can't resist telling the tale of the Human Resources department getting the Scalar Quantity Award for having magnitude but no direction...
I think this is a good place to leave things and I'll try to be back in a week or some more specific stuff in a follow up to this post. 

Thursday, 18 October 2012

Screening library size

This post got prompted by a survey of screening library size by Teddy at Practical Fragments. I commented briefly there but it soon became clear that it was going to take a really long comment to say everthing. While at AstraZeneca, I was involved in putting together a 20k screening library for fragment-based work and I thought that some people might be interested in reading about some of the experiences. The project started at the beginning of 2005 although the library (GFSL2005) was not assembled until the folllowing year. Some aspects of the library design were also described in our contribution to the JCAMD special issue on FBDD.

The NMR screening libraries with which I've been involved have been a lot smaller (~2k) than this so why would one want to assemble a fragment screening library of this size? One reason was diversity of fragment-based activities, which included a high concentration component when running HTS In screening, you soon learn the importance of logistics and, in particular, compound managment. If you're going to assemble a screening library that can be easily used, this usually means getting the compounds into dimethyl sulfoxide (DMSO) so that samples can be dispensed automatically. Fragments will usually be screened at high concentration which means that stock solutions also need to be more concentrated (we used 100µM) because most assays don't particularly like DMSO. Another reason for maintaining stock solutions is that there are minimum volume limits for dissolving solids in DMSO so starting from solids each time you run a screen is wasteful of material as well as time.

The essence of the GFSL2005 project was ensuring that there were enough high concentration stock solutions in the liquid store to support diverse fragment-based activities on different Discovery sites. While the entire library might be screened as a component of HTS, individual groups could screen subsets of GFSL05 that were significantly smaller (at least when I left AZ in 2009). It's important to rememember that while GFSL05 was a generic library we were also trying to anticipate the needs of people who might be doing directed fragment screening. The compounds were selected using Core and Layer (which I've discussed before). This approach is not specific to fragment screening libraries and I've used it to put together a compound library for phenotypic screening. The in house software, some of which had been developed in response to emergence of HTS, was actually in place by the beginning of 1996. Here (from one of my presentations) is a slide that illustrates the approach.



One of the comments on the Practical Fragments post was about molecular diversity and the anonymous commentator asked:

"Are some companies more keen because of their bigger libraries or just lazy and adding more and more compounds without looking at diversity?"

This is a good challenge and I'll try to give you an idea of what were trying to do. Core and Layer is all about diversity in that the approach aims to drive compound selection away (in terms of molecular similarity) from what has already been selected. However, when designing the library we did try to ensure that as many compounds as possible had a near neighbour (within the library). Here's a slide showing (in pie chart format) the fractions of the library corresponding to each number of neighbours. There are actually three pie charts because counting neighbours depends on the similarity threshold that you use and I've done the analysis for three different thresholds.



You can do a similar analysis to ask about neighbours of library compounds that are available. This is of interest because you'll generally want to be able to follow up hits with analogues in a process that some call 'SAR-by-Catalogue' although I regard the term as silly and refuse to use it. Availability is not constant and the analysis shown in the following slide was a snapshot generated for a 2008. You'll notice that there is an extra row of pie charts since one can define availability more (> 20mg) or less (>10 mg) conservatively. If you require plenty of sample for availability and require a neighbours to be very similar then you'll have less neighbours.



There was also some commentary on solubility and ideally this should be measured in assay bufffer for library compounds. We used of our in house high throughput solubility assay when putting GFSL2005 together. This solubility assay had original been designed for assessing hits from conventional HTS and had a limited dynamic range (1-100 µM) which was not ideal for fragments. The assay used standard (i.e. 10mM) stock solutions which meant that we could only do measurements for samples that were in this format (library compounds acquired from external sources were only made up as 100mM stocks). Nevertheless, we used the assay since we believed that the information would still be valuable. The graphic below illustrates the relatonship between solubility and lipophilicity (ClogP) for compounds that are neutral under assay conditions. We used ClogP to bin the data which allowed us to make use of out of range data by plotting percentiles. This allowed us to assess the risk of poor solubily as a function of ClogP. It's worth pointing out that the binning was done so as to be able to include in range and out of range data in a single analysis. We just showed the lowest percentiles because we were only interested in the least soluble compounds. Even so, you should be able to get an idea of the variation in solubility for each bin.



Staying with the solubility theme I'll finish off taking a look at with a look at the aromatic rings and carbon saturation as determinants of aqueous solubility. The two articles at which I'll be looking were both featured in a post at Practical Fragments and, given that the analyses in the two articles were not exactly rose-scented, the title of the post struck me as somewhat ironic.

I'll start with Escape from Flatland which introduces carbon bond saturation, defined by fraction sp3 (Fsp3) as a molecular descriptor. Figure 5 is the one most relevant to the discussion and this is captioned, "Fsp3 as a function of logS..." although the caption is not totally accurate. LogS is binned and it is actually average Fsp3 that is plotted and not Fsp3 itself. I'm guessing that the average is the mean (rather than the median) Fsp3 for each logS bin. If you look at the plot there appears to be a good correlation between the mid-point of each logS bin and average Fsp3 although it would have been helpful if they'd presented a correlation coefficient and shown a standard deviation for each bin. In fact, it would have been helpful if they'd shown some standard deviations for the other figures as well but that is peripheral to this discussion. The problem with Figure 5 is that the intra-bin variation in Fsp3 is hidden (i.e. no error bars) and without being able to see this variation it is very difficult to know how strongly Fsp3 (as opposed to its average value) is correlated with logS. Those readers who were awake (it was immediately after lunch) at my RACI talk last December will know what I'm talking about but hopefully some of the rest of you will at least be wondering why the authors didn't simply fit logS to Fsp3. Anyone care to speculate as to what the correlation coefficient between logS and Fsp3 might be?

Impact of aromatic ring count presents a box plot of solubility as a function of number of aromatic rings. At least the variation in solubility is shown for each value of aromatic ring count is shown (even though I'd have preferred to see log of solubility plotted). The authors also looked at the correlation between number of aromatic rings cLogP (which I prefer to call ClogP) and judged it to be excellent (although it is not clear whether they bothered to calculate a correlation coefficient to support their assertion of excellence). Correlations between descriptors are important because the effect of one can be largely due to the extent to which is correlated with another. Although there are ways that you can model the dependence of a quantity on two descriptors that are correlated with each other, the authors chose to do this graphically using the pie chart array in Figure 6. If you look at the pie chart array, you can sort of convince yourself that aromatic ring count has an effect on solubility that is not just due to its effect on ClogP. However, there is established statistical methodology for dealing with this sort or problem. I couldn't help wondering why the authors didn't use this to analyse their data. What puzzled me even more was that they didn't seem to have considered the possiblity that the number of aromatic rings might be correlated with molecular weight (Dan also picked up on this in his post) since I'd guess that this correlation might be even stronger than the one with ClogP.

I do believe that a strong case can be made for looking beyond aromatic rings when putting screening libraries and I'll point you to a recent intiative that may be of interest. However, this case is based on considerations of molecular recognition and molecular diversity. I don't believe that either of these studies (which both imply that substituting cyclohexadiene for benzene would be a good thing) strengthens that case. Possibly if the analysis had been done differently I might have arrived at a different conclusion but a lack of error bars and a failure to acccount for the effect of molecular weight leave me with too many doubts.

Literature cited

Blomberg et al, Design of compound libraries for fragment screening. JCAMD 2009, 23 513-525 DOI

Colclough et al, High throughput solubility determination with application to selection of compounds for fragment screening. Bioorg. Med. Chem. 2008, 16, 6611-6616 DOI

Lovering, Bikker & Humblet, Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. J. Med. Chem. 2009, 52, 6752-6756 DOI

Ritchie & MacDonald, The impact of aromatic ring count on compound developability: Are too many aromatic rings a liability in drug design? Drug Discov. Today 2009, 14, 1011-1020 DOI

Monday, 8 October 2012

Viajem à Belém (minha tarefa)

Eu cheguei no Brasil ao fim de abril e tento aprender Português. Minha professor se chama Thalita e eu tenho aula de Português com Mihong, que é coreana. Eu faço um ‘blog post’ de minha viajem à Belém em Português para tarefa. Nao posso usar computador para traduzir mas tenho ajuda para conjugar os verbos irregulars. Esta semana, nós falamos sobre a comida e espero descobrir a receta coreana para cachorro-quente.  Também, eu assisto televisão para ajudar meu compreensivo de  Português e eu gosto do Canal Rural (muitas vacas).

Eu fiz uma palestra ao O IV Simpósio de Simulação Computacional e Avaliação Biológica de Biomoléculas na Amazônia (SSCABBA). O simpósio teve lugar na Universidade Federal do Pará (UFPA) e tenho duas fotos do campus. 




Jerônimo (à esquerda) foi o coordenador do evento e eu gostei muito da palestra de Ernesto (ao lado de Jerônimo).  Ademir (camisa de cor de laranja) fez um curso de Car-Parinello e eu compareci uma das seus palestras.



Minha palestra foi em inglês mas eu fiz uma piada em Português dos Argentinos (los hermanos). Uma molécula de água tem menos energia se ela está longe da superficie hidrofóbico.  Na termodinâmica energia pequena e felidade são equivalentes. Talvez posso fazer esta palestra em Buenos Aires?