Tuesday, 20 March 2012

Scoring non-additivity

Getting the most from target-ligand interactions is the essence of structure-based design and many of us believe that fragment-based methods represent an excellent way to do this. Recently, I came across an article that looked at cooperativity and hot spots in the context of scoring functions. Given that the Roche group appear to be trying to establish themselves as authorities on molecular recognition, (see review of this article) I decided to take a closer look.

We often think of molecular recognition in terms of contacts between molecules with each contact contributing (sometimes unfavourably) to the change in free energy associated with formation of complex. Although this is an appealing picture (and one we use in scoring functions as well as some QSAR approaches to affinity prediction) it is worth remembering that the contribution of an individual contact to affinity is not strictly an observable or something that you can actually measure.

Reading the article, I encountered the following in the introduction:

‘Classical experiments by Williams et al. on glycopeptides antibiotics18 and on the streptavidin-biotin complex19 have shown that binding causes these systems to be more rigid and thus enthalpically more favorable. The loss of binding entropy caused by the reduced motion is more than compensated by the gain in enthalpy achieved through tighter interactions.’

I would challenge the assertion that making a system more rigid necessarily results in a decrease in its enthalpy. If you took a complex and rigidified it in the wrong way you would actually increase its enthalpy to the extent that it exceeded that of the unbound complexation partners so it’s probably worth taking a closer look at rigidity in the context of binding before moving on. When people try to ‘explain’ affinity in terms of structure it usually results in a lot of arm waving which typically increases when entropy and enthalpy enter the equation. One way to think about entropy is in terms of constraints. As you increase the extent to which a system is constrained you decrease its entropy. When a ligand binds to a protein it becomes constrained in a number of ways. Firstly, it has constrained to move with the protein and introducing this constraint results in a loss of translational entropy. Secondly, binding restricts the conformations available to the ligand and we can think of the ligand as being constrained to exist in a subset of the conformations in which it could exist before binding. We can think of the protein as constraining the conformational space of the ligand by modifying the potential energy surface of the ligand. Entropy changes associated with binding quantify changes in the degree to which the system (protein, ligand and solvent) is constrained. As a system becomes more constrained it moves less and it is more correct to say that the additional degree of constraint ‘causes’ the reduced motion than to say that the reduced motion ‘causes’ the reduction in entropy.

However, I don’t want this post turning into Thermodynamics 101 and should get back to the featured article. The scoring functions used to assess potential binding modes (poses) typically include a weighted sum of the contacts between between protein and potential ligand. Setting these weights appropriately is the biggest problem in development of scoring functions of this nature. The authors of the featured article attempt to go beyond an additive model by using subgraph network descriptors to capture cooperativity. Many of the ligands, protein structures, measured potencies and affinities used in this study appear to be proprietary. Personally, I found it difficult to figure out exactly what the authors had done and I would challenge anybody to repeat the work. An editorial (admittedly in Journal of Medicinal Chemistry rather than Journal of Chemical Information and Modeling) is very clear on this point and the authors might want to check it out before submitting further manuscripts on this theme:

‘Computational methods must be described in sufficient detail for the reader to reproduce the results’

Molecular interactions form the basis of this work and the authors ‘propose a comprehensive set of attractive and repulsive noncovalent interactions’ for which the atom types are defined using SMARTS notation. Since the authors don’t actually tell us what SMARTS patterns they use to define the interaction types, Tables 1 and 2 raise more questions than they answer. Does the hydrogen bond interaction include charged donors or acceptor and, if so, are ions in contact also considered to be hydrogen bonded? What is the difference between a dipolar interaction and a hydrogen bond? Why do dipoles interact with cations but not anions? If the cation-π interaction is so favourable, why is the anion-π interaction considered as insignificant? You can claim

‘a newly defined comprehensive set of noncovalent contacts that encode state-of-the-art knowledge about molecular interactions’

but if you’re not going to tell us how you do the encoding, readers are going to be left with the impression that the statement is as inaccurate as it is immodest.

Using a single pairwise interation coefficient to score all hydrogen bonds severely limits the accuracy of an interaction-based scoring function, especially if hydrogen bonds involving charged groups are to be included. Whilst cooperative effects are relevant to the scoring of hydrogen bonds, the inherent variability of individual hydrogen bonds is likely to be as least as important. One way to make hydrogen bonds pay their way in aqueous media is to balance the strengths of donor and acceptor. If an acceptor is ‘too weak’ for the donor, the former is unable to fully ‘compensate’ the latter for its ‘lost solvation’. I’ve discussed this in the introduction to a 2009 article on hydrogen bonding if you’re interested in following this up a bit more and you might also want to look at these measured values of hydrogen bond acidity and basicity measurements for diverse sets of donor and acceptors.

Although values of hydrogen bond acidity and basicity measured for prototypical (e.g. single donor or acceptor) model compounds are useful in molecule design, they don’t tell the whole story. The environment (e.g. at bottom of narrow hydrophobic pocket) of a hydrogen bonding group in a protein can greatly affect its recognition characteristics. It may yet prove possible to model hydrophobic enclosure accurately in terms of cooperative contacts between protein and ligand but I'll not be holding my breath. There does seem to be a view that the contribution of hydrogen bonds to affinity will be small and my former Charnwood colleagues assert that a neutral-neutral hydrogen bond will contribute no more than a factor of 15 (1.2 log units). However, at Alderley Park (where we never used to pay too much attention to directives from ‘Camp Chuckles’) we managed to find a couple of examples where replacing CH in a pyridine ring with nitrogen resulted in potency increases of 2 log units. I guess the jury is still out on this one.

The authors of the featured article used their interaction types and subgraph network descriptors to develop a new scoring function which they call ScorpionScore (SScorpion). They assembled four training sets and

‘maximized the Spearman rank correlation coefficient for affinity data sets I–III and minimized deviations in absolute affinity differences for training set IV in parallel, with each data set being weighted by 25%’.

It was not clear why they split the data that comprise the sets I-III in this manner but it will have the effect of slightly emphasising the contribution from set I, which is smaller (31) than the other two sets (II: 46; III: 42). It would have been helpful to explicitly state the function that they’d optimized and I’d also like to have known exactly how they removed

highly correlated (Spearman rank correlation ρ > 0.8) terms from the set’.

The performance of the Scorpion scoring function is summarised in Table 7 of the article. There only seems to be an advantage in using the network terms for training sets I (Neuraminidase) and IV (Activity Cliffs). Significantly, the performance of SScorpion is no better than a simple count of heavy (non-hydrogen) atoms for data set III (Diverse) which one might argue is the most relevant to virtual screening against and arbitrary target. The results presented in Table 7 quantify how well the different scoring schemes fit the training data and one would expect that using the network terms (SScorpion) will be better than not using the network terms (Spairwise) simply because more parameters are used to do the fitting. The authors do compare SScorpion with other scoring functions (Tables S4 – S6 in the supplemental material) although they did not make the crucial comparisons with Spairwise in this context. In my opinion, the authors have not demonstrated that the network terms significantly increase the performance of the scoring and I would question the value of the interaction network diagrams (Figures 8 -16).

I’ll finish off by making some comments on the (mostly proprietary) assay results used to develop SScorpion. The authors state:

‘It is important to note that we optimize against a combination of training sets, in which for each set ligand affinities were determined with the same assay and for the same protein.’

I would certainly agree that mixing results from different assays is not a good idea when attempting to relate structure to activity. However, it is also important to note that the term ‘affinity’ strictly refers to Kd and this is not the same quantity as IC50 (which generally depends on substrate and co-factor concentrations). For example, you can account for intracellular ATP concentrations by running kinase inhibition assays at high ATP concentration. However, there is another issue that one needs to consider when dealing with IC50 values and I’ll illustrate it with reference to the compounds shown in the figure at the top of this paragraph which I’ve pulled from row 13 of Table 5. The pIC50 values shown in this figure are certainly consistent with cooperative binding of the two chlorine atoms. For example, the effect of adding a 2-chloro substituent to 1 (ΔpIC50 = 5.6 – 4.4 = 1.2) is a lot smaller than that observed (ΔpIC50 = 8.0 – 5.8 = 2.2) when 2 is modified in an analogous manner. When trying to quantify cooperativity it is very important that all IC50 values are measured as accurately as possible and the assay often needs to have a high dynamic range. Our perception of cooperativity for these compounds depends very much on the pIC50 for compound 1 being measured accurately and assay interference increasingly becomes an issue for weaker inhibitors. My main reason for bringing this up is that it may be possible to quantify the degree of interference in order to make the appropriate corrections and this blog post may also be of interest.

As stated earlier, I do not believe that the authors have shown that network descriptors add anything of real value. For the largest training set(II: PDE10), the both SScorpion and Spairwise are outperformed by a single parameter (number of heavy atoms). Some readers may think this assessment overly harsh and I'd ask those readers to consider an alternative scenario in which QSAR models with different numbers of parameters were ranked according to how well they fit the training data. Thinking about a scoring function as a QSAR model may be instructive in other ways. For example, QSAR models that are optimistically termed 'global' may just be ensembles of local models and to equate correlation with causation is usually to take the first step down a long, slippery slope.

Literature cited

Kuhn, Fuchs, Reutlinger, Stahl & Taylor, Rationalizing Tight Ligand Binding through Cooperative Interaction Networks. J. Chem. Inf. Model. 2011, 51, 3180–3198 DOI

Bissantz, Kuhn & Stahl, A Medicinal Chemist’s Guide to Molecular Interactions. J. Med. Chem. 2010, 53, 5061–5084 DOI

Stahl & Bajorath, Computational Medicinal Chemistry. J. Med. Chem., 2011, 54, 1–2 DOI

Kenny, Hydrogen Bonding, Electrostatic Potential, and Molecular Design. J. Chem. Inf. Model., 2009, 49, 1234–1244. DOI

Abraham, Duce , Prior , Barratt , Morris & Taylor, Hydrogen bonding. Part 9. Solute proton donor and proton acceptor scales for use in drug design. J. Chem. Soc., Perkin Trans. 2, 1989, 1355-1379 DOI

Abel, Wang, Friesner & Berne, A Displaced-Solvent Functional Analysis of Model Hydrophobic Enclosures. J. Chem. Theory Comput. 2010, 6, 2924–2934 DOI

Davis & Teague, Hydrogen Bonding, Hydrophobic Interactions, and Failure of the Rigid Receptor Hypothesis. Angew. Chem. Int. Ed. 1999, 38, 736-749 DOI

Bethel et al, Design of selective Cathepsin inhibitors. Bioorg. Med. Chem. Lett. 2009, 19, 4622–4625. DOI

Shapiro, Walkup & Keating, Correction for Interference by Test Samples in High-Throughput Assays. J. Biomol. Screen. 2009, 14, 1008-1016 DOI


Vladimir Chupakhin said...

Thank's for interesting comment! I am actually dream of treating every modeling paper as mathematical theorem proving. With all of the data and turning points provided.

Pete said...

It's not just the data points that are unavailable. Given that the substructural patterns that define the atom types are not given, one can argue that the scoring functions are never actually defined.

jen said...

Wow. That's pretty heavy stuff!

Pete said...

I hope that you enjoyed (as much as post like these can be enjoyed) all the same. If you're interested in this sort of thing, you might want to check out the facebook group (see 'useful links' on left hand side of page). There's also an email address listed in my blogger profile if you want to get in touch by that route.

Pete said...
This comment has been removed by the author.