Molecular Design: September 2018

Sunday, 30 September 2018

Hydrogen bonding asymmetries

Have you ever wondered why the Rule of 5 (Ro5) specifies hydrogen bond (HB) thresholds of 10 acceptors but only 5 donors? This is, perhaps, the prototypical example of what I'll call a 'hydrogen bonding asymmetry' and it is sometimes invoked in support of the folklore that HB donors are somehow 'worse' than HB acceptors in drug design. I have, on occasion, tried to track down the source of this folklore but that trail has always gone cold on me. In any case, I don't think the HB asymmetry in Ro5 has any physical significance since HB acceptors (especially as defined for Ro5) tend to be more common in chemical structures of interest to medicinal chemists than HB donors. This was discussed in our correlation inflation article and the bigger Ro5 question for me is why the high polarity limit is defined by counts of HB donors and acceptors while the low polarity limit is defined in terms of lipophilicity. As may become a blogging habit, I'll include some random photos (these are from a visit to India late in 2013) to break up the text a bit.

Drum fest at Buland Darwaza

It was this article in JCAMD about the 'polarized' nature of protein-ligand interfaces that got me thinking again about hydrogen bonding asymmetries. The study found that proteins donate twice as many HBs as they accept. While the observation is certainly interesting, I do think that the authors might be over-interpreting it. For example, the authors suggest that it appears to be an underlying explanation for Ro5 and they may find that there are significant differences between their definitions of HB acceptors and those used to apply Ro5. The authors also state "Peptidyl ligands, on the other hand, showed no strong preference for donating versus accepting H-bonds". This observation would be more consistent with 'polarization' of protein-ligand interfaces being determined by the nature of the ligand.

The authors assert that "lone pairs available to accept H-bonds are actually 1.6 times as prevalent as protons available to donate, both on the protein and ligand side of the interface." While it is appropriate to count lone pairs in situations where only one lone pair accepts an HB (e.g. when considering 1:1 hydrogen bonded complexes in low polarity solvents), I would argue that it is not appropriate to do so when considering biomolecular recognition in aqueous media because the acceptance of an HB by one oxygen lone pair makes the other lone pair less able to accept an HB. You can see this effect using molecular electrostatic potential as discussed in this article (see polarization effects section and Table 4). Put another way, how often is a carbonyl oxygen observed to accept two HBs from a binding partner? How many docking tools would explicitly penalize a pose in which a carbonyl oxygen accepted two HBs?

As I see it, a typical protein is more likely to have a surplus of HB donors under normal physiological conditions. Some parts (e.g. serine, threonine, tyrosine and histidine side chains and the backbone) of a protein can be regarded as having equal numbers of HB donor and acceptor atoms. While the anionic side chains of aspartate and glutamate cannot donate HBs, the cationic side chains of arginine and lysine have five and three donor hydrogen atoms respectively while lacking HB acceptors. The tryptophan side chain has only a single HB donor (although its π-system is likely to be able to accept HBs) while each side chain of asparagine and glutamine has two donor hydrogen atoms and one acceptor oxygen atom. The histidine side chain is sometimes observed to be protonated in X-ray crystal structures which means that it should be considered to be more HB donor than HB acceptor in the context of protein-ligand recognition. The tyrosine hydroxyl would be expected to be a stronger HB donor (and weaker HB acceptor) than the hydroxyls of either serine or threonine.
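The side-chain bookkeeping above can be tallied in a few lines of Python. This is only a sketch: the donor counts are the ones given in the text, while the acceptor count of two for aspartate and glutamate (one per carboxylate oxygen), the neutral treatment of histidine, the function name `hb_balance` and the example binding-site composition are all assumptions made for illustration.

```python
# (donor hydrogens, acceptor atoms) per side chain, following the text.
# Assumptions: Asp/Glu are given 2 acceptor atoms (carboxylate oxygens),
# His is treated as neutral, and the Trp pi-system is not counted.
SIDE_CHAIN_HB = {
    "SER": (1, 1), "THR": (1, 1), "TYR": (1, 1), "HIS": (1, 1),
    "ASP": (0, 2), "GLU": (0, 2),
    "ARG": (5, 0), "LYS": (3, 0),
    "TRP": (1, 0),
    "ASN": (2, 1), "GLN": (2, 1),
}

def hb_balance(residues):
    """Return (donor hydrogens, acceptor atoms) summed over side chains.

    Residues without an entry (e.g. LEU) contribute nothing.
    """
    donors = sum(SIDE_CHAIN_HB.get(r, (0, 0))[0] for r in residues)
    acceptors = sum(SIDE_CHAIN_HB.get(r, (0, 0))[1] for r in residues)
    return donors, acceptors

# Hypothetical binding-site composition, purely for illustration.
site = ["ARG", "LYS", "ASP", "GLU", "SER", "ASN", "TRP", "LEU"]
donors, acceptors = hb_balance(site)
print(f"donor hydrogens: {donors}, acceptor atoms: {acceptors}")
```

In this made-up site the cationic and anionic side chains are present in equal numbers, yet donor hydrogens still outnumber acceptor atoms, which is the point being made in the text. Counting lone pairs rather than acceptor atoms would change the tally.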

A magical place

The study considers the possibility "that nature avoids the presence of chemical groups bearing both H-bond donor and acceptor capacity, such as hydroxyl groups, in the binding sites of proteins or ligands" although it is not clear what glycobiologists would have to say about this. Let's think a bit about what happens when a hydroxyl group donates its hydrogen atom. Let's suppose you've spotted a nice juicy hydrogen bond acceptor at the bottom of a deep binding pocket that is otherwise hydrophobic. The ligandability is eye-wateringly awesome (the ligandometer is beeping loudly and appears to have gone into dynamic range overload). Even the tiresome Mothers Against Molecular Obesity (MAMO) are impressed and have recommended that you deploy a hydroxyl group since this will be great for property forecast index (PFI). What could possibly go wrong?

The main problem is that the hydroxyl HB donor comes with baggage. In order to donate an HB to the acceptor at the bottom of that pocket, you're going to need to force an HB acceptor into contact with the non-polar part of that binding pocket. Although this contact is not inherently repulsive, it is destabilizing. Another factor is that donation of an HB by the hydroxyl group is likely to increase the HB basicity of the oxygen (which will exacerbate the problem). You can think of other neutral HB donors (e.g. amide NH) but the vast majority of them come with baggage in the form of an accompanying HB acceptor. Exceptions such as NH in pyrrole (not renowned for stability) and indole (steric demands) come with baggage of their own. In contrast, the drug designer has access to a diverse set (e.g. heteroaromatic N, nitrile N, tertiary amide O, sulfoxide O, ether O) of HB acceptors that are not accompanied by HB donors. If you use one of these, you don't have the problem of having to also accommodate a ligand HB donor.

This is a good place to wrap up. In the next post, I'll talk about a completely different type of hydrogen bonding asymmetry, but for now, I'll leave you with some photos from an afternoon spent admiring asses in the Rann of Kutch.

Até mais!

Thursday, 13 September 2018

On the Nature of QSAR

With EuroQSAR2018 fast approaching, I'll share some thoughts from Brazil since I won't be there in person. I've not got any QSAR related graphics handy so I'll include a few random photos to break the text up a bit.

East of Marianne River on north coast of Trinidad

Although Corwin Hansch is generally regarded as the "Father of QSAR", it is helpful to look further back to the work of Louis Hammett in order to see the prehistory of the field. Hammett introduced the concept of the linear free energy relationship (LFER) which forms the basis of the formulation of QSAR by Hansch and Toshio Fujita. However, the LFER framework encodes two other concepts that are also relevant to drug design. First, the definition of a substituent constant relates a change in a property to a change in molecular structure and this underpins matched molecular pair analysis (MMPA). Second, establishing an LFER allows the sensitivity of physicochemical behavior to structural change to be quantified and this can be seen as a basis for the activity cliff concept.
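For readers who have not met an LFER before, the Hammett relationship can be written as log10(K_X/K_H) = ρσ_X, where σ_X is the substituent constant and ρ measures how sensitive the reaction or property is to substitution. The sketch below fits ρ by least squares through the origin. The σ values are approximate para-substituent constants and the log10(K_X/K_H) values are invented for illustration, so the fitted ρ has no chemical meaning; `fit_rho` is a hypothetical helper name.

```python
# Hammett LFER: log10(K_X / K_H) = rho * sigma_X
# sigma: approximate para-substituent constants.
# log_k_ratio: invented values, for illustration only.
sigma = {"OCH3": -0.27, "CH3": -0.17, "H": 0.00, "Cl": 0.23, "NO2": 0.78}
log_k_ratio = {"OCH3": -0.55, "CH3": -0.33, "H": 0.00, "Cl": 0.47, "NO2": 1.55}

def fit_rho(sigma, log_k_ratio):
    """Least-squares slope through the origin (the parent, H, defines zero)."""
    num = sum(sigma[s] * log_k_ratio[s] for s in sigma)
    den = sum(sigma[s] ** 2 for s in sigma)
    return num / den

rho = fit_rho(sigma, log_k_ratio)
print(f"rho = {rho:.2f}")

# Each sigma is a property change tied to a structural change (H -> X),
# and rho quantifies the sensitivity of the response to that change.
for s in sigma:
    print(f"{s:>5}: predicted log10(K/K_H) = {rho * sigma[s]:+.2f}")
```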

Kasbah cats in Ouarzazate

As David Winkler and the late Prof. Fujita noted in this 2016 article, QSAR has evolved into "two QSARs":

Two main branches of QSAR have evolved. The first of these remains true to the origins of QSAR, where the model is often relatively simple and linear and interpretable in terms of molecular interactions or biological mechanisms, and may be considered “pure” or classical QSAR. The second type focuses much more on modeling structure–activity relationships in large data sets with high chemical diversity using a variety of regression or classification methods, and its primary purpose is to make reliable predictions of properties of new molecules—often the interpretation of the model is obscure or impossible.

I'll label the two branches of QSAR as "classical" (C) and "machine learning" (ML). As QSAR evolved from its origins into ML-QSAR, the descriptors became less physical and more numerous. While I would not attempt to interpret ML-QSAR models, I'd still be wary of interpreting a C-QSAR model if there was a high degree of correlation between the descriptors. One significant difficulty for those who advocate ML-QSAR is that machine learning is frequently associated with (or even equated to) artificial intelligence (AI) which, in turn, oozes hype. Here are a couple of recent In The Pipeline posts (don't forget to look at the comments) on machine learning and AI.

One difference between C-QSAR models and ML-QSAR models is that the former are typically local (training set compounds are closely related structurally) while the latter are typically non-local (although not as global as their creators might have you believe). My view is that most 'global' QSAR models are actually ensembles of local models although many QSAR modelers would have me dispatched to the auto-da-fé for this heresy. A C-QSAR model is usually defined for a particular structural series (or scaffold) and the parameters are often specific (e.g. π value for C3-substituent) to the structural series. Provided that relevant data are available for training, one might anticipate that, within its applicability domain, a local model will outperform a global model since the local model is better able to capture the structural context of the scaffold.

I would guess that most chemists would predict the effect on logP of chloro-substituting a compound more confidently than they would predict logP for the compound itself. Put another way, it is typically easier to predict the effect of a relatively small structural change (a perturbation) on chemical behavior than it is to predict chemical behavior directly from molecular structure. This is the basis for using free energy calculations to predict relative affinity and it also provides a motivation for MMPA (which can be seen as the data-analytic equivalent of free energy perturbation). This suggests viewing activity and properties in terms of structural relationships between compounds. I would argue that C-QSAR models are better able than ML-QSAR models to exploit structural relationships between compounds.
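A toy matched-pair calculation shows what the perturbation view looks like in practice. Everything here is hypothetical: the compound names, the logP values and the pairing are invented, and a real MMPA would identify the pairs from structures rather than from a hand-written list.

```python
from statistics import mean, stdev

# Hypothetical measured logP values; names and numbers are invented.
logp = {
    "A-H": 2.10, "A-Cl": 2.85,
    "B-H": 0.40, "B-Cl": 1.08,
    "C-H": 3.55, "C-Cl": 4.25,
}
# Matched pairs: same core, differing only by H -> Cl.
pairs = [("A-H", "A-Cl"), ("B-H", "B-Cl"), ("C-H", "C-Cl")]

# The perturbation: change in logP on chloro-substitution.
deltas = [logp[cl] - logp[h] for h, cl in pairs]
# The absolute quantity: logP of each parent compound.
parents = [logp[h] for h, _ in pairs]

print(f"mean dlogP(H->Cl) = {mean(deltas):.2f}, sd = {stdev(deltas):.2f}")
print(f"sd of parent logP  = {stdev(parents):.2f}")
```

In this invented data set the change in logP is nearly constant across three very different parents, while the parent values themselves span more than three log units.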

Down the islands with Venezuela in the distance

ML-QSAR models typically use many parameters to fit the data and this means that more data is needed to build them. One of the issues that I have with machine learning approaches to modeling is that it is not usually clear how many parameters have been used to build the models (and it's not always clear that the creators of the models know). You can think of number of parameters as the currency in which you pay for the quality of fit to the training data and you need to account for number of parameters when comparing performance of different models. This is an issue that I think ML-QSAR advocates need to address.
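One conventional way to charge a model for its parameters is the adjusted R², sketched below. The R² values, the number of compounds and the descriptor counts are invented, and the formula assumes an ordinary linear regression in which the parameters can actually be counted, which is precisely what is often unclear for machine learning models.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R^2 for a linear model with n_params fitted descriptors
    (intercept not included in n_params)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params - 1)

# Two hypothetical models trained on the same 30 compounds.
n = 30
simple_model = adjusted_r2(0.70, n, 2)    # 2 descriptors, R^2 = 0.70
complex_model = adjusted_r2(0.78, n, 12)  # 12 descriptors, R^2 = 0.78

print(f"simple:  {simple_model:.3f}")
print(f"complex: {complex_model:.3f}")
```

Here the model with the better raw fit comes out worse once the ten extra parameters have been paid for.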

Overfitting of training data is an issue even for C-QSAR models that use small numbers of parameters. Generally, it is assumed that if a model satisfies validation criteria it has not been over-fitted. However, cross-validation can lead to an optimistic assessment of model quality if the distribution of compounds in the training space is very uneven. An analogous problem can arise even when using external test sets. Hawkins advocated creating test sets by removing all representatives of particular chemotypes from training sets and I was sufficiently uncouth to mention this to one of the plenaries at EuroQSAR 2016. Training set design and model validation do not appear to be solved problems in the context of ML-QSAR.
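A split in the spirit of the chemotype-removal idea can be sketched as follows. The chemotype labels are assumed to be given (assigning them is the hard part), and the data set, the compound identifiers and the function name `leave_chemotype_out` are hypothetical.

```python
from collections import defaultdict

def leave_chemotype_out(compounds):
    """Yield (held_out_chemotype, train_ids, test_ids).

    compounds: list of (compound_id, chemotype) tuples.
    Every representative of the held-out chemotype goes into the test
    set, so no close analogue of a test compound remains in training.
    """
    by_type = defaultdict(list)
    for cid, chemotype in compounds:
        by_type[chemotype].append(cid)
    for held_out, test in by_type.items():
        train = [cid for ct, ids in by_type.items()
                 if ct != held_out for cid in ids]
        yield held_out, train, test

# Hypothetical data set with an uneven chemotype distribution.
data = [("c1", "indole"), ("c2", "indole"), ("c3", "indole"),
        ("c4", "quinoline"), ("c5", "pyrazole")]

splits = list(leave_chemotype_out(data))
for held_out, train, test in splits:
    print(f"hold out {held_out}: train={train} test={test}")
```

With random cross-validation on data like these, a held-out indole would almost always have other indoles in the training set, which is one way an optimistic assessment can arise.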

The Corniche in Beirut

I get the impression that machine learning algorithms may be better suited for classification than QSAR and it is common to see potency (or affinity) values classified as 'active' or 'inactive' for modeling. This creates a number of difficulties and I'll also point you towards the correlation inflation article that explains why gratuitous categorization of continuous data is very, very naughty. First, transformation of continuous data to categorical data throws away huge amounts of information which would seem to be the data science equivalent of shooting yourself in the foot. Second, categorization distorts your perception of the data (e.g. a pIC50 value of 6.5 might be regarded as more similar to one of 9.0 than one of 5.5). Third, a constant uncertainty in potency translates to a variable uncertainty in the classification. Fourth, if you categorize continuous data then you need to demonstrate that conclusions of analysis do not depend on the categorization scheme.
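The second and third points can be made concrete with a small calculation. The activity threshold (pIC50 of 6.0), the Gaussian error model and its standard deviation of 0.3 are all assumptions chosen for illustration, as is the helper name `p_active`.

```python
from math import erf, sqrt

def p_active(pic50, threshold=6.0, sd=0.3):
    """Probability that a measurement of a compound with this pIC50 falls
    at or above the threshold, assuming Gaussian error with the given sd."""
    z = (pic50 - threshold) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# The same measurement uncertainty gives very different classification
# uncertainty depending on how close the value is to the threshold.
for value in (5.5, 6.1, 6.5, 9.0):
    label = "active" if value >= 6.0 else "inactive"
    print(f"pIC50 {value}: {label}, P(classified active) = {p_active(value):.3f}")
```

Under these assumptions the compounds at 6.5 and 9.0 share a label despite differing by 2.5 log units, while a compound at 6.1 would be labelled inactive in a substantial fraction of repeat measurements.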

In the machine learning area not all QSAR is actually QSAR. This article reports that "the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods". However, the QSAR methods used appear to be based on categorical rather than quantitative definitions of activity. Even when more than two activity categories (e.g. high, medium, low) are defined, analysis might not be accounting for the ordering of the categories and this issue was also discussed in the correlation inflation article. Some clarification from the machine learning community may be in order as to which of their offerings can be used for modelling quantitative activity data.

I'll conclude the post by taking a look at where QSAR fits into the framework of drug design. Applying QSAR methods requires data and one difficulty for the modeler is that the project may have delivered its endpoint (or been put out of its misery) by the time that there is sufficient data for developing useful models. Simple models can be useful even if they are not particularly predictive. For example, modelling the response of pIC50 to logP makes it easy to see the extent to which the activity of each compound beats (or is beaten by) the trend in the data. Provided that there is sufficient range in the data, a weak correlation between pIC50 and logP is actually very desirable and I'll leave it to the reader to ponder why this might be the case. My view is that ML-QSAR models are unlikely to have significant impact for predicting potency against therapeutic targets in drug discovery projects.
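The pIC50 versus logP example can be sketched with a simple least-squares line and its residuals. The five compounds and their values are invented, and the code uses only the standard library so that nothing about any particular modelling package is assumed.

```python
from statistics import mean

# Hypothetical project data: (compound, logP, pIC50); all values invented.
data = [("cpd1", 1.0, 5.8), ("cpd2", 2.0, 6.1), ("cpd3", 3.0, 7.4),
        ("cpd4", 4.0, 7.0), ("cpd5", 5.0, 7.7)]

x = [logp for _, logp, _ in data]
y = [pic50 for _, _, pic50 in data]
x_bar, y_bar = mean(x), mean(y)

# Ordinary least-squares fit of pIC50 = intercept + slope * logP.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Positive residual: the compound beats the trend; negative: it is beaten.
residuals = {name: pic50 - (intercept + slope * logp)
             for name, logp, pic50 in data}

print(f"pIC50 = {intercept:.2f} + {slope:.2f} * logP")
for name, res in sorted(residuals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {res:+.2f}")
```

The residual is the quantity of interest here, since it shows how much potency each compound has beyond what the lipophilicity trend alone would give.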

So that's just about all I've got to say. Have an enjoyable conference and make sure to keep the speakers honest with your questions. It'd be rude not to.

Early evening in Barra