Molecular Design: 2017

Friday, 15 December 2017

Eu matei o Oscar Niemeyer

I never meant to kill Oscar Niemeyer.

Vasily Grigoryevich Zaytsev was a pretty good shot. During the pivotal battle of Stalingrad he dispatched 225 enemy soldiers, including 11 snipers. Zaytsev died in Kyiv on Oscar Niemeyer's 84th birthday and was reburied 14 years later on Mamayev Hill in Volgograd with full military honors.

Oscar Ribeiro de Almeida Niemeyer Soares Filho was born in Rio de Janeiro on 15th December, 1907 and he is best known for the design of some of the buildings in Brasilia. Here are examples of his work:

I first learned of Curitiba in the 1980s from a Scientific American article on urban planning and it had been on my list of places to visit for almost 30 years when I arrived in Brazil in 2012. I also learned about Curitiba in Portuguese class and, while planning the trip, I was particularly excited to discover that Museu Oscar Niemeyer is in Curitiba.

I flew to Curitiba on Thursday November 30, 2012 and the museum visit was scheduled for the Saturday. I found a nice vegetarian restaurant near the museum and lunched there. Although not a vegetarian, I certainly enjoy vegetarian food and, in any case, there is only so much picanha that one can eat. As an aside, I can claim a degree of skill in finding vegetarian restaurants in unlikely places such as El Calafate (Peron and Kirchner must be turning in their respective graves) and Montevideo (didn't think that vegetables were allowed in Uruguay). The museum is sometimes called Museu do Olho (museum of the eye) on account of its most distinctive feature. Here are some photos:

I thoroughly enjoyed the afternoon in the museum and, as well as viewing the exhibits, I watched a video on the life of Niemeyer and also read some biographical material. However, neither source mentioned when he had died. On returning to São Carlos the next day, I googled him and was amazed to discover that he was still alive (and due to turn 105 in a couple of weeks). With all this talk about Zaytsev, you're probably thinking that I borrowed a sniper rifle and picked off Brazil's National Treasure as he shuffled around the garden on his zimmer frame. However, reality is even more bizarre.

At that time, I was taking Portuguese classes and the weekend in Curitiba was certainly going to give me something to talk about (typically the conversation would be about how deadly Friday's group meeting had been and why ponto de fusão is so important). On being asked about my weekend, I responded, "Eu viajei para Curitiba e visitei o museu Oscar Niemeyer. Ele vive ainda!" (I travelled to Curitiba and visited the Oscar Niemeyer museum. He is still alive!)

Oscar Niemeyer died a few hours later.

Thursday, 9 November 2017

Hydrogen bonding and electronegativity

So once again it's #RealTimeChem Week and to 'celebrate' we'll be taking a look at the relationship between hydrogen bond basicity and electronegativity in this blog post. The typical hydrogen bond is an interaction between an electronegative atom and a hydrogen atom that is covalently bonded to another electronegative atom. We tend to think about hydrogen bonding as electrostatic in nature and we often use electrostatic models to describe the phenomenon. Let's take a look at hydrogen fluoride dimer which is probably the simplest hydrogen bonded system.

Fluorine is more electronegative than hydrogen which means that it tends to draw the electrons it shares with hydrogen towards itself. This gives fluorine a partial negative charge and hydrogen a partial positive charge. This simple electrostatic model suggests that a hydrogen bond will get stronger in response to increases in the electronegativity of either the acceptor atom or the atom to which the donor hydrogen is covalently bonded.

Hydrogen bond strength can be quantified as the equilibrium constant for the association of a hydrogen bond donor with a hydrogen bond acceptor in a non-polar solvent. For example, pK_BHX can be used as a measure of hydrogen bond basicity where K_BHX is the equilibrium constant for association of the hydrogen bond acceptor compound (e.g. pyridine) with 4-fluorophenol in carbon tetrachloride. Let's take a look at some pK_BHX values for three structurally prototypical compounds that present nitrogen, oxygen or fluorine to a hydrogen bond donor.
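If you prefer to think in free energy terms, a pK_BHX value can be converted to a standard free energy of association. The following is a minimal sketch that assumes pK_BHX is the base-10 logarithm of the association constant (in M⁻¹, which implies a 1 M standard state) and a temperature of 298.15 K. The function name and the example values are illustrative only and are not taken from the data set.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def association_free_energy(pk_bhx, temperature=298.15):
    """Return the standard free energy of association in kJ/mol.

    Assumes pK_BHX = log10(K_BHX) with K_BHX in M^-1 (1 M standard state),
    so that dG = -RT ln(K_BHX) = -RT ln(10) * pK_BHX.
    """
    return -R_KJ_PER_MOL_K * temperature * math.log(10) * pk_bhx


# Hypothetical pK_BHX values chosen only to illustrate the scale.
for pk in (0.0, 1.0, 2.0):
    print(f"pK_BHX = {pk:.1f} -> dG = {association_free_energy(pk):.2f} kJ/mol")
```

Under these assumptions each unit of pK_BHX is worth about 5.7 kJ/mol at room temperature.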

The trend is the complete opposite of what you might have expected on the basis of the simple electrostatic model for hydrogen fluoride dimer. However, this is not as weird as you might think because electronegativity tells us about distribution of charge between atoms but at hydrogen bonding distances the donor can 'sense' the distribution of charge within the acceptor atom. Electronegativity quantifies the extent to which an atom can function as an 'electron sink' and this is also related to how effectively the atom can 'hide' the resulting excess charge from the environment around it. Put another way, fluorine will appear to be really weird if you think of it as a large, negative partial atomic charge.

This is a good place to wrap up and, if you're interested in this sort of thing, why not take a look at this article on prediction of hydrogen bond basicity from molecular electrostatic potential. My most up to date hydrogen bond basicity data set can be found in the supplemental information (check the zip file) for this article and that's where I got the figures for the table.

Monday, 23 October 2017

War and PAINS

Non-interfering chemotypes are all alike;
every interfering chemotype interferes in its own way

From afar, the Grande Armée appears invulnerable. The PAINS filters have been written into the Laws of MedChem and celebrated in Nature. It may be a couple of thousand kilometers to Moscow but what could possibly go wrong?

Viscount Wellington (he is yet to become the 'Iron Duke') shadows the Grande Armée from the south. Dismissed as 'The Sepoy General' (he just writes blogs), Wellington knows that the best way to win battles is to persuade opponents to first underestimate him and then to attack him. He also knows that seemingly intractable problems often have simple solutions and, when asked years later by Queen Victoria as to how the sparrow problem of the new Crystal Palace might be resolved, his pithy response is, "Sparrowhawks, Ma'am".

Marshal Ney guards the southern flank of the Grande Armée and Wellington knows, from the Peninsular War, that Ney is a formidable opponent. Wellington is fully aware that, on the steppes, it will be all but impossible to exploit Ney's principal weakness (an unquestioning and unwavering belief in the meaningfulness of Ligand Efficiency). Wellington knows that he will have to be very, very careful because Ney is a masterful exponent of the straw man defense.

The first contact with the Grande Armée occurs unexpectedly in the Belovezhskaya Pushcha. One of Ney's subordinates has set off in hot pursuit of a foraging party of thiohydantoins (which he has mistaken for rhodanines) and left the flank of the Grande Armée exposed. Wellington orders an attack in an attempt to capitalize on the blundering subordinate's intemperance and it is only through the prompt action of Ney, who takes personal charge of the situation, that disaster is averted.

The first skirmish proves to be a tactical draw although Ney's impetuous subordinate has been relieved of his command and put on clam-gathering duty. Wellington orders a diversionary attack designed to probe the Grande Armée defenses and then another intended to lure Ney into counter-attacking his carefully prepared defensive position. Ney initially appears to take the bait but soon disengages once he perceives the depth of Wellington's defense. It will prove to be their final clash for the duration of this campaign.

The next contact with the Grande Armée takes place at Smolensk. A regiment of Swiss Guards, on loan from the Vatican, becomes detached from the main force and blunders into Wellington's outer defensive belt. The Swiss Guards' halberds prove to be of little use in this type of warfare and they are swiftly overwhelmed. As they are taken captive, many of the Swiss Guards are heard to mutter that the six assays of the PAINS panel are "different and independent, different and independent..." although none seems wholly convinced by their mantra.

Following the tactical victory at Smolensk, Wellington receives word by messenger pigeon that Marshal Kutuzov will attempt to stop the Grande Armée at Borodino (on the road to Moscow) and that Wellington should do what he can to harass the Grande Armée so as to buy more time for Kutuzov to complete his preparations. Wellington orders another diversionary attack which exposes the narrowness of the PAINS filter applicability domain before proceeding to Borodino where he arrays his troops to the south of the road to Moscow.

The armies of Wellington and Kutuzov are now disposed so as to counter a flanking maneuver by the Grande Armée but it is the army of Kutuzov that will bear the brunt of the attack while Wellington's force is held in reserve. Wellington marvels at Kutuzov's preparations and the efficient manner in which he has achieved optimal coverage of the terrain with the design of the training set. Not a descriptor is wasted and each differs in its own way since they are uncorrelated with each other. Nobody will be able to accuse Kutuzov of overfitting the data. Over a century later, Rokossovsky will tell Zhukov, "everything I know about QSAR, I learned from Kutuzov".

The Grande Armée advances confidently but Kutuzov's troops are ready and up to the task at hand. The Grande Armée is first stopped in its tracks by a withering hail of grapeshot (more than half of the PAINS alerts were derived from one or two compounds only) and then driven back (...were originally derived from a proprietary library tested in just six assays measuring protein–protein interaction (PPI) inhibition using the AlphaScreen detection technology only). Running out of options, the Grande Armée commanders are forced to commit the elite Garde Impériale which temporarily blunts Kutuzov's advance. Wellington maneuvers his troops from their defensive squares into an attacking formation and awaits Kutuzov's order to commit the reserve.

Although the Grande Armée commanders consider it beneath them to do battle with the lowly Sepoy General, they have at least strengthened their southern flank in acknowledgement of his presence there. This in turn has weakened the northern flank which has been assumed to be safe from interference and it is at this point in the battle that the Grande Armée gets an unpleasant surprise. There is the unmistakable sound of hoofbeats coming from the north, quiet at first but getting louder by the minute. Emerging from the smoke of battle, Prince Blücher's Uhlans slam into the lightly-protected northern flank of the Grande Armée (the same PAINS substructure was often found in consistently inactive and frequently active compounds). Following an attack plan on which Blücher has provided vital feedback, Wellington commits his troops although, in reality, there is little left for them to do aside from pursuing the retreating Garde Impériale.

There are lessons to be learned from the fate of the Grande Armée. The PAINS filters were caught outside their narrow applicability domain on the vast Russian steppes and their fundamental weaknesses were brought into sharp focus by Blücher and Kutuzov who made effective use of large, non-proprietary data sets. Whether you're talking about T-34s or data, quantity has a quality all of its own and science blogs are here to stay although, from time to time, every blogger should write a journal article "pour encourager les autres".

Thursday, 12 October 2017

The resurrection of Maxwell's Demon

Sometimes when reading the residence time literature, I get the impression that the off-raters have re-animated Maxwell's Demon. It seems as if a nano-doorman stands guard at the entrance of the binding site, only opening his nano-door to ligand molecules that want to get in. Microscopic Reversibility? Stop being so negative! With Big Data, Artificial Intelligence, Machine Learning (formerly known as QSAR) and Ligand Efficiency Metrics we can beat Microscopic Reversibility and consign The Second Law to the Dustbin Of History!

There were a number of things that triggered this blog post. First, I saw a recent article that got me thinking about philatelic drug discovery. Second, some of the off-raters will be getting together in Berlin next week and I wanted to share some musings because I won't be there in person. Third, my former colleague Rutger Folmer has published a useful (dare I say, brave) critique of the residence time concept that is bang on target.

I'm not actually going to say much about Rutger's article except to suggest that you read it. That's because I really want to examine the article on philatelic drug discovery in more detail (it's actually about thermodynamic and kinetic profiling but I thought the reference to philately would better grab your attention). My standard opening move when playing chess with an off-rater is to assert that slow binding is equivalent to slow distribution. In what situations would you design a drug to distribute slowly?

Chemical kinetics is all about energy barriers and, the higher the barrier, the slower things will happen. Microscopic reversibility tells us that a barrier to association is a barrier to dissociation and that the ligand will return to solution along the same path that it took to its binding site. Microscopic reversibility tells you that if you got into the parking spot you can get out of it as well although that may not be the experience of every driver. The reason that microscopic reversibility doesn't always seem to apply to parking is that most humans, with the possible exception of tank drivers in the Italian army, are more comfortable in forward gear than in reverse. Molecules, in contrast, have no more concept of forward and reverse than they do of standard states, IUPAC or the opinions of the 'experts' who might quantitatively estimate their drug-likeness while judging their beauty. Molecules don't actually do concepts. Put more uncouthly, molecules just don't give a toss.

I've created a graphic to show how things might look in vivo when there is a barrier to association (and, therefore, to dissociation). We can think of the ligand molecule having to get over the barrier in order to get to its binding site and we call the top of the barrier the 'transition state'. This is a simplified version of reality (it is actually the system that passes from the unbound state through the transition state to the bound state and for some ligand-protein associations there is no barrier) but it'll serve for what I'd like to say. The graphic consists of three panels and the first (A) of these illustrates the situation soon after dosing when the concentration of ligand (L) is relatively high and the target protein (P) has not had sufficient time to respond. If the barrier is sufficiently high, the system can't get to equilibrium before the ligand concentration starts to fall in what a pharmacokineticist might refer to as the elimination phase. Under this scenario the system will be at equilibrium briefly as the ligand concentration falls and I've shown this in panel B. After the equilibrium point is reached, the rate of dissociation exceeds the rate of association and this is shown in panel C.
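The three-panel scenario can be reproduced numerically. The following is a minimal sketch, assuming simple 1:1 binding, ligand in large excess over target, and first-order elimination of the ligand. All parameter values (k_on, k_off, starting concentration, elimination rate constant) are hypothetical and chosen only so that binding and elimination happen on similar time scales.

```python
import math


def simulate_occupancy(k_on, k_off, l0, k_el, t_end, dt):
    """Euler integration of fractional target occupancy.

    The free ligand concentration is assumed to decay as l0*exp(-k_el*t)
    and to be unaffected by binding (ligand in excess over target).
    Returns lists of time, occupancy and net binding rate.
    """
    times, occupancies, net_rates = [], [], []
    bound = 0.0
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        ligand = l0 * math.exp(-k_el * t)
        # rate of association minus rate of dissociation
        net = k_on * ligand * (1.0 - bound) - k_off * bound
        times.append(t)
        occupancies.append(bound)
        net_rates.append(net)
        bound += net * dt
    return times, occupancies, net_rates


def crossover_index(net_rates):
    """Index of the first step at which dissociation matches or exceeds association."""
    for i, rate in enumerate(net_rates):
        if rate <= 0.0:
            return i
    return None


# Hypothetical values: K_D = k_off/k_on = 10 nM, 100 nM starting concentration.
times, occupancies, net_rates = simulate_occupancy(
    k_on=1e3, k_off=1e-5, l0=1e-7, k_el=1e-4, t_end=1e5, dt=10.0
)
idx = crossover_index(net_rates)
print(f"panel B (momentary equilibrium) at t = {times[idx]:.0f} s, "
      f"occupancy = {occupancies[idx]:.2f}")
```

Before the crossover the net rate is positive (panel A), at the crossover the occupancy matches the equilibrium value for the ligand concentration at that instant (panel B), and afterwards dissociation wins (panel C).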

There's something else that I'd like you to take a look at in the graphic and that's the free energy (G) of the unbound state (P + L). See how it goes down relative to the free energy of the bound state (P.L) as the concentration of ligand decreases. When thinking about energetics of these systems, it actually makes a lot of sense to use the unbound state as the reference but you do need to use a reference concentration (e.g. 1 M) to do this.

When we do molecular design we often think in terms of manipulating energy differences. For example, we try to increase affinity by stabilizing the bound state relative to the unbound state. Once you start trying to manipulate off-rates, you soon realize that you can't change one thing at a time (unless you draft Maxwell's Demon into your project team). I've created a second graphic which looks similar to the first graphic although there are important differences between the two graphics. In particular, I'm referencing energy to the unbound state (P + L) which means that the ligand concentration is constant in all three panels. Let's consider the central panel as the starting point for design. We can go left from that starting point and stabilize the bound state which is equivalent to optimizing affinity. Stabilizing the bound state will also result in slower dissociation provided that the transition state energy remains unchanged. This is a good thing but it's difficult to show that the benefits come from the slower dissociation and not from the increased affinity. If you raise the barrier (i.e. increase the energy of the transition state) to reduce the off-rate you'll find that you have slowed the on-rate to an equal extent.
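The two design moves can be put into numbers. The following is a minimal sketch in which free energies are referenced to the unbound state (P + L at 1 M, set to zero) and rate constants are estimated with an Eyring-type expression. The prefactor (k_B·T/h with a transmission coefficient of 1) and all of the energy values are assumptions made purely for illustration.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)


def rate_constants(g_ts, g_bound, temperature=298.15):
    """Return (k_on, k_off, K_D) from free energies in kJ/mol.

    Energies are relative to the unbound state (taken as zero at 1 M).
    An Eyring prefactor is assumed for both directions.
    """
    rt = R * temperature / 1000.0                         # kJ/mol
    prefactor = K_B * temperature / H                     # s^-1
    k_on = prefactor * math.exp(-g_ts / rt)               # M^-1 s^-1
    k_off = prefactor * math.exp(-(g_ts - g_bound) / rt)  # s^-1
    return k_on, k_off, k_off / k_on


# Hypothetical energies (kJ/mol) for the three situations.
start = rate_constants(g_ts=40.0, g_bound=-45.0)       # central panel
stabilized = rate_constants(g_ts=40.0, g_bound=-50.0)  # bound state stabilized
raised = rate_constants(g_ts=50.0, g_bound=-45.0)      # barrier raised

for label, (k_on, k_off, k_d) in (("start", start),
                                  ("stabilized", stabilized),
                                  ("raised", raised)):
    print(f"{label:>10}: k_on = {k_on:.2e}  k_off = {k_off:.2e}  K_D = {k_d:.2e}")
```

Stabilizing the bound state leaves k_on alone and lowers both k_off and K_D. Raising the barrier scales k_on and k_off by the same factor and leaves K_D untouched.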

Before moving on, it may be useful to sum up where we've got to so far. First, ask yourself why you think off-rates will be relevant in situations where concentration changes on a longer time scale than binding. Second, you'll need to enlist the help of Maxwell's Demon if you want to reduce off-rate without affecting on-rate and/or affinity. Third, if you want to consider binding kinetics in design then it'd be best to use barrier height (referenced to unbound state) and affinity as your design parameters.

Now I'd like to take a look at the philatelic drug discovery article. This is a harsh term but it does capture a tendency in some drug discovery programs to measure things for the sake of it (or at least to keep the grinning Lean Six Sigma 'belts' grinning). Some of this is a result of using techniques such as isothermal titration calorimetry (ITC) and surface plasmon resonance (SPR) that yield information in addition to affinity (that is of primary interest) at no extra cost. I really don't want to come across as a Luddite and I must stress that measurements of enthalpy, entropy, on-rate and off-rate are of considerable scientific interest and are also valuable for improving physical models. Furthermore, I am continually awed by the exquisite sensitivity of modern ITC and SPR instruments and would always want the option to be able to measure affinity using at least one of these techniques. However, problems start when the access to enthalpy, entropy, off-rates and on-rates becomes exploited for 'metrication' and drug discovery scientists seek 'enthalpy-driven' binding simply because the binding will be more 'enthalpy-driven'. It is easier to make the case for relevance of binding kinetics although, as Rutger points out, reducing the off-rate may very well make things worse if the on-rate is also reduced. It is much more difficult to assemble a coherent case for the relevance of thermodynamic signatures in drug discovery. Perhaps, some day, a seminal paper from the Budapest Enthalpomics Group (BEG) will reveal that isothermal systems like live humans can indeed sense the enthalpy and entropy changes associated with drugs binding to their targets although I will not be holding my breath.

Unsurprisingly, the thermodynamic and kinetic profiling (aka philatelic drug discovery) article advocates thermodynamic profiling of bioactive compounds in lead optimization projects. I'm going to focus on the kinetic profiling and it is worrying that the authors don't seem to be aware that on-rates and off-rates have to be seen in a pharmacokinetic context in order to make the connection with drug discovery. The authors may find it instructive to think about how inhibitor concentration would have varied over the course of a typical experiment in their cell-based assays. They are also likely to find Rutger's article to be educational and I recommend that they familiarize themselves with its content.

The following statement suggests that it may be beneficial for the authors to also familiarize themselves with the rudiments of chemical kinetics:

"Association and dissociation rate constants (k_on and k_off) of compound binding to a biological target are not intrinsically related to one another, although they are connected by dissociation equilibrium constant K_D (K_D = k_off/k_on)."

The processes of association and dissociation are actually connected by virtue of taking place along the same path and by having to pass through the same transition states. The difference in barrier heights for association and dissociation is given by the binding free energy.
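The connection can be written out explicitly. The following sketch assumes a single transition state (TS) on the binding path, the same pre-exponential factor for both directions, and K_D expressed relative to a 1 M standard state. The symbols for the individual free energies are introduced here for illustration.

```latex
\begin{align*}
\Delta G^{\ddagger}_{\mathrm{on}} &= G_{\mathrm{TS}} - G_{\mathrm{P+L}}, \qquad
\Delta G^{\ddagger}_{\mathrm{off}} = G_{\mathrm{TS}} - G_{\mathrm{P.L}} \\
\Delta G^{\ddagger}_{\mathrm{off}} - \Delta G^{\ddagger}_{\mathrm{on}}
  &= G_{\mathrm{P+L}} - G_{\mathrm{P.L}} = -\Delta G^{\circ}_{\mathrm{bind}} \\
K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
  &= \exp\!\left(-\frac{\Delta G^{\ddagger}_{\mathrm{off}} - \Delta G^{\ddagger}_{\mathrm{on}}}{RT}\right)
   = \exp\!\left(\frac{\Delta G^{\circ}_{\mathrm{bind}}}{RT}\right)
\end{align*}
```

Because the transition state energy appears in both barriers, it cancels from the difference, which is why moving the transition state alone cannot change K_D.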

Some analysis of relationships between potency in a cell-based assay and K_D, k_off and k_on was presented in Figure 6 of the article. I have a number of gripes with the analysis. First, it would be better to use logarithms of quantities like K_D, IC₅₀, k_off and k_on when performing analysis of this nature. In part, this is because we typically look for linear free energy relationships in these situations. There is another strong rationale for using logarithms because analysis of correlations between continuous variables works best when the uncertainties in data values are as constant as possible. My second gripe is that the authors have chosen to bin their data for analysis and this is a great way to shoot yourself in the foot. When you bin continuous data you both reduce your data analysis options and leave people wondering whether the binning has been done to hide the weakness of the trends in the data. I have droned at length about why it is naughty to bin continuous data so I'll leave it at that.
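For what it's worth, analyzing the logarithms without binning takes only a few lines. The following is a minimal sketch in which the K_D and IC₅₀ values are made-up placeholders (they are not data from the article) and the helper names are hypothetical.

```python
import math


def to_p_scale(value_molar):
    """Convert a concentration-like quantity in M (K_D, IC50) to -log10."""
    return -math.log10(value_molar)


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up placeholder values in M, NOT data from the article.
kd_values = [2e-9, 1.5e-8, 8e-8, 4e-7, 3e-6]
ic50_values = [3e-8, 9e-8, 1.2e-6, 2e-6, 4e-5]

pkd = [to_p_scale(v) for v in kd_values]
pic50 = [to_p_scale(v) for v in ic50_values]
print(f"r(pIC50, pK_D) = {pearson_r(pkd, pic50):.2f} "
      f"using all {len(pkd)} unbinned points")
```

Every compound contributes to the correlation coefficient here, so the reader can judge the strength of the trend directly rather than having to guess what the bins are hiding.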

It's been a long post and it's time to wrap things up. If you've found the post to be 'cansativo' (sounds so much more soothing in Portuguese) then spare a thought for the person who had to write it. To conclude, I'll leave you with a quote that I've taken from the abstract for Rutger's article:

"Moreover, fast association is typically more desirable than slow, and advantages of long residence time, notably a potential disconnect between pharmacodynamics (PD) and pharmacokinetics (PK), would be partially or completely offset by slow on-rate."

Wednesday, 20 September 2017

To logP or logD, that is the question

So last week I asked twitter which lipophilicity measure was more relevant to binding of bases to hERG. The poll resulted in a landslide for logD(pH=7.4) (70%; 21 votes) over logP (30%; 9 votes). I did not vote.

So let's take another look at the question and I've cooked up a thought experiment to help you do this. Let's suppose that we have an amine bound to hERG (which your Scottish colleagues may call hairrg). It has a pK_a of 10.4 and logP of 6 and the IC₅₀ in the hERG assay is 100 nM (the safety people think that this will lead to an unpleasant torsades de pointes that will hERG a whole lot more than a corrective thrashing by Wendi Whiplasch). Provided that there is no significant partitioning of the protonated form of the amine into the octanol, the logD(7.4) value for the amine will be 3.

Let's imagine that we can change the pK_a of the amine while keeping all the other physicochemical and molecular properties the same. Changing the amine pK_a from 10.4 to 12.4 will get logD(7.4) down to 1. But how do you think the hERG IC₅₀ will respond?
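The logD values in the thought experiment can be checked with a few lines of code. The following is a minimal sketch for a monoprotic base, assuming (as stated above) that only the neutral form partitions into octanol. The function name is hypothetical.

```python
import math


def log_d_base(log_p, pka, ph=7.4):
    """logD for a monoprotic base when only the neutral form partitions.

    logD = logP - log10(1 + 10**(pKa - pH))
    """
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


for pka in (10.4, 12.4):
    print(f"pKa = {pka}: logD(7.4) = {log_d_base(6.0, pka):.2f}")
```

With logP fixed at 6, each unit increase in pK_a takes roughly one unit off logD(7.4) once the amine is predominantly protonated, which is how the value gets from 3 down to 1.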

Saturday, 1 April 2017

A concentration of scoring functions

Researchers at The Hungarian Institute Of Thermodynamics have published a number of seminal articles on the interplay of enthalpy and entropy in areas ranging from physical chemistry to socioeconomics. For example, the cause of World War 1 (also known as 'The Great War' although I doubt whether any of its participants thought that it was that great) was traced to a singularity in the Habsburg Partition Function. In a nutshell, the problem was shown to be a surfeit of the wrong type of entropy (which led to Franz Ferdinand's driver getting lost) coupled with a deficit in the right type of entropy (which would have prevented Gavrilo Princip's bullets from finding their targets). However, it is unlikely that any amount of the right type of entropy could have saved the hapless Maximilian I of Mexico, who generously volunteered to be Emperor only to be shot by the ungrateful Mexicans.

The most recent study from BEG (Budapest Enthalpomics Group) is little short of sensational. Unfortunately it's not available online and the poor fax quality, coupled with my rudimentary grasp of Hungarian, have made the going hard. The essence of this seminal study is that the performance of scoring functions can be significantly improved by including the concentration unit (in which affinity is expressed) as a parameter in the fitting process. The casual observer of virtual screening may have wondered why scoring functions are trained with affinity but validated by enrichment. By treating the concentration unit as a parameter in the fitting process, the authors were able to achieve unprecedented accuracy of prediction and the phone call from Stockholm would seem to be a foregone conclusion. Commenting on these seminal findings, Prof. Kígyó Olaj, the director of the Institute said, "Now we no longer need to use ROC plots to mask feeble correlations between predicted and measured affinity".

Tuesday, 24 January 2017

PAINS and editorial policy

I have blogged previously (1 | 2 | 3 | 4 | 5 ) on PAINS. In this post, I present the case against inclusion of PAINS criteria in the Journal of Medicinal Chemistry (JMC) Guidelines for Authors (viewed 22-Jan-2017) as given below:

"2.1.9. Interference Compounds. Active compounds from any source must be examined for known classes of assay interference compounds and this analysis must be provided in the General Experimental section. Compounds shown to display misleading assay readouts by a variety of mechanisms include, but are not limited to, aggregation, redox activity, fluorescence, protein reactivity, singlet-oxygen quenching, the presence of impurities, membrane disruption, and their decomposition in assay buffer to form reactive compounds. Many of these compounds have been classified as Pan Assay Interference Compounds (PAINS; see Baell & Holloway, J. Med. Chem. 2010, 53, 2719-2740 and webinar at bit.lyj/mcPAINS). Provide firm experimental evidence in at least two different assays that reported compounds with potential PAINS liability are specifically active and their apparent activity is not an artifact."

The term 'known classes of assay interference compounds' must be defined more precisely in order to be usable by both authors submitting manuscripts and reviewers of those manuscripts. Specifically, the term 'known classes of assay interference compounds' implies the existence of a body of experimental data in the public domain for which specific substructural features have been proven to cause the observed assay interference. The JMC Guidelines for Authors (viewed 22-Jan-2017) imply that any assay result for a compound with 'potential PAINS liability' should necessarily be treated as less informative than would be the case if the compound did not have 'potential PAINS liability'. I shall term this as 'devaluing' the assay result.

The PAINS acronym stands for Pan Assay INterference compoundS and it was introduced in a 2010 JMC article (BH2010) that is cited in the Guidelines for Authors (viewed 22-Jan-2017). The PAINS filters introduced in the BH2010 study are based on analysis of frequent-hitter behavior in a panel of 6 AlphaScreen assays. Each PAINS filter consists of a substructural pattern and is associated with an enrichment factor that quantifies the frequent hitter behavior. Compounds that quench or scavenge singlet oxygen have the potential to interfere with AlphaScreen assays but individual PAINS substructural patterns were not evaluated for their likelihood of being associated with singlet oxygen quenching or scavenging. For example, the BH2010 study makes no mention of studies ( 1 | 2 | 3 | 4 ) linking singlet oxygen quenching/scavenging to the presence of a thiocarbonyl group which is a substructural element present in rhodanines.

I argue that a high hit rate against a small panel of assays that all use a single detection technology is inadmissible as evidence for pan-assay interference. I also argue that the results of screening against this assay panel can only be invoked to devalue the result from an AlphaScreen assay. In a cheminformatic context, the applicability domain of a model based on analysis of results from this assay panel is restricted to activity measured in AlphaScreen assays. Furthermore, it is questionable whether it is valid to invoke the results of screening against this assay panel to devalue a concentration response from an AlphaScreen assay because the results for each assay of the panel were obtained at a single concentration (i.e. no concentration response).

The BH2010 study does present some supporting evidence that compounds matching PAINS substructural patterns are likely to interfere with assays. In a cheminformatic context, this supporting evidence can be considered to extend the applicability domain of PAINS filters. However, supporting evidence is only presented for some of the substructural patterns and much of that supporting evidence is indirect and circumstantial. For example, the observation that rhodanines as a structural class have been reported as active against a large number of targets is, at best, indirect evidence for frequent hitter behavior which is characterized by specific compounds showing activity in large numbers of assays. There is not always a direct correspondence between PAINS substructural patterns and those used in analyses that are presented as supporting evidence. For example, the BH2010 study uses substructural patterns for rhodanines that specify the nature of C5 (either saturated or with exocyclic carbon-carbon bond). However, the sole rhodanine definition given in the BMS2006 study specifies an exocyclic carbon-carbon double bond. This means that it is not valid to invoke the BMS2006 study to devalue the result of every assay performed on any rhodanine.

The data (results from 6 AlphaScreen assays and associated chemical structures) that form the basis of the analysis in the BH2010 study are not disclosed and must therefore be considered to be proprietary. Furthermore, some of the supporting evidence that compounds matching PAINS filters are likely to interfere with assays is itself based on analysis (e.g. BMS2006 and Abbott2007) of proprietary data. The JMC Guidelines for Authors (viewed 22-Jan-2017) make it clear that the use of proprietary data is unacceptable:

"2.3.5.2 Proprietary Data. Normally, the use of proprietary data for computational modeling or analysis is not acceptable because it is inconsistent with the ACS Ethical Guidelines. All experimental data and molecular structures used to generate and/or validate computational models must be reported in the paper, reported as supporting information, or readily available without infringements or restrictions. The Editors may choose to waive the data deposition requirement for proprietary data in a rare case where studies based on very large corporate data sets provide compelling insight unobtainable otherwise.

2.3.6 QSAR/QSPR and Proprietary Data. The following are general requirements for manuscripts reporting work done in this area:

(3) All data and molecular structures used to carry out a QSAR/QSPR study are to be reported in the paper and/or in its supporting information or should be readily available without infringements or restrictions. The use of proprietary data is generally not acceptable."

Given JMC's stated unacceptability of analysis based on proprietary data, to use such analysis to define editorial policy would appear to contradict that editorial policy.

To sum up:

Analysis of the screening results for the BH2010 assay panel can only be invoked to devalue or otherwise invalidate the result from an AlphaScreen assay.

Additional supporting evidence is only provided in BH2010 for some of the PAINS filters. In these cases, the evidence is not generally presented in a manner that would allow a manuscript reviewer to assess risk of assay interference in an objective manner.

Most of the analysis presented in the BH2010 study has been performed on proprietary data. To base JMC editorial policy on analysis of proprietary data would appear to contradict the Journal's policy on the use of proprietary data.

I rest my case.

Friday, 6 January 2017

Confessions of a Units Nazi

Regular readers (both of them) of this blog will know that I have an interest, which some might term an obsession, with units. At high school in Trinidad, we had the importance of units beaten into us by the Holy Ghost Fathers and, for some of the more refractory cases, the beating was quite literal. I was taught physics by the much loved, although somewhat highly-strung, Fr. Knolly Knox (aka Knox By Night) who, as Dean of the First Form, used to give 'licks' with a cane of hibiscus (presumably chosen for its tensile properties). You quickly learned not to mess with The Holy Ghost Fathers, especially the Principal, Fr. Arthur Lai Fook (aka Jap), and it was a brave student who responded to the request by Fr. Pedro Valdez to define the dyne by answering, "Fah, it what happen after living". Fr. Pedro was a gentle soul although his brother, Fr. Toba, who taught me Latin, would lob a blackboard eraser with reproducible inaccuracy at any student who had the temerity to doze off during the Second Punic War while Hannibal and his elephants were steamrollering the hapless legions of Gaius Flaminius into Lake Trasimene. At least we didn't have detention at my school. Actually we did have detention only it was called 'penance'. Each and every student also had a Judgement Book in which was entered a mark (out of 10) for each subject each and every week. A mark of 5 (or less) or a failure to return one's Judgement Book, duly signed by parent or guardian, by Wednesday morning earned the transgressor a corrective package of Licks and Penance. As a thoughtful child, I managed to shield my parents from this irksome bureaucracy and, in any case, it was simply safer that The Holy Ghost Fathers were never given the opportunity to familiarize themselves with the authentic parental signatures.

I used to think that 'Virtus et Scientia' was Latin for 'Licks and Penance' (17-Feb-2018 update)

What we learned from the Holy Ghost Fathers was that most physical quantities have dimensions and if the quantities on the opposite sides of the 'equal sign' in an equation have different dimensions then it is a sign of an unforced error rather than a penetrating insight. For example, the dimensions of force are MLT^-2 (M = mass; L = length; T = time) and you are free to express forces in newtons, dynes or poundals as you prefer. You can think of a physical quantity as a number multiplied by a unit and, without the unit, the number is meaningless. Units are extremely important but at the same time they are arbitrary in the sense that if your physical insight changes when you change a unit then it is neither physical nor an insight. Here's a good illustration of why dimensional analysis matters.
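For readers who prefer code to canes, here is a minimal sketch of the bookkeeping behind dimensional analysis. The Dim class and its method names are my own illustrative choices rather than anything from a standard library: each quantity carries its exponents of M, L and T, and two sides of an equation can only be equal if those exponents match.

```python
from typing import NamedTuple


class Dim(NamedTuple):
    """Exponents of mass (M), length (L) and time (T). Hypothetical helper."""
    M: int = 0
    L: int = 0
    T: int = 0

    def times(self, other: "Dim") -> "Dim":
        # Multiplying quantities adds the exponents of their dimensions.
        return Dim(self.M + other.M, self.L + other.L, self.T + other.T)

    def over(self, other: "Dim") -> "Dim":
        # Dividing quantities subtracts the exponents of their dimensions.
        return Dim(self.M - other.M, self.L - other.L, self.T - other.T)


MASS = Dim(M=1)
LENGTH = Dim(L=1)
TIME = Dim(T=1)

ACCELERATION = LENGTH.over(TIME).over(TIME)  # L T^-2
FORCE = MASS.times(ACCELERATION)             # M L T^-2
ENERGY = FORCE.times(LENGTH)                 # M L^2 T^-2


def same_dimensions(lhs: Dim, rhs: Dim) -> bool:
    """An equation can only balance if both sides have the same dimensions."""
    return lhs == rhs


print(FORCE)                           # exponents of M, L, T for force
print(same_dimensions(FORCE, ENERGY))  # force and energy cannot be equated
```

The choice of newtons, dynes or poundals never enters the check, which is the point: the unit is arbitrary but the dimensions are not.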

I have blogged ( 1 | 2 | 3 ) about how building a concentration unit into the definition of ligand efficiency (LE) results in a metric that is physically meaningless (even though it remains a useful instrument of propaganda) and, for the masochists among you, there's also the LE metric critique in JCAMD. The problem can be linked to a lack of recognition of the fact that logarithms can only be calculated for numbers (which lack units). However, LE has another 'units issue' which is connected with the fact that it is a molar energy that is scaled in the definition of LE rather than pIC₅₀ or pK_d. This needn't be an issue but, unfortunately, it is. LE is defined by dividing a molar energy by the number of non-hydrogen atoms in the molecular structure and there is nothing in the definition of LE that says that the energy has to be expressed in any particular unit. This means that you can define LE using any energy unit that you want to. Some 'experts' appear to believe that dividing a molar energy by number of non-hydrogen atoms relieves them of the responsibility to report units. I'm referring, of course, to the practice of multiplying pIC₅₀ or pK_d by 1.37 when calculating LE. You might ask why people do this, especially given that 'experts' tout the simplicity of LE and they don't multiply pIC₅₀ or pK_d by 1.37 when they calculate LipE/LLE. Don't ask me because I'm neither expert nor 'expert'.
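In case the origin of the 1.37 is unclear, the sketch below shows where it comes from and why the unit has to be reported. It assumes a temperature of 300 K and a made-up compound (pK_d of 7 and 25 non-hydrogen atoms), and the function name is mine. The factor is simply 2.303RT expressed in kcal/mol, and expressing the same energy in kJ/mol gives a different number for the same compound.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
R_KJ = 8.314462e-3    # gas constant in kJ/(mol K)
T = 300.0             # temperature in K (assumed)

# 2.303RT, i.e. ln(10) * R * T, in each energy unit
factor_kcal = math.log(10) * R_KCAL * T
factor_kj = math.log(10) * R_KJ * T


def ligand_efficiency(pk: float, n_heavy: int, factor: float) -> float:
    """Molar energy (factor * pK) divided by the number of non-hydrogen atoms."""
    return factor * pk / n_heavy


# Hypothetical compound, used only for illustration
pk, n_heavy = 7.0, 25
le_kcal = ligand_efficiency(pk, n_heavy, factor_kcal)
le_kj = ligand_efficiency(pk, n_heavy, factor_kj)

print(f"2.303RT = {factor_kcal:.2f} kcal/mol = {factor_kj:.2f} kJ/mol")
print(f"LE = {le_kcal:.2f} kcal/mol per atom = {le_kj:.2f} kJ/mol per atom")
```

An LE value quoted as a bare number leaves the reader to guess which of these was meant.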

Let's take a look at this NRDD article on LE metrics and I'd like you to go straight to Box 1 (Ligand efficiency metrics). Six numbered equations are shown in Box 1 and it is stated towards the end of the first paragraph that "each equation corresponds to a mathematically valid function". This statement is incorrect because the first equation (1) in Box 1 is not a mathematically valid function. The reason for this is that the logarithm function cannot take as its argument a quantity, such as K_d, that has units. Equation (5), which defines LLE_AT, is mathematically valid although it differs from the mathematically ambiguous equation that was originally used to define LLE_AT.

To be honest, I think that Box 1 is probably beyond repair by conventional erratum and I'll back this opinion with an example:

"Assuming standard conditions of aqueous solution at 300K, neutral pH and remaining concentrations of 1M,
–2.303RTlog(K_d/C°) approximates to –1.37 × log(K_d) kcal/mol."

At my school in Trinidad this would have been called a 'ratch' and, once detected, it would have earned its perpetrator a corrective package of Licks and Penance. I don't think even the Holy Ghost Fathers could have exorcised a concentration unit quite this efficiently.

In some physical chemistry literature, K_d is defined as a dimensionless quantity by including C° in the definition of K_d. However, in the literature of biochemistry, biophysics and medicinal chemistry, K_d is usually quoted in units of concentration. Binding free energy has the same value and same dependence on C° regardless of which of the two conventions is used to define K_d.

(Update 17-Feb-2018)

I'd now like to talk a bit about the 'p' operator that we use to transform IC₅₀ and K_d values into logarithms. This makes it much easier to perceive structure-activity relationships and provides a better representation of measurement precision than when the IC₅₀ and K_d values themselves are used. To calculate pK_d, first express K_d in molar concentration units, dump the units and calculate minus the logarithm of the number. I realize that this may come across as arm waving but the process of converting K_d to pK_d can actually be expressed exactly in mathematical terms as follows:

pK_d = –log₁₀(K_d/M)

The 'p' operator has a 1 M concentration built into it. Although this choice of unit is arbitrary, it doesn't cause any problems if you're doing sensible things (e.g. subtracting them from each other) with the pK_d values. If, however, you're doing silly things (e.g. dividing them by numbers of non-hydrogen atoms) with the pK_d values then the plot starts to unravel faster than you can say 'Brexit means Brexit'.
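Here is a small sketch of the difference between sensible and silly. The two compounds are invented for illustration and p_value is my own helper name. The function divides K_d by a chosen concentration unit before taking the logarithm, so the 1 M built into the usual 'p' operator becomes an explicit argument. Swapping 1 M for 1 mM leaves the difference between the two pK_d values untouched but changes how the atom-scaled values compare.

```python
import math


def p_value(kd_molar: float, unit_molar: float = 1.0) -> float:
    """Minus log10 of K_d expressed as a pure number in the chosen unit."""
    return -math.log10(kd_molar / unit_molar)


# Hypothetical compounds: (K_d in mol/L, number of non-hydrogen atoms)
compound_a = (1e-6, 20)
compound_b = (1e-9, 30)

results = {}
for label, unit in (("1 M", 1.0), ("1 mM", 1e-3)):
    p_a = p_value(compound_a[0], unit)
    p_b = p_value(compound_b[0], unit)
    results[label] = {
        "difference": p_b - p_a,           # sensible: subtracting
        "scaled_a": p_a / compound_a[1],   # silly: dividing by atom count
        "scaled_b": p_b / compound_b[1],
    }
    print(label, results[label])
```

With the 1 M unit the two compounds look equally 'efficient', while with 1 mM compound B pulls ahead, even though nothing about either compound has changed.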

I'd like you to take a look at another article which also has a Box 1 although I won't bother you with another tiresome 'spot the errors' quiz. The equation that I'll focus on is:

pK_d = pK_H + pK_S

This equation describes the decomposition of affinity into enthalpic and entropic contributions and you might think this means that you can write:

K_d = K_H×K_S

As Prof. Pauli would have observed, this is an error in the 'not even wrong' category and it is clear that a difference in opinion as to the importance of units was as much responsible for the unraveling of the Austro-Hungarian empire as that unfortunate wrong turn in pre-SatNav Sarajevo. The 'p' operator implies that each of K_d, K_H and K_S has units of concentration. However, multiplying two such quantities will give a quantity that has units of concentration squared.

It is actually possible to decompose K_d into enthalpic and entropic contributions in a valid manner but you need to be thinking carefully about the meaning of the standard state. As noted previously, ΔG° depends on the concentration used to define the standard state. This is a consequence of the dependence of ΔS° on the standard concentration, whereas ΔH is independent of the standard concentration (the standard state is assumed to be a dilute solution). This suggests defining K_S as a quantity with units of concentration and K_H as a quantity without units.
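One way to write this down is sketched here. It assumes that ΔG° is the standard free energy of binding, so that ΔG° = RT ln(K_d/C°), and the definitions of K_H and K_S are my reading of the suggestion rather than anything taken from the article in question:

```latex
\begin{align}
\Delta G^\circ &= \Delta H - T\,\Delta S^\circ = RT \ln\!\left(\frac{K_d}{C^\circ}\right) \\
K_d &= C^\circ \exp\!\left(\frac{\Delta G^\circ}{RT}\right)
     = \underbrace{\exp\!\left(\frac{\Delta H}{RT}\right)}_{K_H\ \text{(no units)}}
       \times
       \underbrace{C^\circ \exp\!\left(-\frac{\Delta S^\circ}{R}\right)}_{K_S\ \text{(concentration)}} \\
\mathrm{p}K_d &= -\log_{10}\!\left(\frac{K_d}{\mathrm{M}}\right)
     = -\log_{10} K_H \;-\; \log_{10}\!\left(\frac{K_S}{\mathrm{M}}\right)
\end{align}
```

Written like this, the product K_H × K_S has units of concentration rather than concentration squared, and pK_d = pK_H + pK_S holds provided that pK_H is read as –log₁₀(K_H) with no unit to dump and pK_S as –log₁₀(K_S/M).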

This is probably a good point to wrap things up. My advice to all the authors of the featured NRDD and FMC articles is that they read (and make sure that they understand) the section of this article that is entitled '8. Ligand Efficiency and Additivity Analysis of Binding Free Energy'. This advice is especially relevant for those of the authors who consider themselves to be experts in thermodynamics.

May I wish all readers a happy, successful and metric-free 2017.