1. ABOUT THE DATASET
------------

Title: Dataset supporting the article 'Benchmark CCSD(T) and density functional theory calculations of biologically relevant catecholic systems.'

Creator(s): Mauricio Cafiero (ORCID: 0000-0002-4895-1783), Joshua Harle

Organisation(s): University of Reading

Rights-holder(s): University of Reading, Joshua Harle (Student).

Publication Year: 2024

Description: Complexation energies for complexes of four catechols (catechol, dinitrocatechol, dopamine and DOPAC) with eight counter-molecules: Mg(EDA)2(H2O)^2+, Zn(EDA)2(H2O)^2+, methyl amine, methanol, benzene, indole, isobutane and methane thiol. Each complex is optimized with the CCSD/cc-pVDZ method unless noted; the exceptions are optimized with MP2/cc-pVDZ. Approximate complete basis set (CBS) CCSD(T) complexation energies are calculated for each compound and serve as a reference. CBS MP2 and HF complexation energies are also calculated. These benchmark values are then compared against several series of DFT methods to evaluate the effects of exact exchange and empirical dispersion on the functional's ability to replicate the CCSD(T) energies. DFT methods evaluated include: SVWN, M06L, M06, M062X, M06-2X-D3, MN15, MN12SX, BLYP, B3LYP, CAM-B3LYP, CAM-B3LYP-D3, B97XD, wB97D, PBE, LC-wHPBE, HCTH, tHCTHhyb, and BMK. All calculations were performed using Gaussian 16. Raw data for manuscript "Benchmark CCSD(T) and density functional theory calculations of biologically relevant catecholic systems," by Harle and Cafiero.

Cite as: Cafiero, Mauricio and Harle, Joshua (2023): Dataset supporting the article 'Benchmark CCSD(T) and density functional theory calculations of biologically relevant catecholic systems.' University of Reading. Dataset. https://doi.org/10.17864/1947.000532.

Related publication: J. Harle and M. Cafiero, "Benchmark CCSD(T) and density functional theory calculations of biologically relevant catecholic systems." Submitted, Physical Chemistry Chemical Physics (an RSC journal).
Contact: m.cafiero@reading.ac.uk


2. TERMS OF USE
------------

Copyright 2024 University of Reading, Joshua Harle. This dataset is licensed under a Creative Commons Attribution 4.0 International Licence: https://creativecommons.org/licenses/by/4.0/.


3. PROJECT AND FUNDING INFORMATION
------------

Title: A novel dynamic Density Functional Theory method for analysing multi-scale ligand/protein interactions

Dates: Sept. 2022 - August 2023

Funding organisation: The Royal Society of Chemistry

Grant no.: Research Enablement Grant (E21-9051333819)


4. CONTENTS
------------

File listing

SupportDataDFTBenchmarkCafiero24.xlsx

This file contains complexation energies for complexes of four catechols (catechol, dinitrocatechol, dopamine and DOPAC) with eight counter-molecules: Mg(EDA)2(H2O)^2+, Zn(EDA)2(H2O)^2+, methyl amine, methanol, benzene, indole, isobutane and methane thiol. Each complex is optimized with the CCSD/cc-pVDZ method unless noted; the exceptions are optimized with MP2/cc-pVDZ. Approximate complete basis set (CBS) CCSD(T) complexation energies are calculated for each compound and serve as a reference. CBS MP2 and HF complexation energies are also calculated. These benchmark values are then compared against several series of DFT methods to evaluate the effects of exact exchange and empirical dispersion on the functional's ability to replicate the CCSD(T) energies. DFT methods evaluated include: SVWN, M06L, M06, M062X, M06-2X-D3, MN15, MN12SX, BLYP, B3LYP, CAM-B3LYP, CAM-B3LYP-D3, B97XD, wB97D, PBE, LC-wHPBE, HCTH, tHCTHhyb, and BMK. All calculations were performed using Gaussian 16 (www.Gaussian.com).

Tab: Contents

S1. Metal-ionic: BSSE corrected aug-cc-pVTZ complexation energies between four catechols (catechol, dinitrocatechol, dopamine and DOPAC) and two metal complexes: Zn2+ and Mg2+ coordinated to two ethylene diamine molecules and a water molecule. Energies are in kcal/mol. All calculations performed with Gaussian 16.

S2. h-bond: BSSE corrected cc-pVDZ complexation energies between four catechols (catechol, dinitrocatechol, dopamine and DOPAC) and two molecules: methyl amine and methanol. Energies are in kcal/mol. All calculations performed with Gaussian 16.

S3. pi-stacking: BSSE corrected cc-pVDZ complexation energies between four catechols (catechol, dinitrocatechol, dopamine and DOPAC) and two molecules: benzene and indole. Energies are in kcal/mol. All calculations performed with Gaussian 16.

S4. Dispersion: BSSE corrected cc-pVDZ complexation energies between four catechols (catechol, dinitrocatechol, dopamine and DOPAC) and two molecules: isobutane and methane thiol. Energies are in kcal/mol. All calculations performed with Gaussian 16.

S5. CCSD(T): Complete basis-set extrapolated CCSD(T), MP2 and HF complexation energies for thirty-two complexes of catechols and various other compounds.

S6. DFT Summary: Average complexation energies for all DFT methods across each different type of complex and across all complexes.

Acronyms: Variables

CCSD: Coupled Cluster Singles and Doubles
MP2: Møller-Plesset 2nd order perturbation theory
HF: Hartree-Fock
DNC: Dinitrocatechol
DOPAC: 3,4-Dihydroxyphenylacetic acid


5. METHODS
-----------

The methods below are adapted from the manuscript named above, which has been submitted for review to the journal named above.

Thirty-two molecular complexes have been designed to mimic the types of interactions found between dopamine and the active sites of eight enzymes important in drug design for Parkinson’s Disease. These thirty-two complexes consist of four catecholic molecules (catechol, dinitrocatechol, dopamine and DOPAC) each interacting with eight counter-molecules.

The first eight model complexes (ionic) are the four deprotonated catechols bound to a Mg2+ ion in an octahedral complex and a Zn2+ ion in an octahedral complex (see Figure 1 of the manuscript).
The deprotonated ligands carry a -1 charge (catechol and dinitrocatechol), a neutral charge (dopamine) and a -2 charge (DOPAC). These complexes are designed to mimic crucial interactions found between ligands and the active sites of catechol-O-methyltransferase and tyrosine hydroxylase. Ionic interactions are often the dominant interactions holding a ligand to an active site, as is the case with these two enzymes.

The next eight complexes are models for hydrogen bonding. As noted in the manuscript, 11.8% of the 127 total interactions between dopamine and the eight enzyme active sites are hydrogen bonds with interaction energies between ~8 kcal/mol and ~15 kcal/mol each, and so capturing these interactions is important for accurate overall modelling. These complexes (see Figure 2 of the manuscript) consist of the four catechols hydrogen-bonded to methylamine and to methanol. The complexes with methylamine mimic interactions between the catechols and histidine, tryptophan, proline, glutamine and asparagine residues in the enzyme active sites, while the complexes with methanol mimic interactions with serine, tyrosine, glutamine and asparagine residues. The ligands are either neutral (catechol and dinitrocatechol), carry a +1 charge (dopamine), or carry a -1 charge (DOPAC).

The next eight complexes are models for pi-stacking (see Figure 3 of the manuscript). As noted in the manuscript, pi-stacking accounts for almost 25% of the interactions between dopamine and the active sites of the eight enzymes. The first four complexes are the four catechols stacked with benzene, to mimic the pi-stacking with phenylalanine and tyrosine residues in the enzyme active sites, while the next four complexes are the four catechols stacked with indole, to mimic the pi-stacking with tryptophan residues found in the enzyme active sites.
The final eight complexes, shown in Figure 4 of the manuscript, are models for “other” weak interactions and consist of the four catechols interacting with isobutane and with methane thiol. The complexes with isobutane mimic the interactions the ligands have with alanine, valine, leucine and isoleucine residues in the enzyme active sites, while the complexes with methane thiol mimic interactions specifically with cysteine residues, and more broadly with any polar residues that do not form hydrogen bonds. These interactions account for almost 50% of the interactions between dopamine and the eight enzyme active sites, and so the accuracy of these complexes is crucial to the overall accuracy of the calculations.

The thirty-two complexes described above were pre-optimized with M062X/6-31G, and then fully optimized using CCSD/cc-pVDZ or MP2/cc-pVDZ, as described in the results section of the manuscript.

Approximate complete basis set (CBS) CCSD(T) energies for the complexes described above, as well as their individual components, were calculated according to the expression used by Grimme and coauthors in several works (see references in manuscript) and benchmarked for hydrogen-bonded complexes by Jurecka and Hobza (reference in manuscript). In that expression (given in the manuscript), Ecorr is the contribution to the total energy from correlation, and SB stands for a small basis set, which, in the current work, is cc-pVDZ. This basis set was used in the expression by Grimme and coauthors and was shown by Jurecka and Hobza to have good accuracy.

The MP2 and HF CBS energies used in EQ. 10 of the manuscript and reported in its Table 1 were obtained with the formula by Halkier et al. (reference in manuscript), in which X and Y represent basis sets; in this case X = 3 for the cc-pVTZ basis set and Y = 4 for the cc-pVQZ basis set. Halkier et al. have reported that the values obtained with the TZ/QZ combination have a mean error of 1.3 kcal/mol and a maximum error of 3.25 kcal/mol for their sample calculations.
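
The two extrapolation steps described above can be sketched in code. This README does not reproduce the expressions themselves, so the forms below are assumptions rather than the manuscript's equations: the two-point inverse-cubic extrapolation commonly attributed to Halkier et al., and the additive small-basis correction commonly used for approximate CBS CCSD(T) energies. The function names are hypothetical; consult the manuscript for the exact expressions used.

```python
def two_point_cbs(e_x, e_y, x=3, y=4):
    """Two-point CBS extrapolation from basis sets with cardinal numbers x and y.

    ASSUMPTION: inverse-cubic form, E_CBS = (x^3 E_x - y^3 E_y) / (x^3 - y^3).
    Defaults follow the README: x = 3 (cc-pVTZ) and y = 4 (cc-pVQZ).
    """
    return (x**3 * e_x - y**3 * e_y) / (x**3 - y**3)


def ccsdt_cbs(e_mp2_cbs, ecorr_ccsdt_sb, ecorr_mp2_sb):
    """Approximate CBS CCSD(T) energy.

    ASSUMPTION: the CBS MP2 energy is corrected by the difference between the
    CCSD(T) and MP2 correlation energies, both evaluated in the small basis
    set (SB), which the README states is cc-pVDZ.
    """
    return e_mp2_cbs + (ecorr_ccsdt_sb - ecorr_mp2_sb)
```

All energies passed to these functions must be in the same unit (for example hartree); no unit conversion is performed.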
CBS HF, MP2 and CCSD(T) interaction energies for the complexes were found using the expression below with no counterpoise corrections applied:

Eint = Etotal - Emol1 - Emol2

All MP2, CCSD and CCSD(T) calculations used a frozen core.

The interaction energies of the thirty-two complexes described above were calculated at the same geometries using nineteen DFT methods: B97D3, B97XD, M06L, M06, M062X, M062X-D3, MN12SX, MN15, BLYP, B3LYP, CAM-B3LYP, CAM-B3LYP-D3, HCTH, tau-HCTHhyb, BMK, PBE, omega-PBEhPBE, LC-omega-HPBE, and SVWN, all with the aug-cc-pVTZ basis set. The energies were calculated with EQ. 12 from the manuscript with counterpoise corrections (reference 41 in the manuscript) applied, meaning that in the calculation of each fragment molecule, the basis functions and DFT quadrature points from the opposite fragment were included. For both the Mg and Zn complexes with DOPAC, the counterpoise-corrected fragment SCF for DOPAC did not converge for the M06L and MN12SX functionals, and so the interaction energies for those complexes with those functionals were calculated without counterpoise correction and the average counterpoise correction for the other Minnesota functionals was applied (+1 kcal/mol for the Mg complex, and +1.5 kcal/mol for the Zn complex).

Basis set convergence for the DFT methods was tested on a subset of eight of the complexes studied. Four hydrogen-bonded complexes (catechol and dinitrocatechol with methyl amine and methanol) and four pi-stacking complexes (catechol and dinitrocatechol with benzene and indole) were chosen to represent systems where dipole-type interactions were dominant and where induction-type interactions were dominant, respectively. Interaction energy calculations for each of the eight complexes were rerun with the aug-cc-pVQZ and def2-QZVPP basis sets to evaluate the effects of going from a triple-zeta basis set to a quadruple-zeta basis set.
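
The interaction-energy expression and the fallback used for the two DOPAC cases where the counterpoise-corrected fragment SCF did not converge, both described earlier in this section, can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names are hypothetical, and the sketch assumes that the input energies are in hartree and are converted to kcal/mol with the standard factor of about 627.5095.

```python
# ASSUMPTION: input energies are in hartree; results are in kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def interaction_energy(e_total, e_mol1, e_mol2):
    """Eint = Etotal - Emol1 - Emol2, converted to kcal/mol.

    For a counterpoise-corrected value, e_mol1 and e_mol2 would be fragment
    energies computed with the basis functions of the other fragment present.
    """
    return (e_total - e_mol1 - e_mol2) * HARTREE_TO_KCAL_PER_MOL


def approximate_cp_corrected(e_int_uncorrected, average_cp_correction):
    """Add an average counterpoise correction (kcal/mol) to an uncorrected
    interaction energy (kcal/mol), as described for M06L and MN12SX with the
    Mg and Zn DOPAC complexes (+1 and +1.5 kcal/mol, respectively)."""
    return e_int_uncorrected + average_cp_correction
```

For example, an uncorrected interaction energy for the Zn complex with DOPAC would be passed to `approximate_cp_corrected` together with `1.5`.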
The same eight complexes used for the basis set tests were also used to test the effect of optimization with a DFT method on DFT-based energies, rather than using the same geometry for DFT calculations as that used for the CCSD(T) calculations. This allows for the possibility that the DFT method may find a different minimum than the CCSD optimizations and that this structure may yield a more “accurate” energy. The hydrogen-bonded complexes were optimized with CAM-B3LYP-D3/aug-cc-pVTZ starting from the CCSD-optimized geometries in order to find the same relative minima. The pi-stacking complexes were likewise optimized with M062X-D3/aug-cc-pVTZ starting from the CCSD-optimized geometries. Interaction energies were then computed with the two DFT methods and aug-cc-pVTZ.