1. ABOUT THE DATASET
------------

Title: Dataset supporting the article 'Pairwise additivity and three-body contributions for Density Functional Theory-based protein-ligand binding energies.'

Creators: Mauricio Cafiero (ORCID: 0000-0002-4895-1783), Charlotte Schulze

Organisation: University of Reading

Rights-holders: University of Reading, Charlotte Schulze

Publication Year: 2023

Description: All component data for calculating Density Functional Theory-based protein-ligand binding energies using three different approaches: the total energy, a sum of pairwise energies, and a sum of pairwise and three-body energies. Data is also presented comparing three types of basis set superposition error corrections: no corrections, global corrections, and local corrections. Timings are also given for each type of correction. Finally, a selection of four-body energy terms is given as an example. All calculations were performed using Gaussian 16. Raw data for the manuscript "Pairwise additivity and three-body contributions for Density Functional Theory-based protein-ligand binding energies" by Schulze and Cafiero.

Cite as: Cafiero, Mauricio and Schulze, Charlotte (2023): Dataset supporting the article 'Pairwise additivity and three-body contributions for Density Functional Theory-based protein-ligand binding energies.' University of Reading. Dataset. https://doi.org/10.17864/1947.000511

Related publication: Schulze, C. A. E. and Cafiero, M. (2024) Pairwise additivity and three-body contributions for Density Functional Theory-based protein-ligand interaction energies. Journal of Physical Chemistry B, 128 (10). pp. 2326-2336. ISSN 1520-5207. https://doi.org/10.1021/acs.jpcb.3c07456

Contact: m.cafiero@reading.ac.uk


2. TERMS OF USE
------------

Copyright 2023 University of Reading, Charlotte Schulze. This dataset is licensed under a Creative Commons Attribution 4.0 International Licence: https://creativecommons.org/licenses/by/4.0/.


3.
PROJECT AND FUNDING INFORMATION
------------

Title: A novel dynamic Density Functional Theory method for analysing multi-scale ligand/protein interactions
Dates: Sept. 2022 - August 2023
Funding organisation: The Royal Society of Chemistry
Grant no.: Research Enablement Grant (E21-9051333819)

Title: Computational drug design for Parkinson's Disease treatments
Dates: July 2023 - September 2023
Funding organisation: DAAD RISE Worldwide
Grant no.: GB-CH_ME-5660


4. CONTENTS
------------

File listing

Pairwise-additivity-Cafiero-2023.xlsx
This file contains all component data for calculating Density Functional Theory-based protein-ligand binding energies using three different approaches: the total energy, a sum of pairwise energies, and a sum of pairwise and three-body energies. Data is also presented comparing three types of basis set superposition error corrections: no corrections, global corrections, and local corrections. Timings are also given for each type of correction. Finally, a selection of four-body energy terms is given as an example. All calculations were performed using the Gaussian 16 software (www.Gaussian.com).

Tab                 Contents

S1. IEs             Ligand-protein interaction energies (IEs) calculated in three ways with eighteen DFT methods. All values in kcal/mol.

S2. Total IEs       Ligand-protein interaction energies (IEs) calculated in three ways with eighteen DFT methods. All values in kcal/mol.

S3. 2B + 3B terms   Individual components for calculating the two- and three-body interaction energies for eighteen DFT methods. PCM solvent=water. All values in kcal/mol.

S4. PAIRS           Individual components for calculating the two-body interaction energies (not including L-DOPA) needed for calculating the three-body interactions for eighteen DFT methods. PCM solvent=water. All values in kcal/mol.

S5.
Full BSSE           Comparison of full counterpoise corrections for basis set superposition error (BSSE) with ‘local’ counterpoise corrections and with no counterpoise corrections, for a GGA, a meta-GGA, and a global hybrid meta-GGA DFT method. The basis set is aug-cc-pVDZ for BMK and tHCTH and 6-31G* for M06L. The final column is the core time for a full interaction energy calculation between L-DOPA and Phe142. Energy values are in kcal/mol and times in minutes.

S6. 4 BODY          Individual components for calculating the four-body interaction energies, including the three-body interactions needed, for eighteen DFT methods. PCM solvent=water. All values in kcal/mol.

Acronyms            Variables

PCM                 Polarizable Continuum Model
GGA                 Generalized Gradient Approximation
HF                  Hartree-Fock
BP                  Binding pocket
BSSE                Basis set superposition error
DFT                 Density Functional Theory


5. METHODS
-----------

The methods below are adapted from the manuscript named above, which has been submitted for review to the journal named above. All referenced figures and equations can be found in the manuscript.

The binding site for the SULT1A3 enzyme was extracted from the crystal structure (PDB ID: 2A3R) with dopamine bound. The binding site was defined as all amino acid residues with an atom within 3 angstroms of the bound ligand, and included Ala148, Asp86, Glu146, His149, His108, Lys106, Phe142, Phe24, Phe81, and Pro47 (see Figure 2). All residues were capped with an -OH or an -H in order to maintain the physiological charge. The bound ligand dopamine was modified into L-DOPA, and the structure was optimized using BMK/cc-pVDZ. In this optimization, the N-C(alpha)-C backbone of each residue was fixed in order to maintain the overall structure of the binding site from the crystal structure, and all other atoms were allowed to relax. The optimization included solvation by water using the polarizable continuum model (PCM). This optimized structure was used for all subsequent calculations.
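
The distance criterion used to define the binding site can be illustrated with a short Python sketch. It is not the procedure actually used by the authors: the function name, the coordinates, and the residue label "FarResidue" are hypothetical and chosen only to show how a 3 angstrom cutoff selects residues.

```python
import math

def residues_within_cutoff(ligand_atoms, residue_atoms, cutoff=3.0):
    """Return the names of residues that have at least one atom within
    `cutoff` angstroms of any ligand atom (hypothetical helper)."""
    selected = []
    for name, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            selected.append(name)
    return selected

# Invented coordinates in angstroms; NOT taken from the crystal structure.
ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
residues = {
    "Phe142": [(0.0, 2.5, 0.0), (0.0, 6.0, 0.0)],  # closest atom at 2.5
    "Asp86": [(4.0, 0.0, 0.0)],                    # 2.6 from second ligand atom
    "FarResidue": [(10.0, 10.0, 10.0)],            # well outside the cutoff
}

binding_site = residues_within_cutoff(ligand, residues)
print(binding_site)
```

A residue is kept as soon as any one of its atoms falls inside the cutoff, which matches the "with an atom within 3 angstroms" wording above.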

Several “families” of DFT methods were used in this study: HCTH, tHCTH, and tHCTHhyb, along with the related BMK functional; BLYP, B3LYP, CAM-B3LYP, and the empirical dispersion-corrected CAM-B3LYP-D3; M06L, M06, M062X, and the empirical dispersion-corrected M062X-D3, along with the related MN12SX functional; PBE, PBE1PBE, LC-wHPBE, and the related TPSS functional. The SVWN functional and the Hartree-Fock (HF) method were also tested for comparison. All energy calculations were performed with the aug-cc-pVDZ basis set, except the M06L calculations, which were run with the 6-31G* basis set for comparison with the work of Ukisik et al.

The total interaction energy calculations (EQ. 1 in the manuscript) were performed with the eighteen DFT methods described above and the HF method. Counterpoise (CP) corrections were applied as in Figure 1a in the manuscript, wherein the energy calculations for the binding site included ghost atoms from the ligand, and the energy calculation for the ligand included ghost atoms from the entire binding site.

The two-body energy calculations (EQ. 2) were performed with the eighteen DFT methods described above and the HF method. Global CP corrections were applied as in Figure 1b for three sample series (BMK/aug-cc-pVDZ, tHCTH/aug-cc-pVDZ, and M06L/6-31G*), wherein energy calculations for the ligand included ghost atoms from the entire binding site, and energy calculations for each amino acid residue included ghost atoms on the ligand and the other 9 residues. Local CP corrections were applied as in Figure 1c for all eighteen DFT methods and HF, wherein energy calculations for the ligand included ghost atoms from the i-th amino acid residue only, and energy calculations for the i-th amino acid residue included ghost atoms on the ligand only. Ten two-body terms were calculated per DFT method.

The three-body energy calculations (EQ. 3) were performed with the eighteen DFT methods described above and the HF method.
Local CP corrections were applied as in Figure 1d for all DFT methods, wherein energy calculations for the ligand included ghost atoms from the i-th and j-th amino acid residues, the energy calculations for the i-th amino acid residue included ghost atoms on the ligand and the j-th residue, and the energy calculations for the j-th amino acid residue included ghost atoms on the ligand and the i-th residue. The three-body calculations included 45 three-body energies and 45 two-body energies (none including the ligand) per DFT method.

The four-body energy calculations (EQ. 3) were performed with BMK/aug-cc-pVDZ. Local CP corrections were applied in analogy to the three-body calculations shown in Figure 1d, wherein energy calculations for the ligand included ghost atoms from the i-th, j-th, and k-th amino acid residues, the energy calculations for the i-th amino acid residue included ghost atoms on the ligand and the j-th and k-th residues, the energy calculations for the j-th amino acid residue included ghost atoms on the ligand and the i-th and k-th residues, and the energy calculations for the k-th amino acid residue included ghost atoms on the ligand and the i-th and j-th residues. Only two sample four-body terms have been calculated as examples.

All calculations above were performed with the Gaussian 16 software.
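
The term counts quoted above (ten two-body terms, 45 three-body terms, and 45 residue-residue pairs) follow from choosing residues out of the ten in the binding site. The Python sketch below reproduces that bookkeeping and combines fragment energies using the standard many-body expansion. This is an assumption made for illustration: EQ. 1-3 in the manuscript are authoritative, the function names are hypothetical, the toy energies are invented, and counterpoise corrections (which change the basis used for each fragment energy) are not modelled.

```python
from itertools import combinations

# Binding-site residues listed in the Methods section.
RESIDUES = ["Ala148", "Asp86", "Glu146", "His149", "His108",
            "Lys106", "Phe142", "Phe24", "Phe81", "Pro47"]

# Residue-residue pairs (i, j) and triples (i, j, k), none including the ligand.
pairs = list(combinations(RESIDUES, 2))
triples = list(combinations(RESIDUES, 3))

n_two_body = len(RESIDUES)   # ligand + one residue
n_three_body = len(pairs)    # ligand + two residues
n_four_body = len(triples)   # ligand + three residues (full set; only two were computed)

def two_body(e_Li, e_L, e_i):
    """Standard two-body term for the ligand L and residue i (assumed form)."""
    return e_Li - e_L - e_i

def three_body(e_Lij, e_Li, e_Lj, e_ij, e_L, e_i, e_j):
    """Standard three-body term for L, i, and j (assumed form).
    Note that it needs the residue-residue pair energy e_ij."""
    return e_Lij - e_Li - e_Lj - e_ij + e_L + e_i + e_j

# Invented toy energies (arbitrary units), built to be exactly pairwise additive.
e_L, e_i, e_j = -1.0, -2.0, -3.0
e_Li = e_L + e_i - 0.5
e_Lj = e_L + e_j - 0.25
e_ij = e_i + e_j - 0.125
e_Lij = e_L + e_i + e_j - 0.5 - 0.25 - 0.125

de_2 = two_body(e_Li, e_L, e_i)
de_3 = three_body(e_Lij, e_Li, e_Lj, e_ij, e_L, e_i, e_j)
print(n_two_body, n_three_body, n_four_body, de_2, de_3)
```

Because the toy energies are exactly pairwise additive, the three-body term comes out as zero; a nonzero value would measure the departure from pairwise additivity for that ligand-residue-residue trimer.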