How to cite this Dataset
Cafiero, Mauricio (2024): Dataset supporting the article 'Transformer-decoder GPT models for generating virtual screening libraries of HMGCR inhibitors: effects of temperature, prompt-length and transfer-learning strategies'. University of Reading. Dataset. https://doi.org/10.17864/1947.001340
Description
Raw data for virtual screening libraries generated by a generative pre-trained transformer-decoder model. Models were pre-trained on a general drug database from ZINC15 and fine-tuned on inhibitors of HMGCR from ChEMBL. The libraries were generated using different transfer-learning strategies, prompt lengths and temperatures. The resulting libraries were screened on IC50 values predicted by a deep neural network trained on experimental HMGCR IC50 values, docking scores from AutoDock Vina, quantitative estimate of drug-likeness, Tanimoto similarity to known statin drugs, and other properties. This dataset contains tables of properties as well as CSV files with the generated libraries, a Tkinter-based GUI for interacting with the library, and docking poses for selected molecules.
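For readers unfamiliar with the temperature setting mentioned above, the following is a minimal illustrative sketch of how temperature typically reshapes a model's next-token probabilities before sampling. It is not taken from this dataset or from the article's code; the function name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by a temperature.

    Hypothetical helper for illustration only; not part of the dataset.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
example_logits = [2.0, 1.0, 0.1]
low_temp = softmax_with_temperature(example_logits, temperature=0.5)
high_temp = softmax_with_temperature(example_logits, temperature=2.0)
```

In this sketch, a lower temperature concentrates probability on the highest-scoring token, while a higher temperature flattens the distribution and makes sampled output more varied.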
Resource Type: | Dataset |
---|---|
Creators: | Cafiero, Mauricio ORCID: https://orcid.org/0000-0002-4895-1783 |
Rights-holders: | University of Reading |
Data Publisher: | University of Reading |
Publication Year: | 2024 |
Data last accessed: | 19 January 2025 |
DOI: | https://doi.org/10.17864/1947.001340 |
Metadata Record URL: | https://researchdata.reading.ac.uk/id/eprint/1340 |
Organisational units: | Life Sciences > School of Chemistry, Food and Pharmacy > Department of Chemistry |
Participating Organisations: | University of Reading |
Keywords: | GPT, machine learning, drug design |
Rights: | |
Data Availability: | OPEN |