University of Reading Research Data Archive

Dataset supporting the article 'Transformer-decoder GPT models for generating virtual screening libraries of HMGCR inhibitors: effects of temperature, prompt-length and transfer-learning strategies'

Description

Raw data for virtual screening libraries generated by a generative pre-trained transformer-decoder model. Models were pre-trained on a general drug database from ZINC15 and fine-tuned on inhibitors of HMGCR from ChEMBL. Libraries were generated using different transfer-learning strategies, prompt lengths and temperatures. The resulting libraries were screened using IC50 values predicted by a deep neural network trained on experimental HMGCR IC50 values, docking scores from AutoDock Vina, quantitative estimate of drug-likeness (QED), Tanimoto similarity to known statin drugs, and other properties. This dataset contains tables of properties as well as CSV files with the generated libraries, a Tkinter-based GUI for interacting with the library, and docking poses for selected molecules.
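
For readers unfamiliar with the temperature setting mentioned above, the following is a minimal sketch of temperature-scaled softmax, the standard way a sampling temperature reshapes a model's next-token probabilities. It is a generic illustration, not code from this dataset; the function name `temperature_softmax` and the example scores are assumptions made for the sketch.

```python
import math


def temperature_softmax(logits, temperature=1.0):
    """Turn raw token scores into probabilities, scaled by a temperature.

    Hypothetical helper for illustration only; not part of the dataset.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    # the distribution before normalisation.
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
cold = temperature_softmax(logits, temperature=0.5)
hot = temperature_softmax(logits, temperature=2.0)
```

In this sketch a lower temperature concentrates probability on the highest-scoring token, while a higher temperature spreads it across alternatives, which is why temperature is commonly varied to trade off conservative against diverse generated molecules.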

Resource Type: Dataset
Creators: Cafiero, Mauricio ORCID: https://orcid.org/0000-0002-4895-1783
Rights-holders: University of Reading
Data Publisher: University of Reading
Publication Year: 2024
Data last accessed: 7 September 2024
DOI: https://doi.org/10.17864/1947.001340
Metadata Record URL: https://researchdata.reading.ac.uk/id/eprint/1340
Organisational units: Life Sciences > School of Chemistry, Food and Pharmacy > Department of Chemistry
Participating Organisations: University of Reading
Keywords: GPT, machine learning, drug design
Rights:
Data Availability: OPEN

Files

Data

README file
