1. ABOUT THE DATASET
------------

Title: Dataset to support 'Rapid Multiplex Antimicrobial Resistance Profiling and Bacterial Identification by LAP-MALDI MS Biotyping'

Creator(s): Lily R. Adair (https://orcid.org/0009-0001-2652-9960), Rainer Cramer (https://orcid.org/0000-0002-8037-2511)

Organisation(s): University of Reading

Rights-holder(s): University of Reading

Publication Year: 2025

Description: This dataset contains raw and processed liquid atmospheric pressure matrix-assisted laser desorption/ionisation (LAP-MALDI) mass spectrometry data from thirteen bacterial species, five of which are Gram-positive. For two species, Klebsiella pneumoniae and Escherichia coli, antibiotic-susceptible and antibiotic-resistant strains were analysed. The LAP-MALDI MS(/MS) workflow enabled antimicrobial resistance profiling, bacterial classification, and further characterisation via MS/MS protein sequencing. Raw MS/MS data obtained from proteoform fragmentation were processed using Mascot Distiller (version 2.8.5.1, 64-bit; Matrix Science, London, UK). The resulting protein fragment ion peak lists were submitted to the Mascot MS/MS Ions Search tool (version 3.1; Matrix Science) for protein characterisation and species identification. The dataset also includes machine learning analyses using principal component analysis (PCA), linear discriminant analysis (LDA) and cross-validation. All MS and MS/MS data were acquired using a Synapt G2-Si Q-TOF mass spectrometer coupled with a custom-built AP-MALDI source.

Changelog: N/A

Cite as: Adair, Lily Rose and Cramer, Rainer (2025): Dataset to support 'Rapid Multiplex Antimicrobial Resistance Profiling and Bacterial Identification by LAP-MALDI MS Biotyping'. University of Reading. Dataset. https://doi.org/10.17864/1947.000468

Related publication: Adair, L. R., Iyer, S., Jones, I. M., and Cramer, R. Rapid Multiplex Antimicrobial Resistance Profiling and Bacterial Identification by LAP-MALDI MS Biotyping. [Insert journal here]. Submitted for review.

Contact: Prof. Rainer Cramer, Department of Chemistry, University of Reading, Reading RG6 6DX, United Kingdom, Tel: 0118 378 4550, Email: r.k.cramer@reading.ac.uk

Acknowledgements: We kindly thank Waters Corporation for access to the AMX [Beta] software and Sophie Lellman for initial support. This research was supported by the Engineering and Physical Sciences Research Council (EPSRC) through grant EP/V047485/1.


2. TERMS OF USE
------------

Copyright 2025 University of Reading. This dataset is licensed under a Creative Commons Attribution 4.0 International Licence: https://creativecommons.org/licenses/by/4.0/.


3. PROJECT AND FUNDING INFORMATION
------------

Title: A Cost-Effective High-Speed Clinical Diagnostics Instrument for Large Population Screening Based on Novel Liquid AP-MALDI MS Technology
Dates: 2021-2025
Funding organisation: Engineering and Physical Sciences Research Council
Grant no.: EP/V047485/1

Title: Advancing LAP-MALDI mass spectrometry profiling/biotyping for the analysis of microbes and their pathogenicity
Dates: 2022-2025
Funding organisation: University of Reading


4. CONTENTS
------------

Raw_Data.zip
Cross_Validation_Reports.zip
Processed_Peak_Lists.zip
Mascot_Search_Results.zip
Resistance_Scores.zip

Raw_Data.zip contains two main folders: 'Antibiotic_susceptibility_assay' and 'Classification_susceptible_only'.

The 'Antibiotic_susceptibility_assay' folder is divided into eight subfolders, comprising two susceptible strains, five resistant strains, and one blank antibiotic control. Each strain folder includes two subfolders, 'TCA' and 'Antibiotic_incubation_3HR', each containing 15 files corresponding to five biological replicates, each with three technical replicates. Additionally, the two susceptible strains contain a 'Penicillinase' folder, each with three biological replicates, each in turn with three technical replicates (nine files total per 'Penicillinase' folder).
The 'Escherichia_coli_(IMP-1)' folder includes an additional folder titled 'Antibiotic_other_timepoints', which contains two files. The 'Klebsiella_pneumoniae_(VIM-1)' folder contains an 'MSMS_Proteins' folder, which includes raw MS/MS data for two proteins, organised into subfolders '7700_Da' (4 files) and '9470_Da' (5 files).

The 'Classification_susceptible_only' folder contains data for 14 bacterial strains, each with three biological replicates and three technical replicates, totalling nine files per strain.

Cross_Validation_Reports.zip contains three subfolders. The 'Lipids_and_Proteins_(incubated_and_non-incubated)' folder includes six Excel files and six PDF reports. The 'Lipids_and_Proteins_(non-incubated_only)' folder includes one Excel file and one PDF report. The 'Lipids_Only_(incubated_and_non-incubated)' folder also contains six Excel files and six PDF reports.

Processed_Peak_Lists.zip contains two .txt files, each representing the processed peak list for one of the identified proteins.

Mascot_Search_Results.zip contains two PDF files for each protein, providing the Mascot search outputs.

Resistance_Scores.zip contains a single Excel file detailing the resistance score calculations.


5. METHODS
------------

The LAP-MALDI MS and MS/MS data were generated using a commercial mass spectrometer, a Synapt G2-Si (Waters Corporation), equipped with a custom-built AP-MALDI source. MassLynx (ver. 4.2; Waters) software was used to acquire and process the data (.raw data folders). Further processing was carried out using Mascot Distiller (ver. 2.8.5.1; Matrix Science).

Statistical analysis was performed using the AMX Abstract Model Builder [Beta] v1.0.2259.0 (Waters) to evaluate combined lipid and antibiotic resistance profiles. Spectra from all scans were merged into 231 representative spectra (9 per species, except E. coli with 48 and K. pneumoniae with 84).
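
As a quick consistency check on the spectrum count, the short sketch below reproduces the stated total from the per-species numbers. It assumes that eleven of the thirteen species named in the description each contribute 9 spectra; this split is inferred from the text of this README and not read from the data files.

```python
# Spectrum counts as stated in this README (not read from the data files).
TOTAL_SPECIES = 13        # from the dataset description
SPECTRA_PER_SPECIES = 9   # default count per species
EXCEPTIONS = {"Escherichia coli": 48, "Klebsiella pneumoniae": 84}


def expected_spectra() -> int:
    """Sum the default count over ordinary species plus the two exceptions."""
    ordinary_species = TOTAL_SPECIES - len(EXCEPTIONS)
    return ordinary_species * SPECTRA_PER_SPECIES + sum(EXCEPTIONS.values())


print(expected_spectra())
```
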
Data were binned at 1 Da intervals across the range of 500–2000 Da and analysed by PCA (35 dimensions) and LDA (11 dimensions) after background subtraction and normalisation. Cross-validation used the built-in '20% out' method, and outliers were identified at 4–9 SD thresholds.

Protein database searches were conducted using the Mascot MS/MS Ions Search tool (version 3.1; Matrix Science) against the Mascot contaminants database (20 Jan 2025; 247 sequences; 128,130 residues) and the trEMBL database (22 May 2024; 248,234,451 sequences; 87,367,689,973 residues).
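
For users who wish to re-bin exported peak lists outside AMX, the following is a minimal sketch of 1 Da binning over 500–2000 followed by normalisation. It is an illustration only and not the AMX implementation: summing intensities within a bin, treating the range as half-open (2000 excluded), and normalising to total intensity are all assumptions, background subtraction is omitted, and the example peaks are hypothetical.

```python
from typing import Iterable, List, Tuple

MZ_MIN = 500.0    # lower edge of the binned range (from Methods)
MZ_MAX = 2000.0   # upper edge of the binned range (from Methods)
BIN_WIDTH = 1.0   # 1 Da bins (from Methods)


def bin_spectrum(peaks: Iterable[Tuple[float, float]]) -> List[float]:
    """Sum peak intensities into fixed-width bins over [MZ_MIN, MZ_MAX)."""
    n_bins = int((MZ_MAX - MZ_MIN) / BIN_WIDTH)
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        if MZ_MIN <= mz < MZ_MAX:
            binned[int((mz - MZ_MIN) // BIN_WIDTH)] += intensity
    return binned


def normalise(binned: List[float]) -> List[float]:
    """Scale to unit total intensity (assumed; AMX's normalisation may differ)."""
    total = sum(binned)
    return [value / total for value in binned] if total > 0 else list(binned)


# Hypothetical (position, intensity) pairs, for illustration only.
example = [(499.9, 10.0), (500.2, 3.0), (500.7, 1.0), (1250.5, 4.0), (2000.0, 7.0)]
vector = normalise(bin_spectrum(example))
print(len(vector), vector[0], vector[750])
```

Each spectrum becomes a fixed-length vector of 1500 values, which is the form that dimensionality reduction methods such as PCA and LDA take as input.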