# 1. ABOUT THE DATASET
------------

Title:	University of Reading Open Research Survey 2021 dataset

Creator(s): Daniel Brady [1], Peter Bray [2], Auvikki de Boon [3], Marcello De Maria [3], Kirsty Hodgson [1], Sophie Read [3], and Brendan Williams [1,4].

Organisation(s): 1. School of Psychology and Clinical Language Sciences.
                 2. School of Archaeology, Geography and Environmental Science.
                 3. School of Agriculture, Policy and Development.
                 4. Centre for Integrative Neuroscience and Neurodynamics.

Rights-holder(s): University of Reading, Auvikki de Boon, Kirsty Hodgson, Sophie Read, and Brendan Williams.

Publication Year: 2022

Description: 
This dataset contains anonymised data collected during the University of Reading Open Research Survey 2021. This project was led by a group of Open Research Champions across multiple departments, with the aim of mapping the current open research landscape of the university.

Questionnaire responses were collected from 403 staff and students in the University of Reading community between October and November 2021. The data shared here contain anonymised responses from 390 participants, following cleaning of the dataset to remove duplicates and participants who did not provide consent for data usage and/or sharing.

Participants were recruited using departmental mailing lists and through word of mouth. Dissemination across the institution was supported by Open Research Champions within their respective departments. Three 50 GBP prizes were offered to respondents to incentivise participation in the survey.

The dataset contains anonymised survey data for individual respondents, a data dictionary for interpreting values in the dataset, a copy of the original survey as implemented in REDCap, and a Jupyter Notebook used to generate the shareable data from our raw dataset.

Cite as: Brady, Daniel, Bray, Peter, de Boon, Auvikki, De Maria, Marcello, Hodgson, Kirsty, Read, Sophie and Williams, Brendan (2022): University of Reading Open Research Survey 2021 Dataset. University of Reading. Dataset. https://doi.org/10.17864/1947.000355.

Contact: b.williams3@reading.ac.uk

# 2. TERMS OF USE
-----------------

Copyright 2022 University of Reading, Auvikki de Boon, Kirsty Hodgson, Sophie Read, and Brendan Williams. All documentation and code are licensed by the rights-holder under a Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/).

# 3. PROJECT AND FUNDING INFORMATION
------------

Title: Survey on Open Research at the University of Reading.

Dates: April 2021 - April 2023

Funding organisation: Open Research Champions Scheme, University of Reading.

# 4. CONTENTS
------------
## data_cleaning.ipynb:
Interactive Jupyter Notebook used to modify raw data received from REDCap and to merge in the manually anonymised qualitative data. String data are converted to integer format for data processing, age is binned into groups for anonymisation, length of tenure is removed for anonymisation, duplicate cases are removed (IDs: 44, 116, 149, 272, 355), and participants who did not give consent are also removed. The result is saved as a single CSV file (data_share.csv) that is available within this data archive.
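
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in data_cleaning.ipynb. The column names (`record_id`, `age`, `tenure`, `consent`), the consent coding, and the age bands are all assumptions made for the example; only the list of duplicate IDs comes from this README.

```python
# Illustrative sketch only: column names, consent coding and age bands are
# assumptions, not taken from data_cleaning.ipynb.
DUPLICATE_IDS = {44, 116, 149, 272, 355}  # duplicate case IDs listed in this README


def bin_age(age):
    """Map an exact age to a coarse band (hypothetical 10-year bands)."""
    if age < 25:
        return "18-24"
    if age >= 65:
        return "65+"
    lower = 25 + 10 * ((age - 25) // 10)
    return f"{lower}-{lower + 9}"


def clean(rows):
    """Apply the described cleaning steps to a list of dict records."""
    cleaned = []
    for row in rows:
        if int(row["record_id"]) in DUPLICATE_IDS:
            continue  # drop duplicate cases
        if row["consent"] != "Yes":
            continue  # drop participants who did not give consent
        out = dict(row)
        out["age_group"] = bin_age(int(out.pop("age")))  # bin age, drop exact value
        out.pop("tenure", None)  # remove length of tenure
        out["consent"] = 1  # recode the string answer as an integer
        cleaned.append(out)
    return cleaned
```

Consult data_cleaning.ipynb for the real variable names, recoding rules, and age groups.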

## Data_dictionary.xlsx:
Excel spreadsheet that enables matching of data in data_share.csv to the questionnaire presented in OR_Survey_Questionnaire.pdf. This spreadsheet contains five columns. 'Question number' gives the item number of the question in the questionnaire. 'Raw Data' gives the question as it is presented in the questionnaire. 'Preprocessed data' gives the column name used to record participant responses in data_share.csv. 'Scoring' matches the numeric values used to record participant responses for that item with the available options given in the questionnaire. 'Notes' gives any additional information about an item not otherwise recorded in the dictionary.
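
As an illustration of how the 'Scoring' column might be used, the sketch below decodes a numeric response back into its questionnaire option. It assumes a scoring cell written in the form `1 = Yes; 2 = No`. That format is a guess made for the example, and the actual layout of the cells in Data_dictionary.xlsx may differ. The function names are hypothetical.

```python
def parse_scoring(cell):
    """Turn a scoring string such as '1 = Yes; 2 = No' into {1: 'Yes', 2: 'No'}.

    The 'code = label; code = label' format is an assumption for this sketch.
    """
    mapping = {}
    for part in cell.split(";"):
        if "=" not in part:
            continue  # skip fragments that do not look like 'code = label'
        code, label = part.split("=", 1)
        mapping[int(code.strip())] = label.strip()
    return mapping


def decode(value, scoring_cell):
    """Return the questionnaire option for a numeric response, or the value itself if unmapped."""
    return parse_scoring(scoring_cell).get(int(value), value)
```

For example, under the assumed format, `decode("2", "1 = Yes; 2 = No")` returns `"No"`.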

## data_share.csv:
Anonymised survey data post-cleaning. This includes removal of duplicate records, anonymisation of qualitative responses, and removal of participants who did not give consent. Further details of data cleaning can be found in the description of data_cleaning.ipynb. A description of the dictionary needed to interpret this data is found under Data_dictionary.xlsx.
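
A minimal sketch of reading the file with Python's standard `csv` module is shown below. The in-memory sample and its column names are placeholders for illustration, not the real columns of data_share.csv.

```python
import csv
import io


def load_responses(handle):
    """Read survey rows from an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


# In-memory stand-in for data_share.csv (hypothetical columns and values).
sample = io.StringIO("record_id,age_group\n1,25-34\n2,35-44\n")
rows = load_responses(sample)
print(len(rows))
```

With the real file, the handle would come from `open("data_share.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded. Because this README reports 390 participants, the row count can serve as a quick sanity check.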

## OR_Survey_Questionnaire.pdf:
Open research survey with attribution, license, and description. This also includes the participant information sheet that was given to respondents of the survey. The items included in the survey here can be matched to participant data in data_share.csv using the Data_dictionary.xlsx file.  

# 5. METHODS
--------------------------
Quantitative data were processed by BW. Information on the processing of raw data can be found in Data_dictionary.xlsx.

Qualitative data were anonymised by AdB, KH, and BW, following the recommendations on qualitative data anonymisation made by Braun and Clarke (2013) and Saunders et al. (2015). Full details of the anonymisation protocol can be found below.

The script used for data processing was created by BW and can be found in data_cleaning.ipynb.

The survey was initially developed jointly by DB, PB, AdB, MDM, KH, SR, ER, and BW.

## Protocol of the Anonymisation of Qualitative Survey Data

### Aim
To adopt a transparent, reproducible and rigorous approach to the screening and anonymisation of the qualitative survey data, in order to protect participants from identification while preserving the integrity of the responses and the communication of salient themes.

### Methods
This protocol was informed by the recommendations on qualitative data anonymisation made by Braun and Clarke (2013) and Saunders et al. (2015).

Qualitative responses were screened by three researchers (AdB, KH, BW) to identify anonymisation criteria specific to the sample. Each identified anonymity concern was then discussed collaboratively by the researchers in order to balance confidentiality with the preservation of content and themes.

The anonymity criteria identified and details of how each of these were addressed are as follows: 

1.	People's names
In all cases these were substituted with their generic title e.g., master's student, supervisor, Dr, Professor. No pseudonyms were used.
2.	Locations and specialised institutional departments
References to broad departments and the University of Reading were retained; specialised small sub-departmental structures were removed to preserve the anonymity of the survey respondents.
3.	Specific projects and grants
Named specific projects and grants were either removed or neutralised as 'project' or 'grant'.
4.	Specialised occupations
In some cases, specialised organisational titles were genericised e.g., 'Organisational Lead for OR' in order to preserve salient meaning, but protect the anonymity of any specific individual.
5.	Occupational relationships
Where possible the relationships described (e.g., student and supervisor, junior colleague) were preserved with the anonymisation of all identifiable details.
6.	Further identifiable information
This included specialist research interests or methodologies unique to individual researchers.

Errors in sentence structure and errors of omission were not corrected in this process. However, definitions of colloquial terms and abbreviations have been itemised and may be found in the appendices of this protocol.

### Outcome
Three versions of the qualitative survey data have been developed:
1.	An original un-anonymised version.  
2.	An unmarked transcription screened and edited for anonymity that may be published in an open data repository. 
3.	A marked generic transcription, in which a code identifies every redaction and amendment made during anonymisation, will be created for procedural transparency.
All data are stored in password-protected secure electronic files.
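
The difference between the unmarked version (2) and the marked version (3) can be illustrated with a small sketch. The anonymisation in this project was carried out manually by the researchers, so this is not the procedure they used. The replacement table, the square-bracket marking, and the function name are hypothetical choices made for the example.

```python
def anonymise(text, replacements, marked=False):
    """Substitute identifying strings with generic terms.

    With marked=True, each substitution is wrapped in square brackets so that
    readers can see where the text was amended (a hypothetical marking scheme).
    """
    for original, generic in replacements.items():
        substitute = f"[{generic}]" if marked else generic
        text = text.replace(original, substitute)
    return text


# Invented example text and replacement table; not drawn from the survey data.
replacements = {"Dr Jane Example": "supervisor", "Example Grant 123": "grant"}
response = "Dr Jane Example funded this via Example Grant 123."

unmarked = anonymise(response, replacements)             # version 2 style
marked = anonymise(response, replacements, marked=True)  # version 3 style
```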

### Recommendations
Due to the size of the sample in this study, cumulative effects (i.e., the consideration of identifiable information across all of an individual's responses) were not accounted for, as no individual-case analyses are planned in this study. However, this protocol recommends further anonymisation if the analysis plan were to change.

### References
Braun, V., & Clarke, V. (2013). Successful Qualitative Research: A Practical Guide for Beginners. Sage Publications.
Saunders, B., Kitzinger, J., & Kitzinger, C. (2015). Anonymising interview data: challenges and compromise in practice. Qualitative Research, 15(5), 616-632. https://doi.org/10.1177/1468794114550439

This protocol is based on https://dx.doi.org/10.1177%2F1468794114550439

### Definitions of colloquial terms 
*Please note: entries that end with a question mark are our best guesses at what the respondent meant when using an acronym.*
CentAUR - Central Archive at the University of Reading. University of Reading publication archive
CIF - Crystallographic Information File
CINN - Centre for Integrative Neuroscience and Neurodynamics
CORRI - Committee on Open Research and Research Integrity 
CO-I - co-investigator
CPD - continued professional development
CT - computerised tomography 
CYLC - https://cylc.github.io/ ?
ECR - early career researcher
EEG - electroencephalography
ELN - electronic lab notebooks
ESDM - exegesis spatial data management
FAIR - FAIR principles: findable, accessible, interoperable, reusable
FDG - Focus Group Discussion 
GDPR - general data protection regulation
GIS - Geographic Information System 
MRI - magnetic resonance imaging
MRS - magnetic resonance spectroscopy
NCAS-CMS - National Centre for Atmospheric Science-Computational Modelling Sciences
OA - open access
OR - open research
OSF - Open Science Framework
PCLS/SPCLS - School of Psychology and Clinical Language Sciences
PGR - postgraduate research student
PI - principal investigator
Pre-reg - pre-registration 
RCT - randomised controlled trial
REF - research excellence framework
RCUK - UK research councils ?
RDM - Research data management ?
ROSE - In computing, a rose tree is a term for the value of a tree data structure with a variable and unbounded number of branches per node. ?
RRDP - Reading Researcher Development Programme. Training scheme provided by the Graduate School at the University of Reading for doctoral researchers
SEM - structural equation modelling
STS - science and technology studies 
UG - Undergraduate
UKCORR - UK Council of Open Research and Repositories
UKRI - UK Research and Innovation
UoR - University of Reading
XIOS - Extensible Markup Language Internet Operating System