1. ABOUT THE DATASET
--------------------

Title: The Reading Everyday Emotion Database (REED): version 2.0

Creator(s): Jia Hoong Ong, Florence Yik Nam Leung, & Fang Liu

Organisation(s): School of Psychology and Clinical Language Sciences, University of Reading

Rights-holder(s): University of Reading

Publication Year: 2022

Description: We developed a set of audio-visual recordings of emotions called the Reading Everyday Emotion Database (REED). Twenty-two native British English adults (12 female, 10 male) of diverse ages and with drama/acting experience, recruited from the university, the local community, and social media, were recorded producing utterances of various lengths in spoken and sung conditions in 13 emotions (neutral, the 6 basic emotions, and 6 complex emotions), using everyday recording devices (e.g., laptops, mobile phones). All the recordings were validated by a separate, independent group of raters (n = 168 adults, recruited via Prolific).

Cite as: Ong, J.H., Leung, F.Y.N., & Liu, F. (2022). The Reading Everyday Emotion Database (REED): version 2.0. University of Reading. Dataset. https://doi.org/10.17864/1947.000407

Related publication: Ong, J.H., Leung, F.Y.N., & Liu, F. (In prep.). The Reading Everyday Emotion Database (REED): A set of audio-visual recordings of emotions in music and language.

2. TERMS OF USE
---------------

Copyright University of Reading 2022.

The complete REED database is available to authorised users subject to a Data Access Agreement between the University of Reading and a recipient organisation. A copy of the University of Reading Data Access Agreement is included with this item. To request access to the database, please complete a data access request at https://redcap.link/data-request.

A subset of example clips from the database is made available for use under a Creative Commons Attribution-NonCommercial 4.0 International Licence (https://creativecommons.org/licenses/by-nc/4.0/). These clips are contained in the 'example_clips' folder, and only these clips should be used for publication and presentation purposes.

3. PROJECT AND FUNDING INFORMATION
----------------------------------

Title: Cracking the Pitch Code in Music and Language: Insights from Congenital Amusia and Autism Spectrum Disorders

Dates: 01-12-2016 - 31-05-2023

Funding organisation: European Research Council (ERC)

Grant no.: Starting Grant 678733 (CAASD)

4. CONTENTS
-----------

File listing:
(i) REED_validation_summary.csv -- Data for the validation task
(ii) UoR-DataAccessAgreement-000407.pdf -- The REED Data Access Agreement
(iii) example_clips.zip -- Example clips from the REED

Not available in the current dataset:
(i) The REED (will be sent to users once they have signed the Data Access Agreement)

4.1. Data for the validation task
---------------------------------

The .csv file from the validation task as described in the Related Publication. Details of the columns are as follows (a short loading sketch appears at the end of Section 4.2):

- file = the recording file that was validated. The files are named in the following convention: domain_utterance_speaker_emotion.mp4. See Section 4.2 (The REED) for more details.
- domain = whether the file was spoken ("speech") or sung ("song").
- utterance = whether the utterance produced was the syllable 'ah' ("ah"), the phrase 'Happy birthday to you' ("birthday"), or the sentence 'The music played on while they talked' ("music").
- speaker = the speaker's ID. FW = female, MW = male, followed by digits representing the speaker code.
- emotion = the emotion (represented by its first three characters) that was produced. See Section 4.2 (The REED) for a list of the emotions.
- item = the recorded emotion (represented by its first three characters) followed by the token number. For example, 'hop03' refers to Hopeful Token 3. See Section 4.2 (The REED) for a list of the emotions.
- n = number of responses contributing to the validation of that item (e.g., 11 means that the item was presented to 11 participants for recognition and rating)
- mean_score = mean proportion correct (ranging from 0 to 1)
- sd_score = standard deviation of the proportion correct
- mean_intense = mean intensity rating (on a 5-point scale from 1 = Not at all intense to 5 = Completely intense)
- sd_intense = standard deviation of the intensity rating
- mean_genuine = mean genuineness rating (on a 5-point scale from 1 = Not at all genuine to 5 = Completely genuine)
- sd_genuine = standard deviation of the genuineness rating

4.2. The REED
-------------

The database contains 3230 audio-visual .mp4 files, organised in four subfolders, each corresponding to a recording condition:

(i) spoken "ah" ('spoken_ah' subfolder)
(ii) spoken "Happy birthday to you" ('spoken_bday' subfolder)
(iii) spoken "The music played on while they talked" ('spoken_music' subfolder)
(iv) sung "Happy birthday to you" ('sung_bday' subfolder)

The files are named in the following convention: domain_utterance_speaker_emotion.mp4

Domain:
sp = speech
so = song

Utterance:
ah = "ah"
bd = "Happy birthday to you"
mu = "The music played on while they talked"

Speaker:
FW = female
MW = male
Digits = speaker code

Emotion:
ang = angry
dis = disgust
emb = embarrassed
fea = fearful
hap = happy
hop = hopeful
jea = jealous
neu = neutral
pro = proud
sad = sad
sar = sarcastic
str = stressed
sur = surprised
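To make the naming convention concrete, the sketch below splits a file name into its fields. This is a minimal illustration, assuming Python 3; the function name and the example file name are illustrative, not taken from the database itself.

    from pathlib import Path

    # Code tables taken from the naming convention above.
    DOMAINS = {"sp": "speech", "so": "song"}
    UTTERANCES = {
        "ah": "ah",
        "bd": "Happy birthday to you",
        "mu": "The music played on while they talked",
    }

    def parse_reed_filename(filename):
        """Split a REED file name of the form
        domain_utterance_speaker_emotion.mp4 into its fields."""
        domain, utterance, speaker, emotion = Path(filename).stem.split("_")
        return {
            "domain": DOMAINS[domain],
            "utterance": UTTERANCES[utterance],
            "speaker": speaker,   # e.g. FW01 = female speaker with code 01
            "sex": "female" if speaker.startswith("FW") else "male",
            "emotion": emotion,   # three-letter code, e.g. hop = hopeful
        }

    # Illustrative file name (not necessarily present in the database):
    print(parse_reed_filename("so_bd_FW01_hop.mp4"))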
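Likewise, here is a minimal sketch of loading the validation summary described in Section 4.1. It assumes the pandas library; the column names come from Section 4.1, and the cut-off of 3 (the midpoint of the 5-point scales) is an arbitrary illustration, not a recommended threshold.

    import pandas as pd

    # Load the validation summary described in Section 4.1.
    df = pd.read_csv("REED_validation_summary.csv")

    # Mean recognition accuracy (proportion correct, 0-1) per emotion and domain.
    accuracy = df.groupby(["domain", "emotion"])["mean_score"].mean().unstack("domain")
    print(accuracy)

    # Items rated above the midpoint of both 5-point scales.
    strong = df[(df["mean_intense"] > 3) & (df["mean_genuine"] > 3)]
    print(strong[["file", "mean_score", "mean_intense", "mean_genuine"]].head())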
5. METHODS
----------

For a detailed description of the methodology, please refer to the publication below:

Ong, J.H., Leung, F.Y.N., & Liu, F. (In prep.). The Reading Everyday Emotion Database (REED): A set of audio-visual recordings of emotions in music and language.

6. CHANGELOG
------------

Version 2 differs from Version 1 (https://doi.org/10.17864/1947.000336) in the following ways:

- Whereas Version 1 included only clips recognised above chance, Version 2 consists of all the recorded clips (3230 in total).
- The validation of Version 1 was based on a 6-alternative forced-choice recognition task (i.e., participants were presented with a subset of six labels as options), whereas the validation of Version 2 (with a completely new set of participants) was based on a 13-alternative forced-choice recognition task (i.e., participants were presented with all 13 labels as options).
- Version 2's validation file (REED_validation_summary.csv) does not contain the foil words presented and does not have the columns "correct" and "selected". Instead, it contains the mean (and standard deviation) proportion-correct scores for each item, as well as the means (and standard deviations) of the genuineness and intensity ratings.