Dataset Name: "Log Cabin Republicans 2013-2021 Twitter Database."

Reference Information: Newton, Thomas (2024) "Log Cabin Republicans 2013-2021 Twitter Database", University of Reading Research Data Archive.
Copyright 2024 Thomas Newton. This dataset is licensed under a Creative Commons Attribution 4.0 International Licence.
DOI: 10.17864/1947.001361

------

This dataset was used for Thomas Newton's PhD work (Student No: 22007843), from 2019-2024. The original study was a Critical Discourse Analysis of a sample of 1300 tweets by the Log Cabin Republicans (LCR). The sample is made up of tweets from 7 accounts related to the organisation, of which 3 are the personal accounts of members of its leadership team over the period 2013-2021. To be included, a tweet had to have received at least 10 'likes' and to have been sent between January 20th 2013 (the day Barack Obama's 2nd term began) and January 19th 2021 (the last full calendar day of the Trump administration). The original research sought to demonstrate a rise in populist far-right language, especially authoritarian language, by the group over time, thereby demonstrating an ideological change towards the far right.

The tweets I gathered were hand-coded using NVivo, with the results stored in SPSS and exported to .txt and .xlsx format for storage, though researchers can use their own preferred programs to go back over the data. Readers should note that Tweet IDs often cannot be copied into an Excel sheet intact: Excel changes their formatting in a way that makes them unusable. This is why they are stored in a separate .txt file.

------

Due to Twitter's terms of service, the full text of tweets cannot be shared in a database, but Tweet IDs can. To that end, the first document in this database, 'TweetIDs.txt', contains the IDs of all 1300 tweets and the dates on which they were authored. Researchers can use these IDs with the Twitter API to rehydrate them into the original tweets.
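As a rough illustration of handling the IDs safely before rehydration, the sketch below reads them as text rather than numbers and groups them into batches. It rests on several assumptions that should be checked against the actual file and the current API documentation: that each line of 'TweetIDs.txt' begins with the ID followed by whitespace and the date, that a batch size of 100 suits the lookup endpoint being used (the v2 tweet lookup endpoint has accepted up to 100 IDs per request), and that the function names `read_tweet_ids` and `batches` are purely illustrative. The sample lines are invented placeholders, not entries from the dataset.

```python
from typing import Iterable, Iterator, List


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Collect tweet IDs as strings, assuming the ID is the first field on each line."""
    ids = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any header line whose first field is not purely numeric.
        if fields and fields[0].isdigit():
            ids.append(fields[0])
    return ids


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` IDs (assumed per-request limit)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Invented placeholder lines, NOT real entries from 'TweetIDs.txt'.
sample_lines = [
    "1234567890123456789\t2021-01-19",
    "",
    "987654321098765432\t2017-06-01",
]

ids = read_tweet_ids(sample_lines)
id_batches = list(batches(ids, size=100))

# Why strings: an ID this long does not survive a round trip through a
# floating-point number, which is how spreadsheets store numeric cells.
survives_float = int(float(ids[0])) == int(ids[0])
print(ids, len(id_batches), survives_float)
```

To run this against the real file, the list of placeholder lines could be replaced with an open file handle, for example `read_tweet_ids(open("TweetIDs.txt", encoding="utf-8"))`. Keeping the IDs as text throughout avoids the formatting problem noted above for Excel.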
Every tweet in the sample was publicly available (i.e. not made private) at the time this research was undertaken and this dataset compiled. It is possible that some may become private before another researcher can make use of these IDs. In such a case, I urge researchers not to use them, and to restrict themselves to the still-public parts of the dataset.

In the original thesis itself, whilst the use of tweets accompanied by criticism and educational commentary constitutes fair dealing, I still went to great lengths to conceal the identities of the authors of each tweet. It is impossible to anonymise a tweet in a way that A) cannot be traced back to its author, and B) retains enough information to prove its authenticity. A balance must therefore be struck, in which the tweets are rehydrated for use and users of the dataset then scrub the identifying information back out when writing about them. My preferred method was to screencap a tweet, crop out all directly identifying information (i.e. the account name and profile picture), then screencap the cropped image and use the second screencap in the thesis, accompanied by commentary. Imaging software therefore cannot be used to undo the crop and de-anonymise the tweet, because the identifying information was never in the 2nd screencap to begin with. As I note in the thesis, there is no single standard of acceptable anonymisation in Twitter/X research; different publications pass peer review using a variety of methods, some without anonymising their sources at all. I felt it appropriate to pursue full anonymisation as far as possible: just because tweets can be de-anonymised by dedicated work doesn't mean that we shouldn't make readers and researchers go to the effort.

-----

The thesis was mixed-methods and made use of basic bivariate statistics to supplement its findings.
As a result, the way each tweet was coded is recorded in the dataset's second document, the Excel spreadsheet 'LCRTwitterQuantData.xlsx'. Each row of the spreadsheet corresponds to the tweet of that number, and each column represents a variable. Most variables are represented in binary, nominal form, either 'Present' (1) or 'Not Present' (0), other than some of the dependent variables, which are explained further herein. The thesis revolved around several key dependent variables, representing 'Populism' and 'Right-Wing Authoritarianism', which were compared against certain independent variables. I explain each in turn below. Readers should cross-reference each row of data with the corresponding Tweet ID to see for themselves the language that inspired each coding.

----

Independent Variables.

The thesis made use of two independent variables to measure changes in rhetoric over time. These were as follows.

1) Presidency: A categorical variable coded as 1 (Barack H. Obama) or 2 (Donald J. Trump), representing who was the sitting President when the tweet was authored.

2) Randomly Sampled Group: A categorical variable coded as 1 (Obama Sample), 2 (Trump Sample 1), 3 (Trump Sample 2), 4 (Trump Sample 3), 5 (Trump Sample 4), 6 (Trump Sample 5) or '#Null!' (Not in any sample).

The second independent variable aimed to compensate for the skew in the first. LCR became far more popular on Twitter after Trump was elected, so tweets 193-1300 were all sent during the Trump Presidency. To demonstrate that changes in the variables were not caused by this skew, the random samples were created as a robustness check: 100 randomly sampled tweets from the Obama Presidency were compared with 5 different groups of 100 randomly sampled tweets from the Trump Presidency. A majority of the latter groups showed a visible change relative to the former, supplementing the 'Presidency' results.
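For researchers who want to build a comparable robustness check of their own, the following is a minimal sketch of how such groups could be drawn. It is not the procedure used in the thesis, and it will not reproduce the group memberships recorded in the spreadsheet. It assumes that tweets 1-192 fall in the Obama Presidency (inferred from tweets 193-1300 being Trump-era) and that the six groups do not overlap (inferred from their being stored as a single categorical variable). The function name `draw_sample_groups` and the seed are illustrative only.

```python
import random
from typing import Dict


def draw_sample_groups(seed: int = 0) -> Dict[int, int]:
    """Map tweet number -> group code (1 = Obama sample, 2-6 = Trump samples 1-5).

    Tweets that are not drawn are simply absent from the result, which
    corresponds to '#Null!' (not in any sample) in the spreadsheet.
    """
    rng = random.Random(seed)          # fixed seed so the draw can be repeated
    obama_tweets = range(1, 193)       # assumed: tweets 1-192
    trump_tweets = range(193, 1301)    # stated in the README: tweets 193-1300

    groups: Dict[int, int] = {}

    # One group of 100 tweets from the Obama Presidency.
    for tweet in rng.sample(obama_tweets, 100):
        groups[tweet] = 1

    # Draw 500 Trump-era tweets without replacement, then cut the draw into
    # five consecutive blocks of 100 so that no tweet sits in two groups.
    trump_draw = rng.sample(trump_tweets, 500)
    for position, tweet in enumerate(trump_draw):
        groups[tweet] = 2 + position // 100

    return groups


groups = draw_sample_groups(seed=2024)
sizes = {code: sum(1 for g in groups.values() if g == code) for code in range(1, 7)}
print(sizes)
```

Drawing all 500 Trump-era tweets in one pass and then splitting them is one simple way to keep the five groups disjoint; sampling each group separately would allow the same tweet to appear in more than one group.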
The year in which each tweet was sent was also recorded, from 1 (January 20th 2013-January 19th 2014) to 8 (January 20th 2020-January 19th 2021), but this formed no part of the thesis in the end.

----

Populism.

Populism was represented by two variable categories: 'Invocation of the People' (IVP) and 'Identification of an Enemy' (IoE). This broad coding allowed me to capture instances of traditional populist (i.e. 'people vs elite') rhetoric, as well as other combative and discriminatory rhetoric, like Islamophobia. Each category has numerous variations, referring to the specific 'peoples' and 'enemies' invoked in a given tweet, as well as compound variables representing all such invocations, or whole categories of them. For example, the variable 'IVP - Republican Party' refers to tweets in which LCR sought to mobilise their fellow Republicans. A few IVP and IoE variables have component parts. In almost all cases, it is self-evident how these variables are put together. I detail the exceptions further below.

Also included is the variable 'Retrotopia' (RET), signifying right-wing nostalgic language for an imagined past. This was not ultimately operationalised in the final thesis, but researchers may wish to follow up on it.

----

Right-Wing Authoritarianism.

Right-Wing Authoritarianism (RWA) was represented by four variables: 'Calls to Violence', 'Veneration of the Leader', 'Dismissal of Accountability' and 'Media Scepticism'. These represent, respectively: tweets calling for physical retribution against enemies; tweets putting a given figure from Trump's inner circle on a pedestal; tweets expressing scepticism of, or seeking to undermine, democratic methods of accountability; and tweets attacking the media. In addition, 'Veneration of the Leader' has component variables noting the use of this rhetoric when referring to specific people: 'VenA - Nikki Haley', 'VenA - Melania Trump', 'VenA - Richard Grenell' and 'VenA -Donald Trump'.
Furthermore, whilst ultimately not used in the thesis as part of 'Right-Wing Authoritarianism', 'Media Misrepresentation' was also recorded. This covered instances in which the tweet author had linked a source in their tweet and misrepresented its contents in the tweet's text. Initially considered an RWA variable, it was ultimately deemed to be more associated with partisanship than authoritarianism, and its content was relegated to an appendix. Nevertheless, it remains included here as it may be useful to further researchers. It has two component variables: 'MedMis - Total Misrepresentation', signifying cases where the tweet was in no way reflective of the article it linked; and 'MedMis - Statements out of Context', in which the tweet selectively represented part of the source in order to make a point that was unreflective of the source as a whole. This commonly involved cherry-picking a quote that portrayed LCR favourably from sources that were, as a whole, more balanced and/or critical.

----

Potentially unclear codings.

Most coding in this work was left deliberately transparent and self-evident, but the components of some composite variables may not be fully clear. I elaborate on them below.

1) 'IVP - LGBT+ Republicans', which is made up of 'IVP - Pro-LGBT+ Right', 'IVP - Double Standards' and 'IVP - Gay Republicans', representing the individual ways in which LCR sought to rhetorically construct that community: as part of an inclusive and welcoming party (Pro-LGBT+ Right); as part of the same, but evidenced by selectively forgiving Republican homophobia where they would not forgive a Democrat (Double Standards); or through general statements about their group, such as promoting a particular event (Gay Republicans).
The goal was to capture the nuances in how LCR construct their own identity and rationalise their place within the party, noting both their praise of its increasing inclusivity and the occasions when they make, arguably, too much of lip service.

2) 'IoE - Islam', which is made up of 'Islamic Homophobia', 'Islamic Extremism', 'Islamic-Majority Countries' (which itself has component variables signifying several specific states), and 'Islam in General'. This seems self-evident, but 'Islamic Homophobia' is listed in the spreadsheet alongside the other component variables of 'IoE - Homophobia' and not those of 'IoE - Islam', so there is the potential for researchers to be confused.

3) 'Alt-Right', which was not ultimately used in the final thesis, but may be of interest to researchers. It contains 'Victimhood Narrative' (which combines several other self-evident variables about victimisation), 'Masculinism', 'Alt-Right Lexicon' (which contains individual variables for instances of terms like 'Cuck'), and 'Nazism/White Supremacism'.

4) 'IoE-Foreign Homophobia' and 'IoE - Islamic Homophobia' often overlap, but are distinct variables, with 'Foreign' signifying homophobia rooted in states and cultures that are not Islamic-majority. They often appear together, depending on context, such as when LCR are remonstrating about Chechnya: the goal is to other, and mobilise against, the Islamic-majority region of Chechnya and the Russian state more broadly at the same time.
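Researchers re-using the composite variables may find it helpful to check how each one relates to its components before relying on it. The sketch below tests one possible reading, namely that a composite is coded 1 whenever at least one of its components is coded 1. That rule is an assumption, not something this document states, and the column names are taken from the descriptions above rather than from the spreadsheet's actual headers, which may differ. The rows are invented toy values, not data from 'LCRTwitterQuantData.xlsx'.

```python
from typing import Dict, List


def composite_is_consistent(row: Dict[str, int], composite: str, components: List[str]) -> bool:
    """True if the composite equals 1 exactly when any component equals 1 (assumed rule)."""
    expected = 1 if any(row[name] == 1 for name in components) else 0
    return row[composite] == expected


# Component names as described in this README; real column headers may differ.
ISLAM_COMPONENTS = [
    "Islamic Homophobia",
    "Islamic Extremism",
    "Islamic-Majority Countries",
    "Islam in General",
]

# Invented toy rows for illustration only.
toy_rows = [
    {"IoE - Islam": 1, "Islamic Homophobia": 1, "Islamic Extremism": 0,
     "Islamic-Majority Countries": 0, "Islam in General": 0},
    {"IoE - Islam": 0, "Islamic Homophobia": 0, "Islamic Extremism": 0,
     "Islamic-Majority Countries": 0, "Islam in General": 0},
    {"IoE - Islam": 0, "Islamic Homophobia": 0, "Islamic Extremism": 1,
     "Islamic-Majority Countries": 0, "Islam in General": 0},
]

results = [composite_is_consistent(row, "IoE - Islam", ISLAM_COMPONENTS) for row in toy_rows]
print(results)  # the third toy row breaks the assumed rule
```

Rows that fail a check of this kind are not necessarily errors; they may simply show that the composite was built on a different rule, which is worth establishing from the data before any re-analysis.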