Published July 18, 2019 | Version v1
Other Open

Collection 39: Reddit on the Humanities

Description

Collection 39 (C-39) is a subset of WE1S's C-38 Reddit collection tailored to focus on student discourse about the humanities. C-38 includes Reddit comments longer than 225 words from 2006 to 2019 that contain the terms humanities, liberal arts, or the arts. C-39 consists of the 66,290 comments from that larger collection (about half the original number) that also contain at least one of the terms student, major, or college (including plurals and other forms). WE1S's Corpus-A is an earlier version of C-38 that covers only the years 2006-2018. (See WE1S Research Materials Overview for the relation between the project's "datasets" and "collections.")

Explanation

WE1S's rationale and methodology for collecting Reddit posts to study public discourse about the humanities (especially by students) are explained in the blog post by WE1S's lead Reddit researcher Raymond Steding: "A Digital Humanities Study of Reddit Student Discourse about the Humanities" (2019).

As Steding notes, he initially collected 3.3 terabytes of Reddit data from 2006 to 2018 (approximately five billion comments) by downloading it from pushshift.io in JSON format. Data for 2019 was later added.

This data was filtered to retain only comments containing at least one of the terms humanities, liberal arts, or the arts. Then, to improve the coherence of topic models of the collection and to make the number of documents more tractable, WE1S removed comments shorter than 225 words. WE1S did not remove duplicates from this collection, nor did it filter posts and comments based on Reddit user "karma" scores.
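The filtering steps described above can be illustrated with a minimal sketch. It is not WE1S's actual code: it assumes the input is newline-delimited JSON with the comment text in a `body` field, counts words by splitting on whitespace, and matches terms with case-insensitive regular expressions. The function names, the patterns, and the choice to keep comments of exactly 225 words are all assumptions made for illustration.

```python
import json
import re
from typing import Iterable, Iterator

# Assumed patterns; the exact matching rules WE1S used are not stated here.
HUMANITIES_TERMS = re.compile(r"\b(humanities|liberal arts|the arts)\b", re.IGNORECASE)
# Trailing \w* catches plurals and other forms ("students", "majoring"),
# but also unrelated words such as "majority".
STUDENT_TERMS = re.compile(r"\b(student|major|college)\w*", re.IGNORECASE)
MIN_WORDS = 225  # assumed inclusive threshold


def word_count(text: str) -> int:
    """Count words by splitting on whitespace (an assumed definition of 'word')."""
    return len(text.split())


def keep_for_c38(body: str) -> bool:
    """Long enough and mentions at least one humanities term."""
    return word_count(body) >= MIN_WORDS and HUMANITIES_TERMS.search(body) is not None


def keep_for_c39(body: str) -> bool:
    """Passes the C-38 filter and also mentions a student-related term."""
    return keep_for_c38(body) and STUDENT_TERMS.search(body) is not None


def filter_comments(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed comments that pass the C-39 filter, one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        comment = json.loads(line)
        if keep_for_c39(comment.get("body") or ""):
            yield comment
```

In this sketch no deduplication or karma-based filtering is applied, which mirrors the description above.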

Collection Metadata

  • Created by: Raymond Steding
  • Created on: July 18th 2019, 12:00:00 am
  • WE1S Collection Registry ID: 20200714_2157_reddit-students-and-the-humanities
  • Data sources: Reddit.

Suggested Citation for Collection

WhatEvery1Says (WE1S) Project. (2019, July 06). Collection 39: Reddit on the Humanities. doi: 10.5281/zenodo.4959834.

Files

  • File size: 458.7 MB
  • Checksum: md5:cb03796b59a40cf970661e4b188f8d57
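After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names are hypothetical, and the file path is supplied by the caller because the file name is not shown on this record.

```python
import hashlib

# Checksum as listed on this record.
EXPECTED_MD5 = "cb03796b59a40cf970661e4b188f8d57"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use low for a file of this size. A mismatch usually indicates an incomplete or corrupted download.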

Additional details

Related works

Is derived from
10.5281/zenodo.5044260 (DOI)