Language Use and Community on Social Media


Language use on the internet (e.g., on Facebook, Reddit, and 4chan) has become an important means of communication, but research that analyzes large collections of internet-sourced text is limited. This project examines how the distinctive characteristics of internet language use create a register distinct from in-person conversation. The question will be addressed using a large collection of texts (currently 13,523 words) drawn from online discussions on sites that differ in rules, level of anonymity, and community. The texts will be analyzed with AntConc, a program designed for the analysis of large collections of text (corpora). Lists of the most frequent words, lists of frequently co-occurring words, and other tools will be used to identify linguistic characteristics of this register. Although the analysis is not complete, preliminary results show that the presence of vulgar words correlates positively with a site's level of anonymity. This project will help characterize a widely used register of language. It may also shed light on how anonymity and membership in online communities affect language choice.
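
As a rough illustration of the kind of lists described above (word frequencies and co-occurring words), the following Python sketch counts both for a handful of posts. It is not AntConc's implementation and not part of the project's corpus: the sample posts, the function names, the simple letters-and-apostrophes tokenizer, and the two-word co-occurrence window are all assumptions chosen only to make the idea concrete.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrences(tokens, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Hypothetical sample posts, invented for illustration only.
sample_posts = [
    "lol that is so true",
    "that is so not true lol",
]

freq = Counter()
pairs = Counter()
for post in sample_posts:
    # Process each post separately so pairs never span two different posts.
    toks = tokenize(post)
    freq.update(toks)
    pairs.update(cooccurrences(toks, window=2))

# Show the most frequent words and word pairs.
print(freq.most_common(5))
print(pairs.most_common(5))
```

Comparing such lists across sites with different levels of anonymity would be one simple way to look for the kind of pattern the preliminary analysis reports, although a dedicated tool offers more refined collocation measures than raw pair counts.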

