Publication Date

5-2016

Date of Final Oral Examination (Defense)

12-17-2015

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Major Advisor

María Soledad Pera, Ph.D.

Advisor

Timothy Andersen, Ph.D.

Advisor

Edoardo Serra, Ph.D.

Abstract

Online social networks are an increasingly central medium of communication in the 21st century. We have seen a proliferation of competing social networks that differentiate themselves by serving different niches of communication. Among these, Twitter has risen to prominence as a leader among microblogging communities, characterized by publicly visible 140-character messages called tweets. The wide visibility of Twitter messages has enabled some users to curate large followings, and has facilitated content creators who wish to reach as many viewers as possible. Researchers have since investigated many methods for predicting which messages will become popular or even go viral on Twitter. Although there are many facets to this problem, and various methods of approaching it have been proposed, we note that anyone who wants to create a popular Twitter account will sooner or later have to produce popular content. In this study, we investigate the content-based approach of predicting popular tweets based only on the text they contain. In particular, we ask whether topic models can be used to identify topics of discussion that are more likely to be associated with popular tweets. In the process, we explore methods for collecting and processing a large-scale corpus of Twitter content. Our experiments found that while topic-based prediction methods could lead to effective popularity prediction, they were outperformed by other, simpler content-based methods.
