Degree Title

Master of Science in Computer Science


Computer Science

Major Advisor

María Soledad Pera, Ph.D.


Timothy Andersen, Ph.D.


Edoardo Serra, Ph.D.


Online social networks are an increasingly central medium of communication in the 21st century. We have seen a proliferation of competing social networks that differentiate themselves by serving different niches of communication. Among these, Twitter has risen to prominence as a leading microblogging community, characterized by publicly visible messages of up to 140 characters called tweets. The wide visibility of Twitter messages has enabled some users to build large followings and has benefited content creators who wish to reach as many viewers as possible. Researchers have since investigated many methods for predicting which messages will become popular or even go viral on Twitter. Although this problem has many facets and various approaches to it have been proposed, we note that anyone who wants to create a popular Twitter account will sooner or later have to produce popular content. In this study we investigate a content-based approach that predicts popular tweets based only on the text they contain. In particular, we ask whether topic models can be used to identify topics of discussion that are more likely to be associated with popular tweets. In the process, we explore methods for collecting and processing a large-scale corpus of Twitter content. Our experiments found that while topic-based methods could predict popularity effectively, they were outperformed by other, simpler content-based methods.