Publication Date

12-2017

Date of Final Oral Examination (Defense)

10-13-2017

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Francesca Spezzano, Ph.D.

Supervisory Committee Member

Edoardo Serra, Ph.D.

Supervisory Committee Member

Steven M. Cutchin, Ph.D.

Abstract

Link Prediction is the problem of inferring new relationships among nodes in a network that may form in the near future. Classical approaches mainly consider neighborhood structure similarity when linking nodes. However, we may also want to take into account whether the two nodes we are going to link will benefit from the link through active interaction over time. For instance, it is better to link two nodes u and v if we know that these two nodes will interact in the social network in the future, rather than suggesting w, who may never interact with u. Thus, the longer the interaction is estimated to last, i.e., the more persistent it is, the higher the priority for connecting the two nodes.

This thesis focuses on the problem of predicting how long two nodes will interact in a network by identifying potential pairs of nodes (u, v) that are not connected, yet show some Indirect Interaction. “Indirect Interaction” means that there is a particular action involving both nodes, which depends on the type of network. For example, in social networks such as Facebook, there are users who are not friends but interact with each other's wall posts. On the Wikipedia hyperlink network, it happens when readers navigate from page u to page v through the search box (in the top right corner of page u), and there is no explicit link on page u to v. This research explores cases that involve multiple interactions between u and v during a half-open observational time interval. Two supervised learning approaches are proposed for the problem. Given a set of network-based predictors, the basic approach consists of learning a binary classifier to predict whether or not an observed Indirect Interaction will last in the future. The second, more fine-grained approach consists of estimating how long the interaction will last by modeling the problem via Survival Analysis or as a Regression task. Once the duration is estimated, this information is leveraged for the Link Prediction task.
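As a rough illustration of the setup described above, the following sketch selects unconnected pairs with repeated Indirect Interactions inside an observation window and derives the binary "will it last" target used by the basic approach. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the `min_count` threshold, the toy data, and the choice to treat interactions as undirected are all hypothetical.

```python
from collections import defaultdict

def candidate_pairs(edges, interactions, t_start, t_end, min_count=2):
    """Unconnected pairs with at least min_count interactions in [t_start, t_end).

    Assumption: interactions are treated as undirected (u, v, timestamp) triples.
    """
    linked = {frozenset(e) for e in edges}
    counts = defaultdict(int)
    for u, v, t in interactions:
        pair = frozenset((u, v))
        if t_start <= t < t_end and u != v and pair not in linked:
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

def label_persistence(pairs, interactions, t_end):
    """Binary target: 1 if the pair interacts again at or after t_end, else 0."""
    later = {frozenset((u, v)) for u, v, t in interactions if t >= t_end}
    return {pair: int(pair in later) for pair in pairs}

# Toy data (hypothetical): one explicit link and a few timestamped interactions.
edges = [("a", "b")]
interactions = [
    ("a", "c", 1), ("c", "a", 3), ("a", "c", 12),  # unconnected, persists
    ("a", "b", 2), ("a", "b", 4),                  # already linked, skipped
    ("b", "d", 5), ("b", "d", 6),                  # unconnected, stops
]

pairs = candidate_pairs(edges, interactions, t_start=0, t_end=10)
labels = label_persistence(pairs, interactions, t_end=10)
for pair, y in sorted(labels.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), y)
# ['a', 'c'] 1
# ['b', 'd'] 0
```

In the fine-grained approach, the 0/1 target would be replaced by an observed duration, with pairs still interacting at the end of the data treated as censored for Survival Analysis.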

Experiments on the longitudinal Facebook network and wall-interactions dataset and on the Wikipedia Clickstream dataset tested this approach to predicting the Duration of Interaction and to Link Prediction. The results show that the fine-grained approach performs best, with an AUROC of 0.854 on Facebook and 0.77 on Wikipedia for Link Prediction. Moreover, this approach beats a Link Prediction model that is based only on network properties and does not consider the Duration of Interaction, which achieves an AUROC of 0.80 on Facebook and 0.68 on Wikipedia.
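For readers unfamiliar with the metric reported above, AUROC is the probability that a randomly chosen positive pair (one that does become linked) is scored higher than a randomly chosen negative pair, with ties counted as one half. The sketch below computes it directly from that definition; the scores are hypothetical stand-ins for estimated interaction durations and are not taken from the thesis.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Each (positive, negative) pair contributes 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = link formed, score = estimated duration (rescaled).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the figures above should be read.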

DOI

https://doi.org/10.18122/B28M56
