Publication Date

8-2021

Date of Final Oral Examination (Defense)

6-9-2021

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Francesca Spezzano, Ph.D.

Supervisory Committee Member

Jerry Alan Fails, Ph.D.

Supervisory Committee Member

Edoardo Serra, Ph.D.

Abstract

Online media is changing the traditional news industry and diminishing the role of journalists, newspapers, and even news channels. This in turn is enhancing the ability of fake news to influence public opinion on important topics. The threat of fake news is imminent, as it allows malicious users to share their agenda with a larger audience. Major social media platforms such as Twitter and Facebook make it easy to spread fake news because of their minimal moderation and fact-checking.

This work aims to predict fake and real news sharing on social media. Specifically, we apply a multi-level influence approach, drawn from the Diffusion of Innovation (DOI) theory, to a real-world dataset and predict whether and when a given user will share information on social media. We hypothesize that fake and real news sharing is better predicted by considering user-, news-, and network-level features together.

We also predict the time elapsed between influencer and follower shares via survival analysis. Binary classifiers such as Support Vector Machine (SVM) and Random Forest are used to predict fake and real news sharing. The approach is demonstrated on a dataset of 1,572 users sampled from the FakeNewsNet repository. Our results show a 30% increase in the Area Under the Receiver Operating Characteristic curve (AUROC) over the best baseline. Real and fake news sharing shows a high dependency on user similarity, tie strength, and explicit features.
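
For readers unfamiliar with the metric, the sketch below shows one common way to compute AUROC: as the probability that a randomly chosen positive example (here assumed to mean "the user shared the news item") receives a higher classifier score than a randomly chosen negative one. The function name `auroc` and the toy labels and scores are illustrative assumptions only; they are not taken from the thesis or its dataset.

```python
def auroc(labels, scores):
    """AUROC via pairwise ranking: the fraction of (positive, negative)
    pairs in which the positive example has the higher score.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = shared, 0 = did not share.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)  # 5 of 6 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the metric suits comparisons against baselines regardless of the decision threshold.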

Furthermore, the analysis shows that users with characteristic features like love, self-transcendence, ideals, conservation, and openness to change tend to share real news, whereas users with dominant features like self-enhancement, curiosity, closeness, structure, and harmony are more likely to share fake news.

Finally, survival analysis is employed to predict the time elapsed between influencer and follower shares. For real news sharing, the Concordance Index (C-Index) is slightly lower than the baseline; for fake news sharing, the C-Index of Random Survival Forest (RSF) is comparable to the baseline. Furthermore, compared with the regression baseline models, RSF yields a significantly lower Mean Absolute Error (MAE) for both real and fake news sharing.
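
As a rough illustration of the two evaluation measures named above, the following sketch computes Harrell's C-Index and the MAE on toy data. It assumes that an "event" means the follower was observed sharing and that a censored record means no share was observed within the observation window; that framing, the function names, and the toy values are assumptions for illustration and do not come from the thesis.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-Index: among comparable pairs, the fraction in which
    the subject with the shorter observed time has the higher risk score.
    A pair (i, j) is comparable when i had an event and times[i] < times[j].
    Tied risk scores count as half."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted times."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy data (hypothetical): hours until the follower shares; 0 = censored.
toy_times = [2, 5, 7, 10]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.3, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)  # 4 of 5 comparable pairs
toy_mae = mean_absolute_error([2, 5, 10], [3, 5, 8])
```

The C-Index measures only whether predicted risks are ordered correctly, while the MAE measures how far predicted times are from observed ones, so a model can be comparable on one and clearly better on the other.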

DOI

https://doi.org/10.18122/td.1852.boisestate
