Predicting Retweets of Fake and Real News

Additional Funding Sources

This research has been sponsored by the National Science Foundation under Award No. 1950599.

Abstract

The spread of misinformation across social media is one of the biggest national security threats of the 21st century. Previous research has succeeded in identifying misinformation spreaders on Twitter from user demographics and past tweet history, and other work has been relatively successful at predicting the number of retweets a given tweet will receive. However, predicting how many retweets a news article tweeted by a specific user will receive has not yet been tackled, even though that number determines the impact of the initial tweet containing misinformation. We use data from FakeNewsNet, which contains a list of 43,119 known fake news spreaders and 135,234 real news spreaders, together with the past 500 tweets of each user, to build a profile of each user and predict the number of retweets a news article tweet will receive. We present a random forest classifier that assigns the number of retweets a news tweet will receive to one of 5 ranges using user profile characteristics, emotion and sentiment analysis of tweets, and information about past tweets. The classifier achieves weighted F1 scores of 0.93 for real news and 0.85 for fake news retweet prediction, outperforming other retweet prediction models, which at best reached F1 scores of 0.80 for real news and 0.82 for fake news. By comparing the features used by the random forest classifier across the two datasets, we find that the features that most distinguish real from fake news retweet prediction are the average number of replies to the user's past tweets, the average URL count in the user's past tweets, account age, and following count. We also show that fake news retweets are more variable in terms of user characteristics.
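
As an illustration of the evaluation setup described above, the following minimal Python sketch maps raw retweet counts to 5 ranges and computes a support-weighted F1 score. The abstract does not state the range boundaries, so the values in BIN_EDGES are hypothetical, as are the helper names retweet_bin and weighted_f1; this is a sketch under those assumptions, not the authors' implementation.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges: the abstract says "5 ranges" but does not give them.
# With these edges the classes are 0, 1-9, 10-99, 100-999, and 1000+.
BIN_EDGES = [1, 10, 100, 1000]


def retweet_bin(count, edges=BIN_EDGES):
    """Map a raw retweet count to a class index in 0..len(edges)."""
    return bisect_right(edges, count)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive when n >= 1.
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


# Toy data (made up for illustration only).
true_counts = [0, 3, 50, 50, 2000]
y_true = [retweet_bin(c) for c in true_counts]
y_pred = [0, 1, 2, 3, 4]
score = weighted_f1(y_true, y_pred)
```

Weighting by support means that frequent retweet ranges dominate the score, which matters when most tweets fall into the lowest range.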
