Modeling Sharing Time of Fake and Real News
Additional Funding Sources
This research has been sponsored by the National Science Foundation under Award No. 1950599.
Abstract
The viral spread of misinformation on social media is one of the largest threats to national security, and accurately predicting what affects sharing time can help combat it. This work aims to predict how long users take to share misinformation on social media. Survival analysis is a family of statistical methods for predicting time-to-event, and it is often used to predict a patient's survival time from the moment of a disease diagnosis. It differs from other statistical methods in that it also uses observations in which the event (death or sharing) is never observed, known as censored data. This work applies survival analysis to the time until users share information and misinformation on social media, using a dataset gathered by Joy et al. (2021). The dataset contains user-based, news-based, and network-based information about each news item exposure, as well as the time a user took to share the news in cases where they shared it. We compare several survival analysis methods against several baseline regression models for predicting the time until a user shares, using various performance metrics. We use survival time estimates from the best-performing model to compare survival time for fake and real news, and finally compute the weighted importance of each covariate on a user's survival time.
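As a minimal illustration of how censored observations enter a survival estimate, the sketch below implements the standard Kaplan-Meier estimator in plain Python. It is not the model or the data used in this work: the function name `kaplan_meier`, the variable names, and the toy numbers are all hypothetical and chosen only for illustration. Here "survival" at time t means the user has not yet shared the news item by time t.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t), the probability of not having shared by time t.

    durations: time from exposure until the share, or until observation ended.
    observed:  True if the share was observed, False if the record is censored.
    Returns a list of (t, S(t)) pairs, one per distinct time with an observed share.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    # Number of observed shares at each distinct time.
    shares = Counter(t for t, e in zip(durations, observed) if e)
    # Number of users leaving the risk set at each time (shared or censored).
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = shares.get(t, 0)
        if d:
            # Censored users still count in the denominator up to their exit time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: hours from exposure to share (or to end of observation).
durations = [2, 3, 3, 5, 8, 8, 10, 12]
observed = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t:>2}h  S(t)={s:.3f}")
```

The censored users never contribute a share, but they remain in the risk set until their observation ends, so they still lower the estimated hazard at earlier times. Discarding them, as an ordinary regression on observed sharing times would, tends to underestimate how long users take to share.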