Adversarial Analysis of Fake News Detectors

Additional Funding Sources

This research has been sponsored by the National Science Foundation under Award No. 1950599.

Abstract

In recent years, fake news detection models have been developed to mitigate the problem of fake news. One example is dEFEND, a state-of-the-art natural language processing (NLP) model that utilizes both the content of news articles and the comments posted on them to aid in detecting fake news. To help prevent intentional misclassifications caused by malicious actors, we aim to demonstrate a vulnerability in the model and then show how the system can be made more robust against it.

Attacks on fake news detection models are a growing concern and an active area of research. One product of this research is MALCOM, a malicious comment generator that can force a fake or real classification of a news piece with a success rate upwards of 93%. MALCOM generates comments that are stylistically similar and topically relevant to the input text, alleviating common problems with most attacks on NLP models (e.g., producing nonsensical examples). However, it is possible to detect that these comments are computer-generated. Our objective is instead to use real, user-generated comments sourced from the same dataset so that they are indistinguishable from the rest.

Using the FakeNewsNet dataset, we develop an attack by grouping articles and their preexisting comments into topics, and then computing their similarity, or "distance," from one another. With this metric, we identify generic as well as topic-specific comments that can be used to sway dEFEND's classification of an article. An ongoing area of exploration is creating a defense to mitigate these attacks, for example by filtering comments based on properties we have identified as adversarial.
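To make the "distance" computation concrete, the sketch below ranks preexisting comments by cosine similarity to a target article. This is an illustration only, not the study's implementation: the TF-IDF representation, the rank_candidate_comments helper, and the scoring details are assumptions standing in for whatever topic grouping and similarity metric the work actually uses.

    # Minimal sketch (not the authors' implementation): rank preexisting
    # comments by cosine similarity to a target article, approximating the
    # "distance" metric described above. TF-IDF is an illustrative stand-in
    # for the representation the study actually uses.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_candidate_comments(article_text, comments):
        """Return (comment, score) pairs sorted from most to least similar."""
        vectorizer = TfidfVectorizer(stop_words="english")
        # Fit on the article plus all candidate comments so they
        # share a single vocabulary.
        matrix = vectorizer.fit_transform([article_text] + comments)
        article_vec, comment_vecs = matrix[0], matrix[1:]
        scores = cosine_similarity(article_vec, comment_vecs).ravel()
        return sorted(zip(comments, scores), key=lambda pair: -pair[1])

Under this framing, comments that score high against one article act as topic-specific candidates, while comments that score similarly against many articles behave as the generic ones mentioned above.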
