This Is Auburn: Electronic Theses and Dissertations

A Study of Adversarial Attacks on Machine Learning-based Fake News Detection Systems

Date

2023-05-04

Author

Brown, Brandon

Type of Degree

PhD Dissertation

Department

Computer Science and Software Engineering

Restriction Status

EMBARGOED

Restriction Type

Full

Date Available

2028-05-04

Abstract

Due to the increased use of and reliance on social media, fake news has become a significant problem that can cause great harm to individuals. Because of the danger posed by fake news, techniques must be developed to detect it and keep it from spreading. Currently, fact-checking is a damage-control strategy that is essential to detecting and mitigating fake news. Websites such as Politifact, Snopes, and Factcheck.org use human verifiers to manually fact-check news articles. When only a few articles need to be fact-checked, relying on these websites to take the time to effectively research and debunk fake news is sufficient. However, on social media, where news is generated at an extremely high volume (and with an extremely high velocity), automated approaches to fake news detection are needed. Recently, social media companies have begun to rely on automated systems in the form of machine learning-based fake news detection systems (ML-FNDSs), which classify articles as either news or fake news. Although ML-FNDSs are effective at this classification task, they are susceptible to adversarial attacks, which adversaries use to make an ML-FNDS misclassify its input: an attack can make an ML-FNDS accept fake news as news or flag legitimate news as fake. In our research, we study two potential vulnerabilities of ML-FNDSs with respect to false positives (i.e., news erroneously classified as fake news) and false negatives (i.e., fake news erroneously classified as news). In this dissertation, we first introduce the concepts of the Adversarial Universal False Positive (UFP) Attack and the Adversarial Universal False Negative (UFN) Attack. Next, we study the effectiveness of these two attacks on ML-FNDSs based on a single classifier, and finally, we study these attacks on ML-FNDSs based on a set of classifiers (ensemble machines).
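To illustrate the general idea of an evasion attack described above (this is not the dissertation's method, only a minimal Python sketch under strong simplifying assumptions), consider a deliberately simplistic keyword-based detector and a word-substitution attack that produces a false negative. The function names, trigger words, and synonym table are all hypothetical.

```python
# Hypothetical trigger words a toy detector associates with fake news.
FAKE_SIGNAL_WORDS = {"shocking", "miracle", "exposed"}

def detect_fake(article: str) -> bool:
    """Toy detector: flag an article as fake if it contains any trigger word."""
    words = set(article.lower().split())
    return bool(words & FAKE_SIGNAL_WORDS)

# Attacker's synonym table: swap each trigger word for a near-synonym
# that preserves the claim but removes the detector's features.
SYNONYMS = {"shocking": "surprising", "miracle": "remarkable", "exposed": "revealed"}

def evade(article: str) -> str:
    """Perturb the article so it no longer matches the detector's features."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in article.split())

original = "Shocking miracle cure exposed by insider"
assert detect_fake(original)             # correctly flagged as fake
assert not detect_fake(evade(original))  # same claim now slips past: a false negative
```

Real ML-FNDSs learn far richer features than keyword presence, but the attack surface is analogous: small, meaning-preserving perturbations of the input can flip the classifier's decision in either direction (false negative or false positive).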