The Effects of Genetic-based and Swarm Intelligence-based Feature Selection on Adversarial Author Identification

Halladay, Steve

Metadata Field	Value	Language
dc.contributor.advisor	Dozier, Gerry
dc.contributor.author	Halladay, Steve
dc.date.accessioned	2022-04-25T12:28:34Z
dc.date.available	2022-04-25T12:28:34Z
dc.date.issued	2022-04-25
dc.identifier.uri	https://etd.auburn.edu//handle/10415/8138
dc.description.abstract	Within the realm of author identification, where researchers work to classify writing samples by author, researchers are using more, and more diverse, feature sets to try to improve classification accuracy. From a computational cost perspective, these additional feature sets become problematic. Further, adding more feature sets may inadvertently decrease classification accuracy. Therefore, selecting the appropriate subset of features is an important challenge for researchers. However, feature subset selection becomes even more challenging due to two complexities. The first complexity is that different datasets require different feature sets for good identification performance. A feature set that performs well with one dataset may not perform well with another. So, it is important to customize the feature set to the characteristics of the dataset. The second complexity is that feature selection appears to make author identification systems more susceptible to adversarial attacks. These attacks occur when authors attempt to obfuscate their writing style or impersonate another author’s writing style. The focus of this work is on the second complexity, namely, understanding how feature selection affects the susceptibility of author identification systems to adversarial attacks. Specifically, this research investigates the susceptibility to adversarial attacks of author identification systems that use genetic-based and swarm intelligence-based feature selection. The intent of this research is to observe and characterize the factors affecting adversarial susceptibility by considering several parameters, including dataset content, dataset size and feature selection algorithm. This work employs two datasets: the CASIS dataset, which is a collection of blog posts, and the PAN19 dataset, which is a collection of extracts from Twitter feeds and includes bot-generated writing samples.
We vary the dataset sizes to ascertain the effects of a larger author pool. We also vary the bias towards minimizing the feature set. Then, we analyze the data to determine those factors that correlate with successful adversarial attacks on author identification systems both with and without feature selection.	en_US
dc.subject	Computer Science and Software Engineering	en_US
dc.title	The Effects of Genetic-based and Swarm Intelligence-based Feature Selection on Adversarial Author Identification	en_US
dc.type	PhD Dissertation	en_US
dc.embargo.status	NOT_EMBARGOED	en_US
dc.embargo.enddate	2022-04-25	en_US
dc.contributor.committee	Seals, Cheryl
dc.contributor.committee	Umphress, David
dc.contributor.committee	Thomas, Jakita

Files in this item

Name: diss.v11.pdf
Size: 21.81Mb