Using Network Analysis and Natural Language Processing Methods to Study Dark Side of Social media with Applications in Fake news and Anti-vaccine Messages
Type of Degree: PhD Dissertation
Industrial and Systems Engineering
Social media has revolutionized the way people consume information and communicate with each other. Despite the many benefits and opportunities social media offers, such as support for crisis management and situational awareness, its dark side has affected people’s lives and society. One of the primary concerns is the role social media plays in shaping the public’s perception and opinion on specific topics. Two examples of this dark side are the dissemination of fake news and of anti-vaccine messages. Their impact can be severe in situations such as the COVID-19 pandemic, where people’s behavior plays a significant role in managing the crisis. In this dissertation, we study fake news and anti-vaccine messages. The first goal of this study is to design a recommendation system that detects and filters out fake and satirical news and recommends only real news. To develop this real news recommendation system, we first built a fake and satirical news detection model by training a random forest on the distribution of topics in each article. The topic distributions were extracted using two topic modeling techniques: latent Dirichlet allocation and latent semantic analysis. The model achieves an accuracy of 85% in distinguishing fake and satirical news from real news. We also used a lexicon-based sentiment analysis model to extract the sentiment of each article. Second, a K-nearest neighbor model finds the K real news articles most similar, in topic distribution and sentiment, to the news a user has most recently read. The second goal of this dissertation is to narrow the scope of fake news studies by focusing on fake news in the context of COVID-19. We examine whether deception theories can help reveal the strategies that writers of COVID-19 fake news use to deceive their audience.
We used natural language processing techniques and partial least squares structural equation modeling to measure and test the strategies. To further evaluate the results, we built a detection model by applying XGBoost to the discovered deception strategies. We found that fake news writers in the context of COVID-19 use significantly more uncertain language, more negative affect, less lexical diversity, more expressive words, and more cognitive process words in their writing. The final goal of this dissertation is to study anti-vaccine posts on Facebook. We first check whether there are heterogeneous topical groups of posts related to the COVID-19 vaccine on Facebook. We then contrast the anti-vaccine group with the other discovered groups in terms of emotion and network characteristics. We build a semantic network based on semantic similarities between the posts. To measure semantic similarity, we combine a BERT model with cosine similarity. We found five giant topical groups and named them based on the major topics in each group. The emotion analysis shows more emotional posts and more negative emotions in the anti-vaccine group. The network characteristics of the groups also indicate that the political and anti-vaccine groups exhibit greater topical homophily and target a specific audience.
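The semantic network step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy vectors stand in for BERT post embeddings, the 0.8 threshold is an arbitrary assumption, and connected components are used here as a simple stand-in for whatever grouping method the study applied. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def build_semantic_network(embeddings, threshold=0.8):
    """Link two posts when their embedding similarity reaches the threshold.

    The threshold is an assumed value chosen for illustration only.
    """
    graph = {i: set() for i in range(len(embeddings))}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def connected_groups(graph):
    """Return the connected components of the network as sorted index lists."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


# Toy vectors standing in for BERT embeddings of four posts (assumed data).
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.0, 0.9, 0.2],
]
network = build_semantic_network(toy_embeddings, threshold=0.8)
groups = connected_groups(network)
print(groups)
```

With real data, each vector would come from a BERT-style sentence encoder, and the resulting groups could then be compared on emotion scores and network measures as the abstract describes.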