This Is Auburn Electronic Theses and Dissertations

Predictive Text Analytics and Text Classification Algorithms

Date

2016-05-23

Author

Yucel, Ahmet

Type of Degree

Dissertation

Department

Mathematics and Statistics

Abstract

This dissertation comprises three research studies based mainly on text analysis. In the first study, sentiment analysis is performed to extract and identify the overall rating of customer reviews for certain products. Classifying the sentiment of online product reviews is important because it yields critical information that can be used to improve quality. Machine learning (ML) algorithms can be used effectively to analyze, and therefore automatically classify, the reviews. The objective of this study is to develop a numerical composite variable from unstructured data for estimating the star ratings of customer reviews from different domains, employing popular tree-based ML algorithms with five-fold cross-validation incorporated into the models.
In the second study, a special form of text classification is used to extract and identify the subjective content of customer reviews. Classifying people’s feedback on a particular subject is vital for analysts seeking to understand public behavior. Especially for organizations dealing with large bodies of data consisting of people’s reviews, understanding the reviews’ contents and classifying them by their subjective information is very important. Although information technology has modernized the process of data gathering, state-of-the-art methods are required to handle the big data now available. Traditional methods, however, are not capable of delivering profound insights from unstructured feedback. Therefore, institutions are seeking novel methods for text analysis. Text mining (TM) is a machine-learning approach to handling people’s reviews that can provide valuable insights into their feedback. This study proposes the creation of composite variables for the learning process and utilizes a multilayer perceptron-based artificial neural network.
In the third study, a Turkish TM algorithm is developed for automatically grading written exam papers. Algorithms based on Turkish grammar and natural language processing are developed using the answer key prepared by the grader and then applied to the students’ answer papers. The main idea of this study is to build a Turkish-language TM tool that grades exam papers written in Turkish.