Semi-Supervised Classification Techniques in Big Data Text Analytics

Li, Geng

Metadata Field	Value	Language
dc.contributor.advisor	Carpenter, Mark
dc.contributor.author	Li, Geng
dc.date.accessioned	2013-04-22T19:45:31Z
dc.date.available	2013-04-22T19:45:31Z
dc.date.issued	2013-04-22
dc.identifier.uri	http://hdl.handle.net/10415/3580
dc.description.abstract	Imagine trying to read everything you received in 2011 as quickly as possible: it would take the first three months of 2012. How can we access the vast body of unstructured literature, process it automatically, and digest and make sense of it with less effort? This dissertation proposes a hybrid semi-supervised text classification approach, covering the full preprocessing and classification pipeline, to help readers cope with a rising sea of information. The model integrates Porter stemming, a new adaptive TFIDF-LDA weighting, Zipf's law based dimension reduction, a multinomial Naive Bayes classifier, and the Expectation-Maximization algorithm. From a small set of “known” labeled papers, the mixture model predicts which of the predefined categories each new, “unknown” unlabeled paper belongs to. Extensive experimental results show that the proposed system dramatically reduces the feature dimension and improves classification accuracy.	en_US
dc.rights	EMBARGO_NOT_AUBURN	en_US
dc.subject	Mathematics and Statistics	en_US
dc.title	Semi-Supervised Classification Techniques in Big Data Text Analytics	en_US
dc.type	dissertation	en_US
dc.embargo.length	MONTHS_WITHHELD:60	en_US
dc.embargo.status	EMBARGOED	en_US
dc.embargo.enddate	2018-04-22	en_US
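
The abstract describes combining a multinomial Naive Bayes classifier with the Expectation-Maximization algorithm to learn from a small labeled set plus unlabeled papers. The sketch below illustrates that general idea only; it is not the dissertation's implementation. It assumes documents are already tokenized into word lists, and it omits the Porter stemming, TFIDF-LDA weighting, and Zipf's law based dimension reduction steps. All function names (`m_step`, `e_step`, `fit_semi_supervised`, `predict`) and the smoothing parameter `alpha` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def m_step(docs, resp, vocab, n_classes, alpha=1.0):
    """Estimate log priors and log word probabilities from soft labels."""
    prior = [alpha] * n_classes
    counts = [{w: alpha for w in vocab} for _ in range(n_classes)]
    for doc, r in zip(docs, resp):
        tf = Counter(doc)
        for c in range(n_classes):
            prior[c] += r[c]
            for w, n in tf.items():
                counts[c][w] += r[c] * n  # word counts weighted by responsibility
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_word = []
    for c in range(n_classes):
        z = sum(counts[c].values())
        log_word.append({w: math.log(v / z) for w, v in counts[c].items()})
    return log_prior, log_word


def e_step(doc, log_prior, log_word):
    """Posterior class probabilities for one document; unseen words are skipped."""
    scores = [lp + sum(lw[w] for w in doc if w in lw)
              for lp, lw in zip(log_prior, log_word)]
    m = max(scores)  # subtract the max for numerical stability
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    return [e / z for e in exp]


def fit_semi_supervised(labeled, unlabeled, n_classes, n_iter=10):
    """Initialise from labeled docs, then alternate E and M steps."""
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    l_docs = [d for d, _ in labeled]
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for _, y in labeled]
    model = m_step(l_docs, hard, vocab, n_classes)
    for _ in range(n_iter):
        soft = [e_step(d, *model) for d in unlabeled]  # E-step on unlabeled docs
        model = m_step(l_docs + unlabeled, hard + soft, vocab, n_classes)
    return model


def predict(doc, model):
    """Index of the most probable class for a tokenized document."""
    post = e_step(doc, *model)
    return max(range(len(post)), key=post.__getitem__)
```

Labeled documents keep their fixed class assignments throughout, while each unlabeled document contributes fractional counts to every class in proportion to its current posterior. A call such as `fit_semi_supervised(labeled, unlabeled, n_classes=2)` expects `labeled` as a list of `(tokens, class_index)` pairs and `unlabeled` as a list of token lists.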

Files in this item

Name: Geng Li-dissertation.pdf
Size: 848.1Kb
