This Is Auburn: Electronic Theses and Dissertations


Semi-Supervised Classification Techniques in Big Data Text Analytics


Metadata Field | Value | Language
dc.contributor.advisor | Carpenter, Mark |
dc.contributor.author | Li, Geng |
dc.date.accessioned | 2013-04-22T19:45:31Z |
dc.date.available | 2013-04-22T19:45:31Z |
dc.date.issued | 2013-04-22 |
dc.identifier.uri | http://hdl.handle.net/10415/3580 |
dc.description.abstract | Imagine trying to read everything you received in 2011 as quickly as possible: the reading alone would take up the first three months of 2012. How can we access this vast body of unstructured literature, process it automatically, and make sense of it with less effort? Covering the entire preprocessing and classification pipeline, the hybrid semi-supervised text classification approach proposed in this dissertation is designed to help you keep up with a rising sea of information. The hybrid model integrates Porter stemming, a new adaptive TFIDF-LDA term weighting, dimension reduction based on Zipf's law, a multinomial Naive Bayes classifier, and the Expectation-Maximization (EM) algorithm. Starting from a small set of labeled (“known”) papers, this mixture model classifies new, unlabeled (“unknown”) papers into predefined categories. Extensive experiments show that the proposed system dramatically reduces the feature dimension and improves classification accuracy. | en_US
dc.rights | EMBARGO_NOT_AUBURN | en_US
dc.subject | Mathematics and Statistics | en_US
dc.title | Semi-Supervised Classification Techniques in Big Data Text Analytics | en_US
dc.type | dissertation | en_US
dc.embargo.length | MONTHS_WITHHELD:60 | en_US
dc.embargo.status | EMBARGOED | en_US
dc.embargo.enddate | 2018-04-22 | en_US
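The abstract names Zipf's-law-based dimension reduction and TF-IDF-style term weighting without spelling either out. The following is a minimal sketch under a common reading of both: drop terms in the rare and very common tails of the document-frequency distribution, then weight the surviving terms with plain TF-IDF (tf × log(N/df)). It is not the dissertation's adaptive TFIDF-LDA scheme, whose details this record does not give. Porter stemming is omitted, so tokens are assumed to be already stemmed. The function names `zipf_prune` and `tfidf` and the thresholds `min_df` and `max_df_ratio` are hypothetical.

```python
import math
from collections import Counter


def zipf_prune(docs, min_df=2, max_df_ratio=0.5):
    """Keep only terms whose document frequency lies in a middle band.

    Under Zipf's law a handful of terms are extremely frequent and most are
    very rare; both tails tend to carry little class signal. Terms found in
    fewer than `min_df` documents, or in more than `max_df_ratio` of all
    documents, are dropped. The default thresholds are illustrative only.
    """
    df = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    return {t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio}


def tfidf(docs, vocab):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in vocab)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


# Usage on a toy corpus of pre-stemmed tokens.
docs = [["text", "mine", "bayes"], ["text", "bayes", "em"],
        ["text", "em", "cluster"], ["text", "stem", "zipf"]]
vocab = zipf_prune(docs)      # "text" is in every document; the rest occur once
vectors = tfidf(docs, vocab)  # vocab == {"bayes", "em"}
```

Even on four documents the vocabulary shrinks from seven terms to two. That shrinkage is the kind of feature-dimension reduction the abstract refers to, though the real cutoffs would need tuning on the corpus at hand.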


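The abstract combines a multinomial Naive Bayes classifier with the EM algorithm so that a small labeled set can be extended with unlabeled papers. The sketch below shows the generic form of that scheme, not the dissertation's exact hybrid model. First, fit Naive Bayes on the labeled documents. Then alternate an E-step, which computes class posteriors for the unlabeled documents, with an M-step, which re-estimates priors and word probabilities from all documents weighted by those posteriors. It uses raw term counts rather than any TF-IDF weighting, and the names `m_step`, `posterior`, `semi_supervised_nb`, `predict`, `alpha` and `n_iter` are hypothetical.

```python
import math
from collections import Counter


def m_step(docs, resp, classes, vocab, alpha=1.0):
    """M-step: class log-priors and word log-probabilities from soft labels.

    resp[i][c] is document i's weight in class c: 1.0 or 0.0 for labeled
    documents, the current posterior P(c | doc) for unlabeled ones.
    Add-alpha (Laplace) smoothing keeps every probability non-zero.
    """
    log_prior, log_word = {}, {}
    for c in classes:
        weight = sum(r[c] for r in resp)
        log_prior[c] = math.log((weight + alpha) / (len(docs) + alpha * len(classes)))
        counts = Counter()
        for doc, r in zip(docs, resp):
            for w in doc:
                if w in vocab:
                    counts[w] += r[c]  # fractional word counts
        total = sum(counts.values())
        log_word[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                       for w in vocab}
    return log_prior, log_word


def posterior(doc, log_prior, log_word):
    """E-step for one document: P(c | doc) under the multinomial model.

    Words unseen in training are skipped; subtracting the max score before
    exponentiating (log-sum-exp) avoids numerical underflow.
    """
    scores = {c: log_prior[c] + sum(log_word[c][w] for w in doc if w in log_word[c])
              for c in log_prior}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


def semi_supervised_nb(labeled, labels, unlabeled, n_iter=10):
    """Fit Naive Bayes on labeled docs, then refine it with EM on unlabeled docs."""
    classes = sorted(set(labels))
    vocab = {w for doc in labeled + unlabeled for w in doc}
    hard = [{c: float(c == y) for c in classes} for y in labels]
    model = m_step(labeled, hard, classes, vocab)  # initial model: labeled data only
    for _ in range(n_iter):
        soft = [posterior(doc, *model) for doc in unlabeled]              # E-step
        model = m_step(labeled + unlabeled, hard + soft, classes, vocab)  # M-step
    return model


def predict(doc, model):
    """Assign the class with the highest posterior probability."""
    post = posterior(doc, *model)
    return max(post, key=post.get)


# Usage: two labeled papers, three unlabeled ones.
labeled = [["ball", "goal", "team"], ["vote", "party", "law"]]
labels = ["sport", "politics"]
unlabeled = [["goal", "team", "match"], ["party", "law", "senate"], ["match", "team"]]
model = semi_supervised_nb(labeled, labels, unlabeled)
print([predict(d, model) for d in unlabeled])  # ['sport', 'politics', 'sport']
```

In this sketch the labeled documents keep their fixed 0/1 class weights throughout, and only the unlabeled documents' weights are re-estimated on each pass. Words such as "match" and "senate" never appear in the labeled set, yet they acquire class-specific probabilities through the unlabeled documents they co-occur in. Some variants also down-weight the unlabeled documents in the M-step when they are far more numerous than the labeled ones.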