This Is Auburn Electronic Theses and Dissertations

Using Genetic Programming to Quantify the Effectiveness of Similar User Cluster History as a Personalized Search Metric




Eoff, Brian




Computer Science and Software Engineering


Online search is the service that drives the Internet. To see how important search is, one need only look at Google. It began as an idea in a 1998 graduate research paper, and by 2005 it had become a wildly successful company whose very name had become synonymous with the verb "search." Many information retrieval (IR) researchers have suggested that the next great step in search is to make the process more personal, with search results tailored to the individual user. Early attempts at personalization, such as relevance feedback, never gained popularity with users because they required additional interaction. The goal is personalization that demands no more of the user's attention.

I propose that personalization can be accomplished by observing a user's document selections. A page's overall popularity is important, but the pages that users similar to the primary user find popular are more important still. I also propose that a user's history should not be based solely on a list of prior documents the user has found relevant. Instead, it should be based on the clusters of documents the user has found relevant. Clusters expose pockets of related information, which gives a fuller understanding of the user.

How, then, can I determine whether this new metric is usable without implementing a search engine that uses it, putting it online, and hoping users flock to it? Genetic programming is well suited to this kind of problem. It can be used to determine whether a newly proposed information retrieval metric (collaborative filtering based on cluster history) is effective. Given a training set of documents, queries, and relevance judgments, a genetic programming framework can search for an optimal ranking function. The framework can incorporate the newly proposed metrics alongside traditional search metrics such as term frequency and document length.
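The abstract does not say how user similarity or the resulting document score would be computed. The sketch below is one possible reading, under stated assumptions. A user's history is reduced to counts of the clusters that the user's selected documents fall into. Users are compared by cosine similarity of those counts. A document's score is the similarity-weighted share of other users who selected it. The function names and toy data are hypothetical. Cosine similarity is a swapped-in choice, not something the abstract specifies.

```python
from collections import Counter
from math import sqrt

# Toy example data (hypothetical): document -> cluster, and user -> documents selected.
DOC_CLUSTER = {"d1": "sports", "d2": "sports", "d3": "cooking", "d4": "cooking"}
HISTORIES = {"alice": {"d1"}, "bob": {"d2"}, "carol": {"d3", "d4"}}


def cluster_profile(selected_docs, doc_cluster):
    """Summarize a history as counts of the clusters its documents belong to."""
    return Counter(doc_cluster[d] for d in selected_docs)


def cosine(p, q):
    """Cosine similarity between two sparse cluster-count profiles (assumed measure)."""
    dot = sum(p[c] * q[c] for c in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def similar_user_score(doc, user, histories, doc_cluster):
    """Similarity-weighted fraction of other users who selected `doc`.

    Each other user's selection counts in proportion to how closely their
    cluster profile matches the target user's profile.
    """
    target = cluster_profile(histories[user], doc_cluster)
    weighted_hits = total_weight = 0.0
    for other, selected in histories.items():
        if other == user:
            continue
        sim = cosine(target, cluster_profile(selected, doc_cluster))
        total_weight += sim
        if doc in selected:
            weighted_hits += sim
    return weighted_hits / total_weight if total_weight else 0.0
```

In the toy data, `alice` and `bob` share no document, so a history built only from individual documents would treat them as unrelated. Their cluster profiles are identical, however, so `similar_user_score("d2", "alice", HISTORIES, DOC_CLUSTER)` returns 1.0. By contrast, `d3` was selected only by the dissimilar `carol` and scores 0.0.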
If the proposed metrics survive the evolutionary process, they can be judged effective at returning documents relevant to a user's query.
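The sketch below shows one way the genetic programming experiment might be set up. It is an illustration under assumptions, not the thesis's implementation. Ranking functions are expression trees built from random constants and three hypothetical terminals: `tf`, `doc_len`, and `sim`, which stands for the proposed cluster-history score. These are combined with `+`, `-`, `*`, and protected `/`. Fitness is mean average precision (MAP) against relevance judgments, and selection uses tournaments plus elitism. The data produced by `toy_training` is synthetic, not real search logs.

```python
import operator
import random

# Hypothetical terminals: term frequency, document length, and the proposed
# similar-user cluster-history score of one (query, document) pair.
TERMINALS = ["tf", "doc_len", "sim"]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0}  # protected division
MAX_DEPTH = 6

def random_tree(depth, rng):
    """Grow a random tree: a terminal, a constant, or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS + [round(rng.uniform(-1.0, 1.0), 2)])
    return (rng.choice(list(OPS)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, feats):
    """Score one document's feature dict with the expression tree."""
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], feats), evaluate(tree[2], feats))
    return feats[tree] if isinstance(tree, str) else tree  # terminal or constant

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which relevant documents appear."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def fitness(tree, training):
    """MAP of the rankings the tree induces over all training queries."""
    def score(feats):
        s = evaluate(tree, feats)
        return s if s == s else float("-inf")  # rank NaN scores last
    aps = [average_precision(sorted(docs, key=lambda d: score(docs[d]), reverse=True), rel)
           for docs, rel in training]
    return sum(aps) / len(aps)

def subtrees(tree, path=()):
    """Yield (path, node) for every node; a path lists child indices from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def crossover(a, b, rng):
    """Subtree crossover; falls back to parent `a` if the child grows too deep."""
    path, _ = rng.choice(list(subtrees(a)))
    child = replace(a, path, rng.choice(list(subtrees(b)))[1])
    return child if depth(child) <= MAX_DEPTH else a

def mutate(tree, rng):
    path, _ = rng.choice(list(subtrees(tree)))
    child = replace(tree, path, random_tree(2, rng))
    return child if depth(child) <= MAX_DEPTH else tree

def tournament(pop, fits, rng, k=3):
    """Return the fittest of k randomly sampled individuals."""
    return pop[max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]

def evolve(training, pop_size=50, generations=20, seed=0):
    """Evolve ranking functions; returns (best tree, its training MAP)."""
    rng = random.Random(seed)
    pop = [random_tree(4, rng) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(t, training) for t in pop]
        nxt = [pop[max(range(pop_size), key=lambda i: fits[i])]]  # elitism
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, fits, rng), tournament(pop, fits, rng), rng)
            nxt.append(mutate(child, rng) if rng.random() < 0.1 else child)
        pop = nxt
    fits = [fitness(t, training) for t in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]

def uses_terminal(tree, name):
    """True if terminal `name` (e.g. the proposed metric) survived into `tree`."""
    return any(node == name for _, node in subtrees(tree))

def toy_training(seed=1, n_queries=8, n_docs=10):
    """Synthetic judgments (not real data): a document is relevant iff tf + sim > 1.2."""
    rng = random.Random(seed)
    queries = [{f"d{i}": {t: rng.random() for t in TERMINALS} for i in range(n_docs)}
               for _ in range(n_queries)]
    return [(docs, {d for d, f in docs.items() if f["tf"] + f["sim"] > 1.2}) for docs in queries]
```

To use the sketch, call `best, score = evolve(toy_training())` and then `uses_terminal(best, "sim")`. This reports whether the proposed metric appears in the best evolved ranking function. Repeating the run across several seeds and training sets would give a firmer answer than a single run.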