Internet Data Acquisition, Search and Processing
Date
2009-12-11
Type of Degree
Thesis
Department
Electrical Engineering
Abstract
Internet data acquisition is the process of extracting essential data from web servers. Semi-structured data in HTML web pages must be extracted and converted into structured data before being presented to users. This thesis presents four tools that together perform data acquisition, data search and data processing: GradeWatch, Ethernet Robot, Online Search Tool and Citations Explorer Tool. GradeWatch is a tool for an academic institution that lets students check their grades online and lets faculty post them. Ethernet Robot uses the Perl scripting language to extract paper details for the IEEE Transactions on Industrial Electronics from the IEEE web server, and it processes the data using regular expressions.

Using the paper database created by Ethernet Robot, an Online Search Tool was developed that accepts up to three search keywords and presents the results on a separate web page, from which users can download each paper through the link provided with it. The Citations Explorer is a program that returns the most cited papers in the IEEE Transactions on Industrial Electronics for a given year; it uses Google Scholar to perform the search and Perl regular expressions to process the results. All of these tools are designed around the same steps: fetching, filtering, processing and presenting the required data. The resulting HTML files containing the required data are displayed for users to peruse. Future enhancements to our Ethernet Robot include optimization to improve performance and customization for use as a sophisticated client-specific search agent.
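The fetch, filter, process and present steps described above can be illustrated with a small sketch. The thesis tools are written in Perl; this example swaps in Python's `re` module to show the same idea of turning semi-structured HTML into structured records with a regular expression, then running a keyword search and rendering the matches as download links. The HTML markup, the field names and the functions `extract_papers`, `search` and `to_html` are all hypothetical, the fetch step is replaced by a hard-coded sample page, and treating multiple keywords as "all must appear in the title" is an assumption rather than the thesis's documented behavior.

```python
import re
from html import escape, unescape

# Hypothetical HTML fragment standing in for a fetched listing page.
# The real IEEE page markup is not described in the thesis; this structure is assumed.
SAMPLE_HTML = """
<div class="paper"><a href="/doc/101.pdf">Sensorless Control of Induction Motors</a>
<span class="authors">A. Smith; B. Lee</span></div>
<div class="paper"><a href="/doc/102.pdf">Fault Detection in Power Converters</a>
<span class="authors">C. Chen</span></div>
<div class="paper"><a href="/doc/103.pdf">Neural Network Control of Power Converters</a>
<span class="authors">D. Kumar; E. Brown</span></div>
"""

# One regex per record: capture the download link, the title and the author string.
PAPER_RE = re.compile(
    r'<div class="paper"><a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>\s*'
    r'<span class="authors">(?P<authors>[^<]+)</span></div>'
)


def extract_papers(html):
    """Filter/process step: convert semi-structured HTML into a list of dicts."""
    papers = []
    for m in PAPER_RE.finditer(html):
        papers.append({
            "link": m.group("link"),
            "title": unescape(m.group("title")).strip(),
            "authors": [a.strip() for a in m.group("authors").split(";")],
        })
    return papers


def search(papers, keywords):
    """Return papers whose title contains every keyword (case-insensitive).

    The one-to-three keyword limit mirrors the abstract; AND semantics is assumed.
    """
    if not 1 <= len(keywords) <= 3:
        raise ValueError("search accepts one to three keywords")
    terms = [k.lower() for k in keywords]
    return [p for p in papers if all(t in p["title"].lower() for t in terms)]


def to_html(results):
    """Presentation step: render matches as an HTML list of download links."""
    items = "".join(
        f'<li><a href="{escape(p["link"])}">{escape(p["title"])}</a></li>'
        for p in results
    )
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    papers = extract_papers(SAMPLE_HTML)
    hits = search(papers, ["power", "converters"])
    print(to_html(hits))
```

A single regular expression per record keeps the sketch close to the regex-based processing the abstract describes. It is, however, tied to one exact page layout, so any change in the server's markup would require updating the pattern.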