This Is Auburn: Electronic Theses and Dissertations

Data Extraction from Servers by the Internet Robot

Date

2009-07-15

Author

Pham, Nam

Type of Degree

thesis

Department

Electrical Engineering

Abstract

Data extraction from the internet is a way to download and extract required data automatically from web servers. In this thesis, we present a method called the Internet Robot, which extracts data directly from a web server using the Perl scripting language and its powerful regular expressions. Regular expressions are used extensively in this method to reduce the complexity of the program code and to increase the downloading and extraction speed. The Internet Robot is a three-step process: data collection, data filtering and processing, and data presentation. The final result of this process is a set of HTML files containing all of the required data in a standard format, presented under different links of a webpage. Its accuracy and speed distinguish this method for processing and extracting data, not only from the internet but also from an existing database.
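The three steps named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the thesis, and it is not the author's implementation. The sample page, the table layout, the regular expression, and the function names `collect`, `filter_records`, and `present` are all hypothetical, chosen only to show how collection, regex-based filtering, and HTML presentation might fit together. The collection step is stubbed with an in-memory string so that the example runs without network access.

```python
import html
import re

# Hypothetical page standing in for a server response. A real collection
# step would download it, for example with urllib.request.urlopen.
SAMPLE_PAGE = """
<table>
<tr><td class="name">Resistor 10k</td><td class="price">$0.10</td></tr>
<tr><td class="name">Capacitor 1uF</td><td class="price">$0.25</td></tr>
</table>
"""

# Assumed row layout: one name cell followed by one price cell.
ROW_PATTERN = re.compile(
    r'<td class="name">(?P<name>[^<]+)</td>\s*'
    r'<td class="price">\$(?P<price>\d+\.\d{2})</td>'
)


def collect(page_source):
    """Step 1, data collection: return the raw page text (stubbed here)."""
    return page_source


def filter_records(raw):
    """Step 2, filtering and processing: extract (name, price) pairs."""
    return [
        (match["name"].strip(), float(match["price"]))
        for match in ROW_PATTERN.finditer(raw)
    ]


def present(records):
    """Step 3, presentation: render the records as a small HTML page."""
    items = "\n".join(
        f"<li>{html.escape(name)}: ${price:.2f}</li>"
        for name, price in records
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


records = filter_records(collect(SAMPLE_PAGE))
report = present(records)
print(report)
```

A single compiled pattern with named groups keeps the filtering step to a few lines, which reflects the abstract's point that regular expressions reduce code complexity. A pattern like this depends on the exact markup of the page, so it would need adjusting for any real server.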