Android Malware Detection through Machine Learning on Kernel Task Structure
Type of Degree: PhD Dissertation
Computer Science and Software Engineering
Free Android applications have become rapidly more popular since the advent of smartphones. As a result, users can unknowingly install malicious Android apps that violate their privacy or carry out attacks. According to a survey of Android malware by Kaspersky Lab, the proportion of malicious attacks on Android software has doubled. Malware detection on Android platforms is therefore a growing concern. Malicious behavior can closely resemble benign behavior, which slows detection and allows compromises to persist on infected phones for comparatively long periods. Many malware detection techniques have been proposed to address this problem and safeguard Android systems. To distinguish malicious apps from benign ones, a software agent or built-in program must track the traits of malware applications. However, existing approaches use only a short list of Android process features and do not consider the completeness and consistency of the entire information. In this dissertation, we present a multi-dimensional, kernel feature-based framework and a feature weight-based detection (WBD) method designed to categorize and characterize Android malware and benign apps. We also design and implement a software agent that automates data collection and storage, scanning thousands of benign and malicious apps. We examine 112 kernel attributes of the task data structure of executing processes in the Android system and evaluate detection accuracy on datasets of various dimensions. We observe that memory- and signal-related features yield more precise classification than scheduling-related features and the other task-state descriptors listed in our paper.
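The abstract does not say how WBD computes its feature weights. Purely as an illustrative assumption, the sketch below weights each kernel feature by a Fisher-style separation score (gap between class means divided by the summed class standard deviations), keeps the top-k features, and labels a sample by weighted distance to the benign and malicious centroids. The function names, the three feature names, and the toy numbers are hypothetical and are not taken from the dissertation.

```python
from statistics import mean, pstdev

def fisher_weights(benign, malicious, eps=1e-9):
    """Hypothetical weighting: |mean_mal - mean_ben| / (std_mal + std_ben) per feature.

    benign, malicious: lists of equal-length numeric feature vectors.
    Larger weights mark features that separate the two classes more cleanly.
    """
    weights = []
    for j in range(len(benign[0])):
        b = [row[j] for row in benign]
        m = [row[j] for row in malicious]
        gap = abs(mean(m) - mean(b))
        spread = pstdev(b) + pstdev(m)
        weights.append(gap / (spread + eps))  # eps avoids dividing by zero for constant features
    return weights

def top_k(weights, k):
    """Indices of the k highest-weighted features (a simple reduction step)."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]

def classify(sample, weights, c_benign, c_malicious):
    """Assign the class whose centroid is nearest under a weighted squared distance."""
    def dist(c):
        return sum(w * (x - cj) ** 2 for w, x, cj in zip(weights, sample, c))
    return "malicious" if dist(c_malicious) < dist(c_benign) else "benign"

# Toy vectors: [memory_pages, pending_signals, sched_priority] (hypothetical features).
benign = [[10, 1, 120], [12, 2, 120], [11, 1, 121]]
malicious = [[30, 6, 120], [28, 5, 121], [33, 7, 120]]

w = fisher_weights(benign, malicious)
print(top_k(w, 2))  # → [0, 1]; the scheduling column gets zero weight in this toy data
print(classify([29, 6, 121], w, centroid(benign), centroid(malicious)))  # → malicious
```

A separation score like this is cheap to compute over all 112 attributes, which makes it a convenient first pass before any heavier dimensionality reduction; the dissertation's actual weighting scheme may differ.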
In particular, memory-related features provide fine-grained classification policies that preserve higher classification precision than signal-related and other features. Furthermore, we study and evaluate 80 newly infected attributes of the Android kernel task structure, prioritizing the 70 most significant features through dimensionality reduction to improve the efficiency of high-dimensional classification. Our experiments demonstrate that, compared to existing techniques that use a short list of task structure features, our method achieves 94%-98% accuracy with a 2%-7% false positive rate. It detects malware apps with reduced-dimensional features, which shortens online malware detection and improves offline malware inspection.
To strengthen the online framework on a parallel computing platform, we propose a Spark-based Android malware detection framework that predicts malicious applications precisely and in parallel. Apache Spark, a popular open-source platform for large-scale data processing, has been used for iterative machine learning jobs because of its efficient parallel computation and in-memory abstraction. Moreover, because the size of the collected samples is growing rapidly, malware detection on Android platforms needs to run on a data-parallel computation platform. We again examine 112 kernel attributes of the kernel structure (task_struct) in the Android system and evaluate detection precision on the whole datasets with different numbers of computing nodes on the Apache Spark platform. Our experiments demonstrate that, on average, our technique achieves a 95%-99% precision rate with faster computing speed using a Decision Tree classifier, while the other three classifiers yield lower precision rates when detecting malware apps with in-memory parallel data.
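The results above are stated as accuracy, false positive rate, and precision. For readers comparing these figures, the sketch below shows the standard definitions computed from a confusion matrix, treating malware as the positive class. The function names and labels are invented for illustration and are not results from the dissertation.

```python
def confusion_counts(y_true, y_pred, positive="malicious"):
    """Count TP, FP, TN, FN, treating `positive` as the malware class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # benign app flagged as malware
        elif t == positive:
            fn += 1      # malware app missed
        else:
            tn += 1
    return tp, fp, tn, fn

def detection_rates(y_true, y_pred, positive="malicious"):
    """Accuracy, precision, and false positive rate from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Illustrative labels only: 5 malicious and 5 benign apps, one error of each kind.
y_true = ["malicious"] * 5 + ["benign"] * 5
y_pred = ["malicious"] * 4 + ["benign"] + ["malicious"] + ["benign"] * 4
print(detection_rates(y_true, y_pred))
# → {'accuracy': 0.8, 'precision': 0.8, 'false_positive_rate': 0.2}
```

Note that precision (share of flagged apps that are truly malicious) and false positive rate (share of benign apps wrongly flagged) use different denominators, so a high precision does not by itself imply a low false positive rate.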
We have also designed a Radial Basis Function (RBF) network-based malware detection technique for Android phones to improve classification accuracy and training speed. A traditional neural network trained with the Error Back Propagation (EBP) method does not reliably recognize malicious intrusions from selected Android kernel features. The RBF hidden centers are selected dynamically by a heuristic approach, and a large-scale dataset of 2550 Android apps is gathered by our automatic data sample collector. We implement both the RBF network and the EBP network. Compared to the traditional EBP network, which achieves an accuracy rate of 84%, the RBF network achieves 94% accuracy with half the training and evaluation time. Our experiments demonstrate that the RBF network is a better technique for Android malware detection.
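The abstract does not specify the heuristic for choosing hidden centers or how the output weights are fitted. The pure-Python sketch below is one plausible RBF classifier under stated assumptions: greedy farthest-point center selection (a stand-in for the unspecified heuristic), a Gaussian width from the common rule sigma = d_max / sqrt(2k), and output weights from ridge-regularized least squares with a 0.5 decision threshold. All function names and the toy data are hypothetical.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_centers(X, k):
    """Assumption: greedy farthest-point selection stands in for the unspecified
    heuristic. Start at X[0], then add the sample farthest from all chosen centers."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(sqdist(x, c) for c in centers)))
    return centers

def hidden_layer(x, centers, sigma):
    """Gaussian RBF activations, plus a constant 1.0 acting as the output bias."""
    return [math.exp(-sqdist(x, c) / (2 * sigma ** 2)) for c in centers] + [1.0]

def solve(A, b):
    """Solve the square system A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def train_rbf(X, y, k, ridge=1e-6):
    centers = select_centers(X, k)
    d_max = max(math.sqrt(sqdist(a, b)) for a in centers for b in centers)
    sigma = d_max / math.sqrt(2 * k) if d_max > 0 else 1.0  # common width rule of thumb
    H = [hidden_layer(x, centers, sigma) for x in X]
    m = len(H[0])
    # Output weights from the regularized normal equations (H^T H + ridge*I) w = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    b = [sum(h[i] * t for h, t in zip(H, y)) for i in range(m)]
    return centers, sigma, solve(A, b)

def predict(model, x, threshold=0.5):
    centers, sigma, w = model
    score = sum(wi * hi for wi, hi in zip(w, hidden_layer(x, centers, sigma)))
    return 1 if score >= threshold else 0  # 1 = malicious, 0 = benign

# Toy, already-normalized 2-feature vectors (hypothetical), label 1 = malicious.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
y = [0, 0, 0, 1, 1, 1]
model = train_rbf(X, y, k=2)
print([predict(model, x) for x in X])  # → [0, 0, 0, 1, 1, 1]
print(predict(model, [0.12, 0.18]), predict(model, [0.88, 0.82]))  # → 0 1
```

Once the centers and width are fixed, the hidden layer is fixed too, so training reduces to a single linear solve for the output weights rather than many gradient-descent epochs. That is the usual explanation for why RBF networks can train faster than networks trained by error back-propagation; this sketch makes no claim about the dissertation's measured timings.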