AntiWare: An Automated Android Malware Detection Tool based on Machine Learning Approach and Official Market Metadata

Akhuseyinoglu N. B., Akhuseyinoglu K.

7th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (IEEE UEMCON), New-York, United States Of America, 20 - 22 October 2016

  • Publication Type: Conference Paper / Full Text
  • City: New-York
  • Country: United States Of America
  • Middle East Technical University Affiliated: Yes


The prevalence of mobile devices has increased rapidly in recent years. People store valuable data, such as personal and financial information, on these devices. In addition, "bring your own device" (BYOD) policies have become popular in companies, so mobile devices are also a source of valuable and confidential company information. Accordingly, there is a growing need for malware detection methods and tools that protect mobile devices against attacks targeting them. In this study, an automated feature-based static analysis method is applied to detect malicious mobile applications on Android devices. By utilizing the metadata of applications on the official market and a free online malware scanner, the study shows that a mobile malware detection model built on free public sources can achieve acceptable accuracy rates. In contrast to previous studies that consider only the requested permissions as the feature set, additional market metadata, including but not limited to application category, download count, developer name, and average rating, is included in the feature set for training supervised classification algorithms. Applications in the training set are labeled as malicious or benign based on an experimental evaluation of majority voting among the antivirus (AV) engines of the free online AV community. The Naive Bayes classification algorithm is chosen as the supervised learning algorithm for the detection task. In addition, the filter-based feature selection methods Chi-Square, Information Gain, and ReliefF are used to mitigate potential overfitting. Finally, a quick prototype demonstrating the feasibility of the detection model is presented with sample case applications.
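
The pipeline described in the abstract (majority-vote labeling, filter-based feature selection, Naive Bayes classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The feature names, the toy data, the strict "more than half of the engines" voting threshold, the Bernoulli variant of Naive Bayes, and the Laplace smoothing are all assumptions made for illustration; the abstract does not specify them. Only Chi-Square selection is shown; Information Gain and ReliefF are omitted.

```python
import math
from collections import Counter

# Hypothetical binary features: three permissions plus one binarized
# market-metadata field. None of these names come from the paper.
FEATURES = ["perm:SEND_SMS", "perm:READ_CONTACTS", "perm:INTERNET",
            "meta:low_download_count"]

def majority_label(av_verdicts):
    """Return 1 (malicious) if more than half of the AV engines flag the app.
    The strict-majority threshold is an assumption; ties count as benign."""
    flagged = sum(1 for v in av_verdicts if v)
    return 1 if flagged * 2 > len(av_verdicts) else 0

def chi_square(X, y, j):
    """Chi-square statistic of the 2x2 table (binary feature j vs. label)."""
    n = len(y)
    obs = Counter((row[j], label) for row, label in zip(X, y))
    score = 0.0
    for f in (0, 1):
        for c in (0, 1):
            row_total = obs[(f, 0)] + obs[(f, 1)]
            col_total = obs[(0, c)] + obs[(1, c)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty cells (constant features)
                score += (obs[(f, c)] - expected) ** 2 / expected
    return score

def select_top_k(X, y, k):
    """Indices of the k features with the highest chi-square score."""
    scored = [(chi_square(X, y, j), j) for j in range(len(X[0]))]
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return sorted(j for _, j in top)

class BernoulliNB:
    """Bernoulli Naive Bayes with Laplace smoothing (assumed variant)."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [r for r, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            # Smoothed P(feature = 1 | class)
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(X[0]))]
        return self

    def predict_one(self, x):
        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log(p if v else 1 - p) for v, p in zip(x, self.p[c]))
        return max(self.classes, key=log_posterior)

# Toy training set: one row per app, one column per entry in FEATURES.
X = [[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0],
     [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]]
# Toy verdicts from three hypothetical AV engines per app.
verdicts = [[True, True, True], [True, True, False], [True, False, True],
            [False, False, False], [False, False, True], [False, False, False]]

y = [majority_label(v) for v in verdicts]
selected = select_top_k(X, y, k=2)
model = BernoulliNB().fit([[row[j] for j in selected] for row in X], y)

def classify(app_features):
    """Project an app onto the selected features and predict its label."""
    return model.predict_one([app_features[j] for j in selected])
```

With this toy data, calling `classify` on a new feature vector returns 1 for malicious or 0 for benign. Selecting features before training reduces the number of parameters the classifier must estimate, which is the overfitting concern the abstract attributes to the filter-based methods.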