The need for a systematic machine-learning process: A proposal via a mobile malware classification case study

Canbek G.

14th International Conference on Information Security and Cryptology, ISCTURKEY 2021, Ankara, Turkey, 2-3 December 2021, pp. 173-178

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/iscturkey53027.2021.9654378
  • City: Ankara
  • Country: Turkey
  • Page Numbers: pp. 173-178
  • Keywords: classification, data quality, machine learning, malware analysis, malware detection, systematic process
  • Middle East Technical University Affiliated: Yes


© 2021 IEEE.

Machine learning (ML) appears to be a highly promising solution to many problems in many domains, including healthcare and cyber security. Researchers and practitioners try to make use of ML with high expectations of a return on investment, not only in money but also in effort and time. Those expectations can turn into an 'if your only tool is a hammer, every problem looks like a nail' mindset. In practice, conducting an ML workflow efficiently and correctly is difficult, given both general ML challenges and domain-specific issues. Hence, the interactions and dependencies between ML and the application domain should be addressed explicitly, and each step should be planned and conducted according to defined requirements. This study provides insights into achieving these goals through a systematic ML process that should be followed from beginning to end. The process is designed as a cycle of eight sub-processes that pass through seven introduced spaces: the file, sample, class, feature, dataset, model, and finally metric spaces. The dataset quality analysis/comparison sub-process is specifically designed as a quality control gateway. The proposed process is explained through a case study in the Android mobile malware classification domain, where practical and research problems, as well as possible solutions, are presented.
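The role of a quality control gateway can be illustrated with a minimal sketch of one pass through the spaces. It rests on several assumptions. The abstract does not name the eight sub-processes, so only the seven spaces are modeled. The gate is assumed to sit on the exit from the dataset space, so a rejected dataset never reaches the model or metric spaces. The statistics (`minority_class_share`, `duplicate_ratio`), their thresholds (0.2, 0.05), and the names `Space`, `example_quality_gate`, and `run_pass` are all hypothetical illustrations, not the paper's actual method.

```python
from enum import Enum
from typing import Callable, Dict, List


class Space(Enum):
    """The seven spaces named in the abstract, in the order they are listed."""
    FILE = 1
    SAMPLE = 2
    CLASS = 3
    FEATURE = 4
    DATASET = 5
    MODEL = 6
    METRIC = 7


def example_quality_gate(stats: Dict[str, float]) -> bool:
    """Hypothetical gate: accept the dataset only if it is not too imbalanced
    and does not contain too many duplicate samples (illustrative thresholds)."""
    return stats["minority_class_share"] >= 0.2 and stats["duplicate_ratio"] <= 0.05


def run_pass(stats: Dict[str, float],
             gate: Callable[[Dict[str, float]], bool] = example_quality_gate) -> List[Space]:
    """Visit the spaces in order and return the ones reached in this pass.

    The gate is checked when leaving the dataset space. On rejection the pass
    stops there, and the next iteration of the cycle would revisit earlier
    spaces (e.g. sample or feature) to rebuild the dataset.
    """
    visited: List[Space] = []
    for space in Space:  # Enum iteration follows definition order
        visited.append(space)
        if space is Space.DATASET and not gate(stats):
            break  # dataset rejected: do not train or evaluate a model
    return visited


if __name__ == "__main__":
    balanced = {"minority_class_share": 0.4, "duplicate_ratio": 0.01}
    imbalanced = {"minority_class_share": 0.05, "duplicate_ratio": 0.01}
    print([s.name for s in run_pass(balanced)])
    print([s.name for s in run_pass(imbalanced)])
```

Passing the gate as a parameter keeps the quality criteria separate from the traversal, so a different dataset quality analysis or comparison could be plugged in without changing the cycle itself.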