The need for a systematic machine-learning process: A proposal via a mobile malware classification case study


Canbek G.

14th International Conference on Information Security and Cryptology, ISCTURKEY 2021, Ankara, Türkiye, 2-3 December 2021, pp. 173-178

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/iscturkey53027.2021.9654378
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Page Numbers: pp. 173-178
  • Keywords: classification, data quality, machine learning, malware analysis, malware detection, systematic process
  • Affiliated with Middle East Technical University: Yes

Abstract

© 2021 IEEE. Machine learning (ML) seems a highly promising solution for many problems in many domains, including healthcare and cyber security. Researchers and practitioners try to make use of ML with high expectations of a return on investment in terms of not only money but also effort and time. Those expectations can lead to an 'if your only tool is a hammer, then every problem looks like a nail' mindset. Conducting an ML workflow efficiently and correctly is difficult in practice, given both ML challenges and domain-specific issues. Hence, the interactions and dependencies between ML and the domain should be clearly addressed, and the steps should be planned and conducted according to certain requirements. This study provides insights into achieving such goals through a systematic ML process that should be conducted from beginning to end. The systematic process is designed as a cycle with eight sub-processes going through the introduced spaces (file, sample, class, feature, dataset, model, and finally metric spaces). The dataset quality analysis/comparison sub-process is specifically formed as a quality control gateway. The proposed process is explained via a case study in the Android mobile malware classification problem domain, where practical and research problems, as well as possible solutions, are provided.