The need for a systematic machine-learning process: A proposal via a mobile malware classification case study

14th International Conference on Information Security and Cryptology, ISCTURKEY 2021, Ankara, Türkiye, 2-3 December 2021, pp. 173-178

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/iscturkey53027.2021.9654378
City of Publication: Ankara
Country of Publication: Türkiye
Pages: pp. 173-178
Keywords: classification, data quality, machine learning, malware analysis, malware detection, systematic process
Affiliated with Middle East Technical University: Yes

Abstract

© 2021 IEEE. Machine learning (ML) seems to be a highly promising solution for many problems in many domains, including healthcare and cyber security. Researchers and practitioners try to make use of ML with high expectations of a return on investment in terms of not only money but also effort and time. Those expectations can turn into an "if your only tool is a hammer, every problem looks like a nail" mindset. Conducting an ML workflow efficiently and correctly is difficult to achieve in practice, considering both ML challenges and domain-specific issues. Hence, the interactions and dependencies between ML and the domain should be clearly addressed, and the steps should be planned and conducted according to certain requirements. This study provides insights into achieving such goals through a systematic ML process that should be conducted from beginning to end. The systematic process is designed as a cycle with eight sub-processes that pass through the spaces introduced in the study (file, sample, class, feature, dataset, model, and finally metric spaces). The dataset quality analysis/comparison sub-process is specifically formed as a quality control gateway. The proposed process is explained via a case study in the Android mobile malware classification problem domain, where practical and research problems, as well as possible solutions, are provided.