Context-sensitive and keyword density-based supervised machine learning techniques for malicious webpage detection

Altay, Betul; Dokeroglu, Tansel; COŞAR, AHMET

doi:10.1007/s00500-018-3066-4

Altay B., Dokeroglu T., COŞAR A.

SOFT COMPUTING, vol. 23, no. 12, pp. 4177-4191, 2019 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 23 Issue: 12
Publication Date: 2019
DOI: 10.1007/s00500-018-3066-4
Journal Name: SOFT COMPUTING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 4177-4191
Keywords: Malicious, Webpage, Classification, SVM, Maximum entropy, Extreme learning machines, Keyword density
Middle East Technical University Affiliated: No

Abstract

Conventional malicious webpage detection methods use blacklists to decide whether a webpage is malicious. These blacklists are generally maintained by third-party organizations. However, keeping a list of all malicious websites and updating it regularly is not an easy task, given the frequently changing and rapidly growing number of webpages on the web. In this study, we propose a novel context-sensitive and keyword density-based method for classifying webpages using three supervised machine learning techniques: support vector machine, maximum entropy, and extreme learning machine. Features (words) of webpages are obtained from their HTML contents, and information is extracted using three feature extraction methods: existence of words, keyword frequencies, and keyword density. The performance of the proposed machine learning models is evaluated on a benchmark data set of one hundred thousand webpages. Experimental results show that the proposed method can detect malicious webpages with an accuracy of 98.24%, which is a significant improvement compared to state-of-the-art approaches.
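
The three feature extraction methods named in the abstract (existence of words, keyword frequencies, and keyword density) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the features precisely, so the sketch assumes that keyword density is a keyword's count divided by the total number of word tokens on the page, that text inside script and style elements is ignored, and that keywords are supplied in lowercase. The names `tokenize_html` and `extract_features` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def tokenize_html(html):
    """Return lowercase alphanumeric word tokens from the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def extract_features(html, keywords):
    """Build three feature vectors over a fixed (lowercase) keyword list.

    existence: 1 if the keyword occurs at least once, else 0
    frequency: raw occurrence count of the keyword
    density:   count / total tokens (assumed definition, see lead-in)
    """
    tokens = tokenize_html(html)
    counts = Counter(tokens)
    total = len(tokens)
    existence = [1 if counts[k] > 0 else 0 for k in keywords]
    frequency = [counts[k] for k in keywords]
    density = [counts[k] / total if total else 0.0 for k in keywords]
    return existence, frequency, density
```

Any one of the three vectors could then serve as the input to a supervised classifier such as a support vector machine. The keyword list itself would have to come from a labeled training set, which this sketch does not cover.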