Context-sensitive and keyword density-based supervised machine learning techniques for malicious webpage detection


Altay B., Dokeroglu T., COŞAR A.

SOFT COMPUTING, vol.23, no.12, pp.4177-4191, 2019 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 23 Issue: 12
  • Publication Date: 2019
  • DOI: 10.1007/s00500-018-3066-4
  • Journal Name: SOFT COMPUTING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.4177-4191
  • Keywords: Malicious, Webpage, Classification, SVM, Maximum entropy, Extreme learning machines, Keyword density
  • Affiliated with Orta Doğu Teknik Üniversitesi (Middle East Technical University): Yes

Abstract

Conventional malicious webpage detection methods use blacklists to decide whether a webpage is malicious. These blacklists are generally maintained by third-party organizations. However, keeping a list of all malicious websites and updating it regularly is not an easy task, given how frequently webpages change and how rapidly their number grows. In this study, we propose a novel context-sensitive and keyword density-based method for the classification of webpages using three supervised machine learning techniques: support vector machine, maximum entropy, and extreme learning machine. Features (words) of webpages are obtained from their HTML contents, and information is extracted with three feature extraction methods: existence of words, keyword frequencies, and keyword density. The performance of the proposed machine learning models is evaluated on a benchmark data set consisting of one hundred thousand webpages. Experimental results show that the proposed method can detect malicious webpages with an accuracy of 98.24%, which is a significant improvement over state-of-the-art approaches.
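
The three feature types named in the abstract (existence of words, keyword frequencies, keyword density) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define keyword density, so the code assumes the common definition of keyword occurrences divided by the total number of words in the visible page text. The names `tokenize_html` and `keyword_features`, the tokenization rule, and the sample keywords are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def tokenize_html(html):
    """Return lowercase alphanumeric word tokens from the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def keyword_features(html, keywords):
    """Compute existence, frequency, and density for each keyword.

    Density is ASSUMED to be count / total words; the paper may define
    it differently.
    """
    tokens = tokenize_html(html)
    counts = Counter(tokens)
    total = len(tokens)
    features = {}
    for kw in keywords:
        freq = counts[kw]
        features[kw] = {
            "exists": int(freq > 0),
            "frequency": freq,
            "density": freq / total if total else 0.0,
        }
    return features
```

For example, `keyword_features(page_html, ["free", "login"])` returns one small dictionary per keyword. Flattening these values into a numeric vector would give an input suitable for a supervised classifier such as the ones the abstract lists, although the paper's actual feature encoding is not described here.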