TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, cilt.27, sa.5, ss.3713-3728, 2019 (SCI-Expanded)
The prevalence and nonstop evolving technical sophistication of exploit kits (EKs) is one of the most challenging shifts in the modern cybercrime landscape. Over the last few years, malware infections via drive-by download attacks have been orchestrated with EK infrastructures. Malicious advertisements and compromised websites redirect victim browsers to web-based EK families that are assembled to exploit client-side vulnerabilities and finally deliver evil payloads. A key observation is that while the webpage contents have drastic differences between distinct intrusions executed through the same EK, the patterns in URL addresses stay similar. This is due to the fact that autogenerated URLs by EK platforms follow specific templates. This practice in use enables the development of an efficient system that is capable of classifying the responsible EK instances. This paper proposes novel URL features and a new technique to quickly categorize EK families with high accuracy using machine learning algorithms. Rather than analyzing each URL individually, the proposed overall URL patterns approach examines all URLs associated with an EK infection automatically. The method has been evaluated with a popular and publicly available dataset that contains 240 different real-world infection cases involving over 2250 URLs, the incidents being linked with the 4 major EK flavors that occurred throughout the year 2016. The system achieves up to 100% classification accuracy with the tested estimators.