Machine learning algorithms for accurate flow-based network traffic classification: Evaluation and comparison

Soysal M., SCHMİDT Ş. E.

PERFORMANCE EVALUATION, vol.67, no.6, pp.451-467, 2010 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 67 Issue: 6
  • Publication Date: 2010
  • Doi Number: 10.1016/j.peva.2010.01.001
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.451-467
  • Keywords: Traffic classification, Privacy-preserving classification, Supervised machine learning, Data set composition, Comparison
  • Middle East Technical University Affiliated: Yes


Network management and monitoring rely on an accurate characterization of the traffic generated by different applications and network protocols. We employ three supervised machine learning (ML) algorithms, Bayesian Networks, Decision Trees, and Multilayer Perceptrons, for the flow-based classification of six different types of Internet traffic, including peer-to-peer (P2P) and content delivery (Akamai) traffic. We first investigate how classification performance depends on the amount and composition of the training data. Further experiments show that ML algorithms such as Bayesian Networks and Decision Trees are suitable for high-speed Internet traffic flow classification and are robust with respect to applications that dynamically change their source ports. Finally, an experiment conducted with wrongly labeled training data highlights the importance of correctly labeled training instances. © 2010 Elsevier B.V. All rights reserved.
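
The mislabeling experiment mentioned in the abstract can be illustrated with a small sketch. It is not the authors' setup: the abstract does not list the flow features, the six traffic classes, or the data, so the two classes ("web", "p2p"), the two features (mean packet size, flow duration), and all numeric values below are hypothetical. A Gaussian naive Bayes classifier, which can be viewed as the simplest Bayesian network structure, is swapped in for the algorithms evaluated in the paper so that the example needs only the Python standard library.

```python
import math
import random
from collections import defaultdict


def make_flows(rng, n_per_class):
    """Synthetic flows as ((mean_packet_size_bytes, duration_s), label).

    The class profiles are hypothetical, chosen only to be well separated.
    """
    profiles = {"web": (300.0, 2.0), "p2p": (1100.0, 60.0)}
    flows = []
    for label, (size_mu, dur_mu) in profiles.items():
        for _ in range(n_per_class):
            features = (rng.gauss(size_mu, 40.0), rng.gauss(dur_mu, 0.5))
            flows.append((features, label))
    return flows


def mislabel(flows, fraction, rng):
    """Return a copy in which `fraction` of the flows carry the other label."""
    flipped = set(rng.sample(range(len(flows)), round(fraction * len(flows))))
    other = {"web": "p2p", "p2p": "web"}
    return [(x, other[y] if i in flipped else y) for i, (x, y) in enumerate(flows)]


def train(flows):
    """Fit a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in flows:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mu, var))
        model[y] = (math.log(len(rows) / len(flows)), stats)
    return model


def predict(model, x):
    """Pick the class with the highest log posterior (up to a constant)."""
    def score(y):
        log_prior, stats = model[y]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, stats)
        )
    return max(model, key=score)


def accuracy(model, flows):
    return sum(predict(model, x) == y for x, y in flows) / len(flows)


rng = random.Random(0)
train_set = make_flows(rng, 200)
test_set = make_flows(rng, 200)

clean_acc = accuracy(train(train_set), test_set)
noisy_acc = accuracy(train(mislabel(train_set, 0.3, rng)), test_set)
print(f"clean labels: {clean_acc:.3f}, 30% mislabeled: {noisy_acc:.3f}")
```

Comparing the two printed accuracies for different values of the mislabeled fraction gives a rough feel for how label quality in the training set can affect a classifier. How strongly the result degrades depends on the classifier and on the feature distributions, so the numbers from this toy setup say nothing about the results reported in the paper.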