Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques


Deniz A., Kiziloz H. E., Dokeroglu T., COŞAR A.

NEUROCOMPUTING, cilt.241, ss.128-146, 2017 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 241
  • Basım Tarihi: 2017
  • Doi Numarası: 10.1016/j.neucom.2017.02.033
  • Dergi Adı: NEUROCOMPUTING
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Sayfa Sayıları: ss.128-146
  • Anahtar Kelimeler: Multiobjective feature selection, Evolutionary algorithm, Binary classification, Supervised/unsupervised machine learning, OPTIMIZATION
  • Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

This study investigates the success of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for the feature subset selection (FSS) in binary classification problem (BCP). Recent studies have focused on improving the accuracy of BCP by including all of the features, neglecting to determine the best performing subset of features. However, for some problems, the number of features may reach thousands, which will cause too much computation power to be consumed during the feature evaluation and classification phases, also possibly reducing the accuracy of the results. Therefore, selecting the minimum number of features while preserving and/or increasing the accuracy of the results at a high level becomes an important issue for achieving fast and accurate binary classification. Our multiobjective evolutionary algorithm includes two phases, FSS using a GA and applying ML techniques for the BCP. Since exhaustively investigating all of the feature subsets is intractable, a GA is preferred for the first phase of the algorithm for intelligently detecting the most appropriate feature subset. The GA uses multiobjective crossover and mutation operators to improve a population of individuals (each representing a selected feature subset) and obtain (near-) optimal solutions through generations. In the second phase of the algorithms, the fitness of the selected subset is decided by using state-of-the-art ML techniques; Logistic Regression, Support Vector Machines, Extreme Learning Machine, K-means, and Affinity Propagation. The performance of the multiobjective evolutionary algorithm (and the ML techniques) is evaluated with comprehensive experiments and compared with state-of-the-art algorithms, Greedy Search, Particle Swarm Optimization, Tabu Search, and Scatter Search. The proposed algorithm was observed to be robust and it performed better than the existing methods on most of the datasets. (C) 2017 Elsevier B.V. All rights reserved.