Boosting Initial Population in Multiobjective Feature Selection with Knowledge-Based Partitioning



Deniz A., Kiziloz H. E.

2022 International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, 18 - 23 July 2022, vol.2022-July

  • Publication Type: Conference Paper / Full Text
  • Volume: 2022-July
  • Doi Number: 10.1109/ijcnn55064.2022.9892123
  • City: Padua
  • Country: Italy
  • Keywords: binary classification, evolutionary computation, feature selection, initial population, multiobjective optimization
  • Middle East Technical University Affiliated: Yes

Abstract

© 2022 IEEE. The quality of features is one of the main factors that affect classification performance. Feature selection aims to remove irrelevant and redundant features from data in order to increase classification accuracy. However, identifying these features is not a trivial task because the search space is large. Evolutionary algorithms have proven effective in many optimization problems, including feature selection. These algorithms require an initial population to start their search, and a poor initial population may cause the search to get stuck in local optima. Diversifying the initial population is known to be an effective way to overcome this issue; yet it may not suffice, as the search space grows exponentially with the number of features. In this study, we propose an enhanced initial population strategy to boost the performance of the feature selection task. In our proposed method, we ensure the diversity of the initial population by partitioning the candidate solutions according to the number of features they select. In addition, we adjust the chance of each feature being selected into a candidate solution according to its information gain value, which enables informed selection of features within a vast search space. We conduct extensive experiments on many benchmark datasets retrieved from the UCI Machine Learning Repository. Moreover, we apply our algorithm to a real-world, large-scale dataset, the Stanford Sentiment Treebank. We observe significant improvements in comparisons with three off-the-shelf initialization strategies.
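
The two ideas in the abstract, partitioning candidates by subset size and weighting feature selection by information gain, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how partitions are bounded or how information gain is turned into selection probabilities. The function name `init_population`, the equal-width size partitions, and the proportional weighting are all assumptions made for illustration. The information gain values are taken as a precomputed input.

```python
import numpy as np


def init_population(info_gain, pop_size, n_partitions, rng=None):
    """Build a boolean population matrix (pop_size x n_features).

    Assumptions (not taken from the paper): subset sizes 1..n_features are
    split into equal-width partitions, individuals are assigned to partitions
    round-robin, and a feature's selection probability is proportional to its
    information gain.
    """
    rng = np.random.default_rng(rng)
    info_gain = np.asarray(info_gain, dtype=float)
    n_features = info_gain.size

    # Small epsilon keeps zero-gain features selectable, so large subsets
    # can still be sampled without replacement.
    weights = info_gain + 1e-12
    probs = weights / weights.sum()

    # Partition p covers subset sizes in [edges[p], edges[p + 1]).
    edges = np.linspace(1, n_features + 1, n_partitions + 1).astype(int)

    population = np.zeros((pop_size, n_features), dtype=bool)
    for i in range(pop_size):
        p = i % n_partitions
        lo = edges[p]
        hi = max(edges[p + 1], lo + 1)  # guard against empty partitions
        k = rng.integers(lo, hi)        # subset size for this individual
        chosen = rng.choice(n_features, size=k, replace=False, p=probs)
        population[i, chosen] = True
    return population


if __name__ == "__main__":
    # Hypothetical information gain values for ten features.
    ig = [0.5, 0.1, 0.0, 0.3, 0.05, 0.2, 0.0, 0.4, 0.15, 0.25]
    pop = init_population(ig, pop_size=10, n_partitions=5, rng=0)
    print(pop.sum(axis=1))  # number of selected features per individual
```

Each row is one candidate solution, with `True` marking a selected feature. Under these assumptions, the population spans small to large subsets, and high-gain features appear more often than low-gain ones.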