On initial population generation in feature subset selection

Deniz, Ayca; Kiziloz, Hakan

doi:10.1016/j.eswa.2019.06.063

Deniz A., Kiziloz H. E.

EXPERT SYSTEMS WITH APPLICATIONS, vol.137, pp.11-21, 2019 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 137
Publication Date: 2019
DOI: 10.1016/j.eswa.2019.06.063
Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.11-21
Keywords: Feature subset selection, Initial population, Multiobjective optimization, GENETIC ALGORITHM
Affiliated with Middle East Technical University: Yes

Abstract

Performance of evolutionary algorithms depends on many factors such as population size, number of generations, crossover or mutation probability, etc. Generating the initial population is one of the important steps in evolutionary algorithms. A poor initial population may unnecessarily increase the number of searches or it may cause the algorithm to converge to local optima. In this study, we aim to find a promising method for generating the initial population in the Feature Subset Selection (FSS) domain. FSS is not considered an expert system by itself, yet it constitutes a significant step in many expert systems. It eliminates redundancy in data, which decreases training time and improves solution quality. To achieve our goal, we compare a total of five different initial population generation methods: Information Gain Ranking (IGR), a greedy approach, and three types of random approaches. We evaluate these methods using a specialized Teaching Learning Based Optimization searching algorithm (MTLBO-MD), and three supervised learning classifiers: Logistic Regression, Support Vector Machines, and Extreme Learning Machine. In our experiments, we employ 12 publicly available datasets, mostly obtained from the well-known UCI Machine Learning Repository. According to their feature sizes and instance counts, we manually classify these datasets as small, medium, or large-sized. Experimental results indicate that all tested methods achieve similar solutions on small-sized datasets. For medium-sized and large-sized datasets, however, the IGR method provides a better starting point in terms of execution time and learning performance. Finally, when compared with other studies in the literature, the IGR method proves to be a viable option for initial population generation. © 2019 Elsevier Ltd. All rights reserved.
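
The abstract does not spell out how the IGR ranking is turned into an initial population, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes discrete feature values, represents each individual as a boolean feature mask, and makes the inclusion probability of a feature grow linearly with its information-gain rank. The function names and the `p_low` / `p_high` parameters are hypothetical. A uniform random generator is included as a baseline for comparison.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def random_population(n_features, pop_size, rng):
    """Baseline: every feature is included with probability 0.5."""
    return [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]


def igr_population(X, y, pop_size, rng, p_low=0.1, p_high=0.9):
    """Assumed scheme: inclusion probability rises linearly with information-gain rank."""
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: gains[j])  # lowest gain first
    prob = [0.0] * n_features
    for rank, j in enumerate(order):
        frac = rank / (n_features - 1) if n_features > 1 else 1.0
        prob[j] = p_low + (p_high - p_low) * frac
    population = []
    for _ in range(pop_size):
        mask = [rng.random() < prob[j] for j in range(n_features)]
        if not any(mask):
            mask[order[-1]] = True  # never emit an empty subset: keep the top-ranked feature
        population.append(mask)
    return gains, population
```

Either population can then be handed to whatever search algorithm is in use. Sampling masks from rank-derived probabilities, rather than deterministically keeping the top-k features, is a design choice made here so that the population stays diverse; other mappings from ranking to population are equally plausible.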