Evolutionary Multiobjective Feature Selection for Sentiment Analysis

Deniz, Ayca; Angin, Merih; Angin, Pelin

doi:10.1109/access.2021.3118961

Deniz A., Angin M., Angin P.

IEEE Access, vol. 9, pp. 142982-142996, 2021 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 9
Publication Date: 2021
DOI: 10.1109/access.2021.3118961
Journal Name: IEEE Access
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: pp. 142982-142996
Keywords: Feature extraction, Sentiment analysis, Task analysis, Machine learning, Analytical models, Measurement, Data mining, Binary classification, evolutionary computation, feature selection, multiobjective optimization, sentiment analysis, PARTICLE SWARM OPTIMIZATION, FEATURE SUBSET-SELECTION, CLASSIFICATION, ALGORITHM
Affiliated with Middle East Technical University: Yes

Abstract

Sentiment analysis is one of the prominent research areas in data mining and knowledge discovery, which has proven to be an effective technique for monitoring public opinion. The big data era with a high volume of data generated by a variety of sources has provided enhanced opportunities for utilizing sentiment analysis in various domains. In order to take best advantage of the high volume of data for accurate sentiment analysis, it is essential to clean the data before the analysis, as irrelevant or redundant data will hinder extracting valuable information. In this paper, we propose a hybrid feature selection algorithm to improve the performance of sentiment analysis tasks. Our proposed sentiment analysis approach builds a binary classification model based on two feature selection techniques: an entropy-based metric and an evolutionary algorithm. We have performed comprehensive experiments in two different domains using a benchmark dataset, Stanford Sentiment Treebank, and a real-world dataset we have created based on World Health Organization (WHO) public speeches regarding COVID-19. The proposed feature selection model is shown to achieve significant performance improvements in both datasets, increasing classification accuracy for all utilized machine learning and text representation technique combinations. Moreover, it achieves over 70% reduction in feature size, which provides efficiency in computation time and space.
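
The two-stage idea in the abstract (an entropy-based filter followed by an evolutionary search that trades accuracy against feature count) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: information gain stands in for the entropy-based metric, a simple domination-count ranking with bit-flip mutation stands in for the evolutionary algorithm, and the function names (`entropy_filter`, `evaluate`, `evolve`), the toy vote classifier, and the tiny binary term-presence dataset are all hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Entropy reduction in the labels after splitting on one binary feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [lab for x, lab in zip(column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def entropy_filter(X, y, min_gain=0.01):
    """Stage 1 (assumed): keep feature indices whose gain exceeds min_gain."""
    return [j for j in range(len(X[0]))
            if information_gain([row[j] for row in X], y) > min_gain]


def evaluate(mask, X, y):
    """Two objectives to minimize: toy training error and subset size."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    # Toy classifier (illustrative only): each feature votes by how much more
    # often it occurs in positive rows than in negative rows.
    weights = {j: sum((1 if lab == 1 else -1) * row[j] for row, lab in zip(X, y))
               for j in chosen}
    errors = 0
    for row, lab in zip(X, y):
        score = sum(weights[j] * row[j] for j in chosen)
        errors += int((1 if score > 0 else 0) != lab)
    return (errors / len(y), len(chosen))


def dominates(a, b):
    """True if a is no worse than b on every objective and differs from b."""
    return a != b and all(x <= z for x, z in zip(a, b))


def pareto_front(scored):
    """Keep (mask, objectives) pairs that no other pair dominates."""
    return [(m, f) for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]


def evolve(X, y, generations=30, pop_size=12, seed=0):
    """Stage 2 (assumed): bit-flip mutation with domination-count survival."""
    rng = random.Random(seed)
    n = len(X[0])
    population = {tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)}
    for _ in range(generations):
        children = set()
        for parent in population:
            flip = rng.randrange(n)
            children.add(tuple(bit ^ int(i == flip) for i, bit in enumerate(parent)))
        scored = [(m, evaluate(m, X, y)) for m in population | children]
        # Fewer dominators ranks first; ties are broken by the objectives.
        ranked = sorted(scored,
                        key=lambda s: (sum(dominates(g, s[1]) for _, g in scored), s[1]))
        population = {m for m, _ in ranked[:pop_size]}
    return pareto_front([(m, evaluate(m, X, y)) for m in population])


# Hypothetical data: rows are documents, columns are term-presence flags.
X = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1],
     [0, 1, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
y = [1, 1, 1, 0, 0, 0]

kept = entropy_filter(X, y)                      # stage 1: filter
X_kept = [[row[j] for j in kept] for row in X]
front = evolve(X_kept, y)                        # stage 2: search
for mask, (error, size) in sorted(front, key=lambda s: s[1]):
    print([kept[i] for i, bit in enumerate(mask) if bit], error, size)
```

Running the filter first shrinks the search space that the evolutionary stage has to explore, which is one plausible reading of why a hybrid of the two techniques is attractive. In practice the toy classifier would be replaced by a real model evaluated on held-out data.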