A cluster tree based model selection approach for logistic regression classifier


Tanju O., KALAYLIOĞLU AKYILDIZ Z. I.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, vol.88, no.7, pp.1394-1414, 2018 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 88 Issue: 7
  • Publication Date: 2018
  • DOI Number: 10.1080/00949655.2018.1437442
  • Journal Name: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.1394-1414
  • Keywords: Model selection, logistic regression, classification, clustering similarity measures, INFORMATION CRITERION, NONLINEAR-REGRESSION, ALGORITHMS
  • Affiliated with Middle East Technical University: Yes

Abstract

Model selection methods are important for identifying the best approximating model. To identify the most meaningful model, the purpose of the model should be clearly stated in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach for logistic regression where the main modelling purpose is classification. The method is based on the distance between two clustering trees. We also question and evaluate the performance of conventional model selection methods based on information-theoretic concepts in determining the best logistic regression classifier. An extensive simulation study is used to assess the finite-sample performance of the cluster tree based and the information-theoretic model selection methods. The simulations cover both the case where the true model is in the candidate set and the case where it is not. Results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying subjects with respect to their risk of breast cancer.
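
The abstract refers to conventional model selection methods based on information-theoretic concepts. As a minimal illustration only, the sketch below assumes these include the standard AIC and BIC criteria (the record itself does not name them) and shows how they would rank candidate binary models from their fitted probabilities. The labels, probabilities, parameter counts, and model names are hypothetical values invented for this example; they are not taken from the paper, and the sketch does not implement the paper's cluster tree based method.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 labels y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

# Hypothetical observed labels (assumption, for illustration only).
y = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical candidates: (number of parameters, fitted probabilities).
candidates = {
    "small": (2, [0.7, 0.4, 0.6, 0.7, 0.3, 0.4, 0.6, 0.3]),
    "large": (4, [0.8, 0.3, 0.7, 0.8, 0.2, 0.3, 0.7, 0.2]),
}

scores = {}
for name, (k, p) in candidates.items():
    ll = log_likelihood(y, p)
    scores[name] = {"aic": aic(ll, k), "bic": bic(ll, k, len(y))}

# Lower values are preferred under both criteria.
best_by_aic = min(scores, key=lambda m: scores[m]["aic"])
best_by_bic = min(scores, key=lambda m: scores[m]["bic"])
```

Both criteria reward fit through the log-likelihood and penalize the number of parameters; neither measures classification accuracy directly, which is the concern the abstract raises about using them to choose a classifier.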