A computational approach to nonparametric regression: bootstrapping CMARS method



Yazici C., Yerlikaya-Ozkurt F., BATMAZ İ.

MACHINE LEARNING, vol.101, pp.211-230, 2015 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 101
  • Publication Date: 2015
  • DOI: 10.1007/s10994-015-5502-3
  • Journal Name: MACHINE LEARNING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.211-230
  • Keywords: Bootstrapping regression, Conic multivariate adaptive regression splines, Fixed-X resampling, Random-X resampling, Wild bootstrap, Machine learning, SPLINES, MODELS, TREES
  • Middle East Technical University Affiliated: Yes

Abstract

Bootstrapping is a computer-intensive statistical method that treats the data set as a population and draws samples from it with replacement. This resampling method is widely applied, especially to mathematically intractable problems. In this study, it is used to obtain the empirical distributions of the parameters and thereby determine whether they are statistically significant in a special case of nonparametric regression: conic multivariate adaptive regression splines (CMARS), a statistical machine learning algorithm. CMARS is a modified version of the well-known nonparametric regression model, multivariate adaptive regression splines (MARS), which uses conic quadratic optimization. CMARS is at least as complex as MARS even though it performs better with respect to several criteria. To achieve better CMARS performance with a less complex model, three different bootstrapping regression methods, namely random-X, fixed-X, and wild bootstrap, are applied to four data sets of different sizes and scales. The performances of the resulting models are then compared using various criteria, including accuracy, precision, complexity, stability, robustness, and computational efficiency. The results imply that bootstrap methods give more precise parameter estimates, although they are computationally inefficient, and that among all of them, random-X resampling produces better models, particularly for medium-size, medium-scale data sets.
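The three resampling schemes named in the abstract can be sketched in a few lines. The snippet below is a minimal illustration on ordinary least squares with simulated data, not the paper's CMARS models or its four data sets: random-X resamples (x, y) pairs with replacement, fixed-X keeps the design matrix and resamples residuals, and the wild bootstrap perturbs each residual with a random sign. All variable names and the Rademacher choice for the wild-bootstrap weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data (illustrative; not from the study).
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)


def ols(X, y):
    """Least-squares coefficient estimates."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


beta_hat = ols(X, y)
resid = y - X @ beta_hat


def random_x(rng):
    """Random-X (pairs) resampling: draw (x_i, y_i) pairs with replacement."""
    idx = rng.integers(0, n, size=n)
    return ols(X[idx], y[idx])


def fixed_x(rng):
    """Fixed-X resampling: keep X, resample the fitted residuals."""
    e = rng.choice(resid, size=n, replace=True)
    return ols(X, X @ beta_hat + e)


def wild(rng):
    """Wild bootstrap: flip each residual's sign at random (Rademacher weights)."""
    v = rng.choice([-1.0, 1.0], size=n)
    return ols(X, X @ beta_hat + resid * v)


# Empirical distribution of the slope under each scheme; its spread
# gives a bootstrap standard error, and its quantiles a significance check.
B = 500
for name, scheme in [("random-X", random_x), ("fixed-X", fixed_x), ("wild", wild)]:
    boots = np.array([scheme(rng) for _ in range(B)])
    print(f"{name:8s} bootstrap SE of slope: {boots.std(axis=0, ddof=1)[1]:.3f}")
```

In the study the same idea is applied to CMARS parameters: a parameter is kept only when its bootstrap distribution shows it to be statistically significant, which is how the bootstrapped models end up less complex.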