TOP, vol. 18, no. 2, pp. 377-395, 2010 (SCI-Expanded)
This paper introduces a model-based approach to the important data mining tool multivariate adaptive regression splines (MARS), which was originally formulated in a more model-free way. MARS is a modern methodology from statistical learning that is important in both classification and regression, with an increasing number of applications in many areas of science, economics and technology. It is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS algorithm for estimating the model function consists of two parts: a forward and a backward stepwise algorithm. In our paper, we propose not to use the backward stepwise algorithm. Instead, we construct a penalized residual sum of squares for MARS as a Tikhonov regularization problem, which is also known as ridge regression. We treat this problem using continuous optimization techniques, which we expect to become an important complementary technology and a model-based alternative to the backward stepwise algorithm. In particular, we apply the elegant framework of conic quadratic programming. This is a very well-structured area of convex optimization that resembles linear programming and hence permits the use of powerful interior point methods. Based on these theoretical and algorithmic studies, the paper also contains an application to diabetes data. We evaluate and compare the performance of the established MARS and our new CMARS in classifying diabetic persons, where CMARS turns out to be very competitive and promising.
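To make the idea of a penalized residual sum of squares more concrete, the following minimal sketch fits MARS-style hinge basis functions with a Tikhonov (ridge) penalty. It is an illustration under stated assumptions, not the CMARS method itself: the one-dimensional toy data, the fixed knots, the identity penalty matrix `L`, and the helper names `hinge_basis` and `tikhonov_fit` are all hypothetical, and the closed-form linear solve stands in for the conic quadratic programming formulation that the paper applies.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix of MARS-style hinge functions for a 1-D input.

    Columns: intercept, then max(0, x - t) and max(0, t - x) for each knot t.
    The knots are fixed by hand here (an assumption); in MARS they would be
    selected by the forward stepwise algorithm.
    """
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

def tikhonov_fit(B, y, L, lam):
    """Minimize ||y - B @ theta||^2 + lam * ||L @ theta||^2 in closed form.

    This solves the normal equations of the Tikhonov problem directly; it is
    a stand-in for a conic quadratic programming solver.
    """
    A = B.T @ B + lam * (L.T @ L)
    return np.linalg.solve(A, B.T @ y)

# Toy data (assumed for illustration only): a noisy absolute-value curve.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.abs(x) + 0.05 * rng.standard_normal(x.size)

B = hinge_basis(x, knots=[-0.5, 0.0, 0.5])
L = np.eye(B.shape[1])  # identity penalty matrix: plain ridge regression

theta_small = tikhonov_fit(B, y, L, lam=1e-3)
theta_large = tikhonov_fit(B, y, L, lam=10.0)

rss_small = float(np.sum((y - B @ theta_small) ** 2))
rss_large = float(np.sum((y - B @ theta_large) ** 2))
```

Increasing `lam` shrinks the coefficient vector at the cost of a larger residual sum of squares. This is the trade-off that the penalized formulation controls in place of removing basis functions in a backward stepwise pass.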