Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach

Koc E. K., İYİGÜN C.

JOURNAL OF GLOBAL OPTIMIZATION, vol.60, no.1, pp.79-102, 2014 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 60 Issue: 1
  • Publication Date: 2014
  • Doi Number: 10.1007/s10898-013-0107-5
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.79-102
  • Keywords: Data mining, Multivariate adaptive regression splines (MARS), Computational efficiency, High dimensional data reduction, Mapping, Self-organizing maps, ADAPTIVE REGRESSION SPLINES, PERFORMANCE
  • Middle East Technical University Affiliated: Yes


In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to model the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fits and applies an adaptive procedure to select the number and location of the breaking points (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots via a lack-of-fit criterion; in the second, the least contributing local fits or knots are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, the knots can be restricted to a subset of the data points. In this context, a new knot selection method is proposed that is based on a mapping approach such as self-organizing maps. With this method, fewer but more representative data points become eligible as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it yields a time-efficient forward step for knot selection and model estimation without degrading model accuracy or prediction performance.
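As a rough illustration of the ideas in the abstract, the Python sketch below contrasts a greedy forward step that tries every distinct data point as a knot with one that tries only a small candidate set produced by a mapping step. It rests on several simplifying assumptions that are not taken from the paper: a single predictor, additive mirrored hinge pairs only (no interaction terms), the residual sum of squares as the lack-of-fit criterion, and a tiny hand-written 1-D self-organizing map standing in for the authors' mapping approach, with each prototype snapped to its nearest observation so that the candidates stay a subset of the data points. The names `hinge`, `som_candidate_knots` and `forward_step` are hypothetical.

```python
import numpy as np


def hinge(x, t, sign):
    """MARS-style piecewise linear basis: max(0, x - t) for sign=+1, max(0, t - x) for sign=-1."""
    return np.maximum(0.0, sign * (x - t))


def som_candidate_knots(x, n_units, n_epochs=30, seed=0):
    """Hypothetical mapping step: train a tiny 1-D self-organizing map on one
    predictor, then snap each prototype to its nearest data point so the
    candidate knots form a subset of the observed values."""
    rng = np.random.default_rng(seed)
    w = np.sort(rng.choice(x, size=n_units, replace=False)).astype(float)
    grid = np.arange(n_units)
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        radius = max(0.5, 0.5 * n_units * (1.0 - frac))  # shrinking neighbourhood
        for xi in rng.permutation(x):
            bmu = np.argmin(np.abs(w - xi))              # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * radius ** 2))
            w += lr * h * (xi - w)                       # pull unit and neighbours toward xi
    nearest = np.argmin(np.abs(x[:, None] - w[None, :]), axis=0)
    return np.unique(x[nearest])


def forward_step(x, y, candidates, max_terms):
    """Greedy forward pass for one predictor: repeatedly add the mirrored hinge
    pair whose knot gives the lowest residual sum of squares (RSS)."""
    basis = [np.ones_like(x)]
    knots = []
    rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while len(basis) + 2 <= max_terms:
        best_rss, best_t = np.inf, None
        for t in candidates:                  # one least-squares fit per candidate knot
            X = np.column_stack(basis + [hinge(x, t, 1), hinge(x, t, -1)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            r = float(np.sum((y - X @ coef) ** 2))
            if r < best_rss:
                best_rss, best_t = r, t
        basis += [hinge(x, best_t, 1), hinge(x, best_t, -1)]
        knots.append(float(best_t))
        rss = best_rss
    return knots, rss


# Toy data: a piecewise linear signal with a slope change at x = 4, plus noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.where(x < 4.0, x, 4.0 + 0.2 * (x - 4.0)) + rng.normal(0.0, 0.1, x.size)

all_knots = np.unique(x)                  # conventional: every distinct data point
som_knots = som_candidate_knots(x, 10)    # reduced, mapping-based candidate set
k_all, rss_all = forward_step(x, y, all_knots, max_terms=5)
k_som, rss_som = forward_step(x, y, som_knots, max_terms=5)
print(len(all_knots), "vs", len(som_knots), "candidate knots")
print("all-points knots:", k_all, "RSS:", round(rss_all, 3))
print("SOM-based knots:", k_som, "RSS:", round(rss_som, 3))
```

Because `forward_step` solves one least-squares problem per candidate knot at every step, its cost grows with the size of the candidate set. This growth is the motivation the abstract gives for restricting knots to a smaller, representative subset of the data.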