On numerical optimization theory of infinite kernel learning

Ozogur-Akyuz S., WEBER G. W.

JOURNAL OF GLOBAL OPTIMIZATION, vol.48, no.2, pp.215-239, 2010 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 48 Issue: 2
  • Publication Date: 2010
  • Doi Number: 10.1007/s10898-009-9488-x
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.215-239
  • Keywords: Machine learning, Infinite kernel learning, Semi-infinite optimization, Infinite programming, Support vector machines, Continuous optimization, Discretization, Exchange method, Conceptual reduction, Triangulation
  • Middle East Technical University Affiliated: Yes


In machine learning algorithms, one of the crucial issues is the representation of the data. As the given data sources become heterogeneous and the data become large-scale, multiple kernel methods help to classify "nonlinear data". Nevertheless, finite combinations of kernels are restricted to a finite choice of kernels. To overcome this limitation, a novel method of "infinite" kernel combinations is proposed with the help of infinite and semi-infinite programming, taking into account all elements of the kernel space. Looking at all infinitesimally fine convex combinations of the kernels from the infinite kernel set, the margin is maximized subject to an infinite number of constraints with a compact index set and an additional (Riemann-Stieltjes) integral constraint due to the combinations. After a parametrization in the space of probability measures, the problem becomes semi-infinite. We adapt well-known numerical methods to our infinite kernel learning model and analyze the existence of solutions and the convergence of the given algorithms. We implement our new algorithm, called "infinite" kernel learning (IKL), on heterogeneous data sets using the exchange method and the conceptual reduction method, which are well-known numerical techniques for solving semi-infinite programming problems. The results show that our IKL approach efficiently improves the classification accuracy on heterogeneous data compared to classical one-kernel approaches.
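
The idea of a convex combination over a continuously parametrized kernel set can be illustrated with a coarse discretization. The sketch below is not the authors' IKL algorithm: it assumes a Gaussian kernel family indexed by a width parameter on a hypothetical interval [0.1, 2.0], replaces the probability measure by uniform weights on a five-point grid, and only builds the combined Gram matrix. The function names, toy data, and parameter range are illustrative assumptions.

```python
import math


def gaussian_kernel(x, y, width):
    """Gaussian kernel exp(-width * ||x - y||^2) for one kernel parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-width * sq_dist)


def uniform_grid(lower, upper, size):
    """Evenly spaced discretization of the parameter interval [lower, upper]."""
    step = (upper - lower) / (size - 1)
    return [lower + k * step for k in range(size)]


def combined_gram(points, widths, weights):
    """Gram matrix of the convex combination sum_j weights[j] * k_{widths[j]}.

    The finite sum stands in for the integral of the kernel against a
    probability measure on the parameter set.
    """
    if len(widths) != len(weights):
        raise ValueError("need one weight per kernel parameter")
    if any(b < 0 for b in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(points)
    return [
        [
            sum(b * gaussian_kernel(points[i], points[j], w)
                for w, b in zip(widths, weights))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data and parameter range, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
widths = uniform_grid(0.1, 2.0, 5)
weights = [1.0 / len(widths)] * len(widths)
gram = combined_gram(points, widths, weights)
```

Refining the grid makes the finite sum a closer stand-in for the integral constraint. In the semi-infinite setting described above, the weights would be chosen by the optimization rather than fixed uniformly as they are here.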