Learning to rank web data using multivariate adaptive regression splines


Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Arts and Sciences, Department of Statistics, Turkey

Thesis Approval Date: 2018

Student: GÜLŞAH ALTINOK

Co-Advisors: PINAR KARAGÖZ, İNCİ BATMAZ

Abstract:

Learning to rank, a recent trend that uses machine learning techniques to build ranking models automatically, has emerged in a wide variety of applications in Information Retrieval (IR), Natural Language Processing (NLP), and Data Mining (DM). Typical applications are document retrieval, expert search, definition search, collaborative filtering, question answering, and machine translation. In IR, three approaches are used for ranking. The first is the traditional model approach, which includes the Boolean Model (BM), the Vector Space Model (VSM), and the classical Probabilistic Model (classical PM). The second is the Language Model (LM) approach, which includes the n-gram model and the Query Likelihood Model (QLM). The third is the system model approach, which includes the Support Vector Machine (SVM) and the Artificial Neural Network (ANN). In this study, we adopted the system model approach and compared the performance of the widely used SVM and ANN models with that of Multivariate Adaptive Regression Splines (MARS) and its variant, Conic Multivariate Adaptive Regression Splines (CMARS). The results indicate that MARS performs slightly better than the other models considered in this study.