Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution


Engin M., Can T.

24th International Symposium on Computer and Information Sciences, Güzelyurt, Cyprus (TRNC), 14-16 September 2009, pp. 105-110

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/iscis.2009.5291861
  • City of Publication: Güzelyurt
  • Country of Publication: Cyprus (TRNC)
  • Pages: pp. 105-110
  • Keywords: Text Classification, Data Mining, Machine Learning, Artificial Intelligence, Information Retrieval, World Wide Web
  • Affiliated with Middle East Technical University: Yes

Abstract

In this paper, we construct and compare several feature extraction approaches to find a better solution for classifying Turkish web documents in the marketing domain. We derive our feature extraction techniques from characteristics of the Turkish language, the structure of web documents, and online content in the marketing domain. We form datasets in different feature spaces and apply several Support Vector Machine (SVM) configurations to these datasets. We conduct our study with the performance needs of practical context-sensitive systems in mind. Our results show that linear kernel classifiers achieve the best performance in terms of accuracy and speed on text documents expressed as keyword root features.
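
As a rough illustration of the setup the abstract describes (documents represented as keyword root features, classified by a linear SVM), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for the example: the `crude_root` helper approximates root extraction by keeping the first five characters of each word, the trainer is a Pegasos-style stochastic subgradient method without a bias term, and the four toy documents and their two categories are invented.

```python
import random
from collections import Counter

# Toy documents and labels invented for illustration (+1 = cars, -1 = phones).
DOCS = [
    "ikinci el araba fiyatları ve otomobil kampanyaları",
    "yeni otomobil modelleri araba kredisi",
    "akıllı telefon fiyatları ve telefon kampanyaları",
    "yeni telefon modelleri akıllı telefon kılıfı",
]
LABELS = [1, 1, -1, -1]


def crude_root(word, n=5):
    """Hypothetical stand-in for root extraction: keep the first n characters.

    Assumes lowercase input; Python's lower() does not handle the Turkish
    dotted/dotless capital I correctly.
    """
    return word.lower()[:n]


def featurize(text):
    """Map a document to a sparse bag of truncated-word 'roots' with counts."""
    return Counter(crude_root(tok) for tok in text.split())


def score(weights, text):
    """Linear decision value: sparse dot product of weights and features."""
    return sum(weights.get(f, 0.0) * v for f, v in featurize(text).items())


def predict(weights, text):
    return 1 if score(weights, text) >= 0 else -1


def train_linear_svm(docs, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. Returns a sparse weight dict (no bias term).
    """
    rng = random.Random(seed)
    feats = [featurize(d) for d in docs]
    weights = {}
    order = list(range(len(docs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = feats[i], labels[i]
            margin = y * sum(weights.get(f, 0.0) * v for f, v in x.items())
            # Regularization step: shrink every existing weight.
            shrink = 1.0 - eta * lam
            for f in weights:
                weights[f] *= shrink
            # Hinge-loss step: only examples inside the margin contribute.
            if margin < 1.0:
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


if __name__ == "__main__":
    w = train_linear_svm(DOCS, LABELS)
    for query in ["arabalar için otomobil sigortası", "telefonlar için kılıf"]:
        print(query, "->", predict(w, query))
```

Prediction in this sketch is a single sparse dot product over the features present in the document, which is one reason linear models are attractive when classification latency matters. A real system would replace `crude_root` with a proper Turkish morphological analyzer or stemmer and would typically use an established SVM library rather than a hand-written trainer.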