Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution

Engin M., Can T.

24th International Symposium on Computer and Information Sciences, Güzelyurt, Cyprus (TRNC), 14 - 16 September 2009, pp.105-110, (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1109/iscis.2009.5291861
City of Publication: Güzelyurt
Country of Publication: Cyprus (TRNC)
Pages: pp.105-110
Keywords: Text Classification, Data Mining, Machine Learning, Artificial Intelligence, Information Retrieval, World Wide Web
Affiliated with Middle East Technical University: Yes

Abstract

In this paper, we construct and compare several feature extraction approaches to improve the classification of Turkish web documents in the marketing domain. We derive our feature extraction techniques from characteristics of the Turkish language, the structure of web documents, and online content in the marketing domain. We form datasets in different feature spaces and apply several Support Vector Machine (SVM) configurations to these datasets. We conduct our study with the performance needs of practical context-sensitive systems in mind. Our results show that linear kernel classifiers achieve the best performance, in terms of both accuracy and speed, on text documents expressed as keyword root features.
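
As a rough illustration of the pipeline the abstract describes (documents mapped to word-root features, then classified with a linear SVM), the following minimal Python sketch may help. It is not the authors' implementation. The suffix list, the `crude_root` helper, and the toy documents are hypothetical, and a Pegasos-style subgradient solver stands in for whatever SVM software the paper used. A real system would rely on a proper Turkish morphological analyzer and a full SVM library.

```python
import random

# Hypothetical, deliberately tiny suffix list (assumption for illustration only).
SUFFIXES = ["lar", "ler", "dan", "den", "da", "de"]

def crude_root(word, min_len=3):
    """Strip at most one listed suffix, keeping at least min_len characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word

def root_features(text):
    """Map a document to a sparse bag of root counts."""
    counts = {}
    for token in text.lower().split():
        root = crude_root(token)
        counts[root] = counts.get(root, 0) + 1
    return counts

def score(weights, features):
    """Linear decision value: dot product of weights and sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized
    hinge loss. Labels must be +1 or -1. No bias term is learned."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(docs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = labels[i] * score(weights, docs[i])
            # Regularization shrinks every weight on each step.
            shrink = 1.0 - eta * lam
            for f in weights:
                weights[f] *= shrink
            # Hinge loss contributes only when the margin is violated.
            if margin < 1.0:
                for f, v in docs[i].items():
                    weights[f] = weights.get(f, 0.0) + eta * labels[i] * v
    return weights

def predict(weights, text):
    """Return +1 or -1 for a raw text document."""
    return 1 if score(weights, root_features(text)) > 0 else -1

# Toy, invented training set: +1 = electronics ads, -1 = clothing ads.
TRAIN = [
    ("telefonlar indirimde", 1),
    ("yeni telefon kampanya", 1),
    ("tabletler ve telefonlar", 1),
    ("gömlekler indirimde", -1),
    ("yeni gömlek ve ceket", -1),
    ("ceketler ve gömlekler", -1),
]

train_docs = [root_features(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
weights = train_linear_svm(train_docs, train_labels)

if __name__ == "__main__":
    for text in ["telefonda kampanya", "ceketler gömlekler"]:
        print(text, "->", predict(weights, text))
```

Because inflected forms such as "telefonlar" and "telefonda" collapse to one root, the feature space stays small, and a linear model needs only one sparse dot product per document at prediction time. That is consistent with the speed consideration the abstract raises for practical context-sensitive systems.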