Sentiment-focused web crawling


Thesis Type: Doctorate

Institution: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Thesis Approval Date: 2013

Student: AVNİ GÜRAL VURAL

Supervisor: PINAR KARAGÖZ

Abstract:

The advent of Web 2.0 has led to an increase in the amount of sentimental content available on the Web. Such content is often found on social media web sites in the form of product reviews, user comments, testimonials, messages in discussion forums, status updates, and personal blogs, as well as in other forms, including opinions in personal pages, news articles, and product descriptions. The analysis of sentimental content has a number of important applications, the most important being web search, contextual advertisement, and recommendation. The timely discovery of sentimental content matters because most sentiments quickly lose their value if they are not discovered promptly. So far, all focused crawlers have worked in a topic-specific manner and fall short when the goal is to discover sentimental pages. In addition, to date, most research on sentiment analysis has focused on the English language. In this thesis, we present a new perspective on focused web crawling. First, we propose a sentiment-focused web crawling framework to facilitate the quick discovery of sentimental content and evaluate it via simulations over the publicly available ClueWeb09-B web page collection. Second, we propose a framework for unsupervised sentiment analysis in Turkish and perform experiments with data from popular Turkish social media sites. Finally, we consolidate our frameworks and present a customized version of the sentiment-focused web crawling framework for Turkish.