Sentiment-Focused Web Crawling


Vural A. G. , Cambazoglu B. B. , Karagöz P.

ACM TRANSACTIONS ON THE WEB, vol.8, 2014 (Journal Indexed in SCI)

  • Volume: 8
  • Publication Date: 2014
  • DOI: 10.1145/2644821
  • Journal Name: ACM TRANSACTIONS ON THE WEB

Abstract

Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available in the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, somewhat surprisingly, the discovery of sentimental or opinionated Web content is mostly ignored. This work aims to fill this gap and addresses the problem of quickly discovering and fetching the sentimental content present in the Web. To this end, we design a sentiment-focused Web crawling framework. In particular, we propose different sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Through simulations, these strategies are shown to achieve considerable performance improvement over general-purpose Web crawling strategies in discovery of sentimental Web content.
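
The general idea of prioritizing discovered URLs by a predicted sentiment score can be illustrated with a best-first crawl over a small in-memory link graph. This is only a minimal sketch under stated assumptions, not the strategies evaluated in the paper: the function name `focused_crawl`, the `predict_score` callback, the toy graph, and the score values are all hypothetical, and a real crawler would estimate scores from signals available before download and would fetch pages over the network.

```python
import heapq
from typing import Callable, Dict, List


def focused_crawl(
    graph: Dict[str, List[str]],
    seeds: List[str],
    predict_score: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Simulate a best-first crawl; return URLs in the order they are fetched."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    # The counter breaks ties in discovery order and keeps tuples comparable.
    frontier = []
    seen = set()
    counter = 0

    def enqueue(url: str) -> None:
        nonlocal counter
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-predict_score(url), counter, url))
            counter += 1

    for url in seeds:
        enqueue(url)

    fetched = []
    while frontier and len(fetched) < budget:
        _, _, url = heapq.heappop(frontier)
        fetched.append(url)  # stands in for downloading the page
        for link in graph.get(url, []):
            enqueue(link)  # newly discovered URLs join the frontier
    return fetched


# Hypothetical toy Web graph and predicted sentiment scores (assumptions).
toy_graph = {
    "seed": ["news", "review"],
    "review": ["forum"],
    "news": ["about"],
    "forum": [],
    "about": [],
}
toy_scores = {"seed": 0.0, "news": 0.1, "review": 0.9, "forum": 0.8, "about": 0.05}

order = focused_crawl(toy_graph, ["seed"], lambda u: toy_scores.get(u, 0.0), budget=4)
print(order)
```

In this toy run the pages with higher predicted scores ("review", then "forum") are fetched before the low-scoring "news" page, whereas a breadth-first crawl would fetch "news" right after the seed. Swapping the `predict_score` callback is enough to try a different prioritization rule.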