Sentiment-Focused Web Crawling

Vural A. G., Cambazoglu B. B., Karagöz P.

ACM Transactions on the Web, vol. 8, 2014 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 8
  • Publication Date: 2014
  • DOI: 10.1145/2644821
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Sentiment analysis, focused Web crawling, strength detection
  • Middle East Technical University Affiliated: Yes


Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available on the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, somewhat surprisingly, the discovery of sentimental or opinionated Web content has been mostly ignored. This work aims to fill this gap and addresses the problem of quickly discovering and fetching the sentimental content present on the Web. To this end, we design a sentiment-focused Web crawling framework. In particular, we propose different sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Through simulations, these strategies are shown to achieve considerable performance improvements over general-purpose Web crawling strategies in the discovery of sentimental Web content.
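The central idea above, ordering the crawl frontier by a predicted sentiment score rather than by discovery order, can be illustrated with a small best-first crawler over an in-memory link graph. This is a minimal sketch under stated assumptions, not the authors' framework: the function names, the toy graph, the scores, and the rule that an unseen URL inherits the sentiment score of the page that linked to it are all hypothetical choices made for illustration. A breadth-first crawler stands in for a general-purpose baseline.

```python
import heapq
from collections import deque
from typing import Dict, List

# Hypothetical toy Web graph: page -> outgoing links.
EXAMPLE_GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],
    "d": [],
    "e": [],
}

# Hypothetical sentiment scores in [0, 1] that a classifier would assign
# to each page once it has been fetched.
EXAMPLE_SENTIMENT: Dict[str, float] = {
    "a": 0.5, "b": 0.1, "c": 0.9, "d": 0.2, "e": 0.8,
}


def sentiment_focused_crawl(
    seeds: List[str],
    graph: Dict[str, List[str]],
    page_sentiment: Dict[str, float],
    budget: int,
) -> List[str]:
    """Best-first crawl: always fetch the queued URL with the highest predicted score."""
    frontier: list = []  # heapq is a min-heap, so priorities are negated
    seen = set()
    counter = 0  # insertion counter breaks ties in FIFO order
    for url in seeds:
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (float("-inf"), counter, url))  # seeds go first
            counter += 1
    fetched: List[str] = []
    while frontier and len(fetched) < budget:
        _, _, url = heapq.heappop(frontier)
        fetched.append(url)
        # "Fetching" here is only a lookup of the page's (hypothetical) score.
        parent_score = page_sentiment.get(url, 0.0)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                # Assumed predictor: an unseen URL inherits its parent's score.
                heapq.heappush(frontier, (-parent_score, counter, link))
                counter += 1
    return fetched


def breadth_first_crawl(
    seeds: List[str], graph: Dict[str, List[str]], budget: int
) -> List[str]:
    """General-purpose baseline: fetch URLs in the order they were discovered."""
    frontier: deque = deque()
    seen = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)
    fetched: List[str] = []
    while frontier and len(fetched) < budget:
        url = frontier.popleft()
        fetched.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    print(sentiment_focused_crawl(["a"], EXAMPLE_GRAPH, EXAMPLE_SENTIMENT, budget=5))
    # ['a', 'b', 'c', 'e', 'd']
    print(breadth_first_crawl(["a"], EXAMPLE_GRAPH, budget=5))
    # ['a', 'b', 'c', 'd', 'e']
```

On this toy graph, the sentiment-focused crawler fetches `e` (linked from the highly sentimental page `c`) before `d` (linked from the weakly sentimental page `b`), while breadth-first order fetches `d` first. A realistic crawler would replace the score lookup with a sentiment classifier applied to fetched content and might update a URL's priority when it is rediscovered from other pages; this sketch simply keeps the first predicted score.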