A front-page news-selection algorithm based on topic modelling using raw text


Toraman Ç., Can F.

JOURNAL OF INFORMATION SCIENCE, cilt.41, sa.5, ss.676-685, 2015 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 41 Sayı: 5
  • Basım Tarihi: 2015
  • Doi Numarası: 10.1177/0165551515589069
  • Dergi Adı: JOURNAL OF INFORMATION SCIENCE
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Sayfa Sayıları: ss.676-685
  • Orta Doğu Teknik Üniversitesi Adresli: Hayır

Özet

Front-page news selection is the task of finding important news articles in news aggregators. In this study, we examine news selection for public front pages using raw text, without any meta-attributes such as click counts. A novel algorithm is introduced by jointly considering the importance and diversity of selected news articles and the length of front pages. We estimate the importance of news, based on topic modelling, to provide the required diversity. Then we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is subsequently conducted to measure effectiveness and diversity, using our newly-generated annotation program. Annotation results show that up to seven of 10 news articles are important and up to nine of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research.