Event prediction from news text using subgraph embedding and graph sequence mining

Creative Commons License

Çekinel R. F., Karagöz P.

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, vol.25, no.6, pp.2403-2428, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 25 Issue: 6
  • Publication Date: 2022
  • Doi Number: 10.1007/s11280-021-01002-1
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Page Numbers: pp.2403-2428
  • Keywords: Graph mining, Sequential rule mining, Frequent subgraph mining, Graph embeddings, News prediction
  • Middle East Technical University Affiliated: Yes


Event detection from textual content using text mining is a well-researched field. Meanwhile, recent graph modeling and graph embedding techniques provide an opportunity to represent textual content as graphs: text can be enriched with additional attributes, and complex relationships can be captured within the graph. In this paper, we focus on news prediction and model the problem as subgraph prediction. More specifically, we aim to predict the news skeleton in the form of a subgraph. To this end, graph-based representations of news articles are constructed and a graph-mining-based pattern extraction method is proposed. The proposed method consists of three main steps. First, a graph representation of the news text is constructed. Next, frequent subgraph mining and sequential rule mining algorithms are adapted for pattern prediction on graph sequences. We consider that a subgraph captures the main story of the content, and that the sequential rules indicate the temporal relationships among subgraph patterns. Finally, the extracted sequential patterns are used to predict the future news skeleton (i.e., the main features of the news). Graph embedding techniques are also employed to measure similarity. The proposed method is evaluated against baseline methods on both a collection of news from an online newspaper and a benchmark news dataset.
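
The three steps named in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation. It assumes that each news text becomes a set of word co-occurrence edges, that a "frequent subgraph" is reduced to a single frequent edge, that a sequential rule links an edge at one time step to an edge at the next, and that Jaccard overlap of edge sets stands in for embedding-based similarity. All function names, parameters, and the toy headlines are hypothetical.

```python
from collections import Counter


def text_to_edges(text):
    """Step 1 (simplified): represent a text as undirected edges between adjacent words."""
    words = text.lower().split()
    return {frozenset((a, b)) for a, b in zip(words, words[1:]) if a != b}


def mine_rules(graphs, min_support=2, min_conf=0.6):
    """Step 2 (simplified): find frequent edges, then rules 'edge a at t -> edge b at t+1'."""
    support = Counter(edge for graph in graphs for edge in graph)
    frequent = {edge for edge, count in support.items() if count >= min_support}
    antecedent_count = Counter()
    pair_count = Counter()
    for now, nxt in zip(graphs, graphs[1:]):
        for a in now & frequent:
            antecedent_count[a] += 1
            for b in nxt & frequent:
                pair_count[(a, b)] += 1
    # Rule confidence = times a was followed by b / times a had a successor step.
    rules = {}
    for (a, b), n in pair_count.items():
        conf = n / antecedent_count[a]
        if conf >= min_conf:
            rules[(a, b)] = conf
    return rules


def predict(graph, rules):
    """Step 3 (simplified): predict next-step edges from rules whose antecedent is present."""
    return {b for (a, b) in rules if a in graph}


def jaccard(g1, g2):
    """Assumed stand-in for embedding similarity: overlap of two edge sets."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 1.0


# Toy sequence of headlines, one per time step (invented for illustration).
history = [
    "storm hits coast",
    "flood damages coast",
    "storm hits city",
    "flood damages city",
]
graphs = [text_to_edges(t) for t in history]
rules = mine_rules(graphs)

predicted = predict(text_to_edges("storm hits town"), rules)
actual = text_to_edges("flood damages town")
score = jaccard(predicted, actual)
```

In this toy run the rule "storm-hits is followed by flood-damages" is learned from the history, so the predicted skeleton for the step after "storm hits town" contains the flood-damages edge. A real system would mine multi-edge subgraphs and compare learned embeddings rather than raw edge sets.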