T-MPP: A Novel Topic-Driven Meta-path-Based Approach for Co-authorship Prediction in Large-Scale Content-Based Heterogeneous Bibliographic Network in Distributed Computing Framework by Spark


Phuc Do, Phu Pham, Trung Phan, Thuc Nguyen

1st International Conference on Intelligent Computing and Optimization (ICO), Pattaya, Thailand, 4-5 October 2018, vol.866, pp.87-97

  • Publication Type: Conference Paper / Full Text
  • Volume: 866
  • Doi Number: 10.1007/978-3-030-00979-3_9
  • City: Pattaya
  • Country: Thailand
  • Page Numbers: pp.87-97
  • Middle East Technical University Affiliated: No

Abstract

Recently, heterogeneous network mining has gained tremendous attention from researchers due to its wide range of applications. Link prediction is one of the most important tasks in information network mining. Historically, most networked data mining approaches have been applied to homogeneous networks, which contain objects and links of a single type. Moreover, challenges remain in thoroughly evaluating the content of linked objects, which is important for predicting potential relationships between objects. A common example is co-authorship prediction in bibliographic networks such as DBLP and DBIS. An author interested in the "data mining" field tends to collaborate with other authors who also work in that field. Hence, predicting co-authorships between authors working on "data mining" and authors working on "hardware" is of little value. Moreover, for large-scale networks, traditional standalone computing is impractical because of its long running times. To overcome these challenges, in this paper we propose a topic-driven meta-path-based prediction approach for heterogeneous networks, called T-MPP, which is implemented on the Spark distributed computing framework. T-MPP not only discovers potential relationships in a given bibliographic network but also captures the topic similarity between authors. We present experiments on a real-world DBLP network. The results show that T-MPP produces more accurate predictions than previous approaches.
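The abstract combines two signals for co-authorship prediction: structural similarity along meta-paths in the heterogeneous network, and topic similarity between authors. The sketch below is not the authors' T-MPP implementation; it is a minimal illustration, under stated assumptions, of one common way to blend the two. It uses PathSim (an established meta-path similarity measure, swapped in here for illustration) along an Author-Paper-Venue-Paper-Author (APVPA) meta-path, plus cosine similarity of per-author topic vectors. The toy network, the topic vectors, the APVPA meta-path choice and the blending weight `alpha` are all hypothetical, and the code runs on a single machine rather than on Spark.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy bibliographic network: author -> papers, paper -> venue.
AUTHOR_PAPERS = {
    "a1": ["p1", "p2"],
    "a2": ["p3"],
    "a3": ["p4"],
}
PAPER_VENUE = {"p1": "KDD", "p2": "ICDM", "p3": "KDD", "p4": "ISCA"}

# Hypothetical per-author topic distributions: [data mining, hardware].
AUTHOR_TOPICS = {
    "a1": [0.9, 0.1],
    "a2": [0.8, 0.2],
    "a3": [0.1, 0.9],
}


def venue_counts(author):
    """Count Author-Paper-Venue path instances from an author to each venue."""
    return Counter(PAPER_VENUE[p] for p in AUTHOR_PAPERS[author])


def apvpa_paths(x, y):
    """Number of Author-Paper-Venue-Paper-Author path instances between x and y."""
    cx, cy = venue_counts(x), venue_counts(y)
    # Counter returns 0 for venues the other author never published in.
    return sum(cx[v] * cy[v] for v in cx)


def pathsim(x, y):
    """PathSim along the symmetric APVPA meta-path: 2*P(x,y) / (P(x,x) + P(y,y))."""
    denom = apvpa_paths(x, x) + apvpa_paths(y, y)
    return 2 * apvpa_paths(x, y) / denom if denom else 0.0


def cosine(u, v):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score(x, y, alpha=0.5):
    """Blend structural and topical similarity; alpha is an assumed weight."""
    topical = cosine(AUTHOR_TOPICS[x], AUTHOR_TOPICS[y])
    return alpha * pathsim(x, y) + (1 - alpha) * topical


if __name__ == "__main__":
    # Rank hypothetical candidate co-authors for author a1.
    for cand in ("a2", "a3"):
        print(cand, round(score("a1", cand), 3))
```

With this toy data, `a2`, who shares a venue and a data-mining topic profile with `a1`, scores higher than the hardware-oriented `a3`, which mirrors the intuition stated in the abstract. In a Spark setting, one option would be to compute the path counts as distributed joins over author-paper and paper-venue edge tables; that part is not shown here.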