Domain adaptation on graphs by learning graph topologies: theoretical analysis and an algorithm


Vural E.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, cilt.27, ss.1619-1635, 2019 (SCI İndekslerine Giren Dergi) identifier identifier

  • Cilt numarası: 27 Konu: 3
  • Basım Tarihi: 2019
  • Doi Numarası: 10.3906/elk-1812-124
  • Dergi Adı: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Sayfa Sayıları: ss.1619-1635

Özet

Traditional machine learning algorithms assume that the training and test data have the same distribution, while this assumption does not necessarily hold in real applications. Domain adaptation methods take into account the deviations in data distribution. In this work, we study the problem of domain adaptation on graphs. We consider a source graph and a target graph constructed with samples drawn from data manifolds. We study the problem of estimating the unknown class labels on the target graph using the label information on the source graph and the similarity between the two graphs. We particularly focus on a setting where the target label function is learned such that its spectrum is similar to that of the source label function. We first propose a theoretical analysis of domain adaptation on graphs and present performance bounds that characterize the target classification error in terms of the properties of the graphs and the data manifolds. We show that the classification performance improves as the topologies of the graphs get more balanced, i.e. as the numbers of neighbors of different graph nodes become more proportionate, and weak edges with small weights are avoided. Our results also suggest that graph edges between too distant data samples should be avoided for good generalization performance. We then propose a graph domain adaptation algorithm inspired by our theoretical findings, which estimates the label functions while learning the source and target graph topologies at the same time. The joint graph learning and label estimation problem is formulated through an objective function relying on our performance bounds, which is minimized with an alternating optimization scheme. Experiments on synthetic and real data sets suggest that the proposed method outperforms baseline approaches.