BOFRF: A Novel Boosting-Based Federated Random Forest Algorithm on Horizontally Partitioned Data


Creative Commons License

Gencturk M., SINACI A. A., Cicekli N. K.

IEEE ACCESS, vol.10, pp.89835-89851, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume Number: 10
  • Publication Date: 2022
  • DOI Number: 10.1109/access.2022.3202008
  • Journal Name: IEEE ACCESS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.89835-89851
  • Keywords: Data models, Random forests, Predictive models, Classification algorithms, Collaborative work, Boosting, Prediction algorithms, Machine learning, Privacy, Ensemble learning, federated learning, privacy-preservation, random forest classification
  • Affiliated with Middle East Technical University: Yes

Abstract

Federated learning is commonly applied to ensemble methods with the goal of increasing the predictive power of local models. Existing federated solutions that use ensemble methods can achieve this when the sites' datasets are balanced and of good quality, i.e., when the local models are already above a certain accuracy threshold. However, they usually fail to provide the same level of improvement to sites whose classifiers are unsuccessful because of poor-quality or imbalanced data. To address this challenge, we propose a novel federated ensemble classification algorithm for horizontally partitioned data, namely Boosting-based Federated Random Forest (BOFRF), which not only increases the predictive power of all participating sites but also provides a significant improvement in the predictive power of sites having unsuccessful local models. We implement a federated version of random forest, a well-known bagging algorithm, by adapting the idea of boosting to it. We introduce a novel aggregation and weight calculation methodology that assigns weights to local classifiers based on their classification performance at each site, without increasing the communication or computation cost. We evaluate the performance of the proposed algorithm in different federated environments that we set up using four healthcare datasets. The empirical results show that BOFRF improves the predictive power of local random forest models in all cases. The main advantage of BOFRF is that, unlike existing solutions, it provides a significant level of improvement for sites having unsuccessful local models.
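The abstract does not give the weight formula, so the following is only a minimal sketch of the general idea of weighting local classifiers by their classification performance at each site, not the BOFRF algorithm itself. It assumes an AdaBoost-style log-odds weight computed from the mean accuracy across sites; the function names (`accuracy`, `boosting_style_weight`, `federated_weights`, `weighted_vote`), the toy threshold classifiers, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def accuracy(clf, X, y):
    """Fraction of samples in (X, y) that clf labels correctly."""
    correct = sum(1 for xi, yi in zip(X, y) if clf(xi) == yi)
    return correct / len(y)

def boosting_style_weight(acc, eps=1e-6):
    """AdaBoost-style log-odds weight (an assumption, not the paper's formula)."""
    acc = min(max(acc, eps), 1 - eps)  # clip to avoid log(0) and division by zero
    return math.log(acc / (1 - acc))

def federated_weights(classifiers, sites):
    """Weight each classifier by its mean accuracy over every site's data.

    In a federated setting only the accuracy values would leave a site,
    never the raw samples.
    """
    weights = []
    for clf in classifiers:
        accs = [accuracy(clf, X, y) for X, y in sites]
        mean_acc = sum(accs) / len(accs)
        # Classifiers at or below chance level get no say in the vote.
        weights.append(max(0.0, boosting_style_weight(mean_acc)))
    return weights

def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)

# Toy stand-ins for local models: simple threshold rules on one feature.
good = lambda x: int(x > 0.5)   # matches the true decision rule
weak = lambda x: int(x > 0.9)   # threshold set too high
bad = lambda x: 0               # always predicts the majority class

classifiers = [good, weak, bad]

# Two sites, each holding its own local (features, labels) data.
site_a = ([0.1, 0.4, 0.6, 0.8], [0, 0, 1, 1])
site_b = ([0.2, 0.7, 0.95, 0.3], [0, 1, 1, 0])
sites = [site_a, site_b]

weights = federated_weights(classifiers, sites)
prediction = weighted_vote(classifiers, weights, 0.7)
```

In this toy setup an unweighted majority vote at `x = 0.7` would follow the two poor classifiers, whereas the weighted vote follows the accurate one. This illustrates why performance-based weights can help a site whose own local model is unsuccessful.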