Molecular contrastive learning with graph attention network (MoCL-GAT) for enhanced molecular representation


Dalkıran A., RİFAİOĞLU A. S., ATALAY R., ACAR A. C., DOĞAN T., ATALAY M. V.

BMC Bioinformatics, vol.27, no.1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 27 Issue: 1
  • Publication Date: 2026
  • DOI: 10.1186/s12859-026-06409-z
  • Journal Name: BMC Bioinformatics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Compendex, INSPEC, MEDLINE, Directory of Open Access Journals
  • Keywords: Artificial intelligence, Deep learning, Drug discovery, Graph theory, Machine learning
  • Middle East Technical University Affiliated: Yes

Abstract

Background: Learning the representation of molecules is crucial for drug discovery but is often hindered by the scarcity of labeled experimental data, which limits the performance of supervised machine learning models. While self-supervised learning (SSL) offers a solution by leveraging vast unlabeled chemical databases, many existing methods focus on learning from either local structural information or global molecular properties, but not both simultaneously. We introduce MoCL-GAT, a novel contrastive and transfer learning-based SSL framework that addresses this gap by simultaneously learning from two complementary objectives. It combines a local contrastive task on molecular subgraphs to capture fine-grained chemical environments with a global predictive task to learn holistic molecular descriptors. This dual-objective approach, powered by a Graph Attention Network, is designed to create more robust, versatile, and transferable molecular representations.

Results: Pre-trained on 1.9 million compounds, MoCL-GAT was fine-tuned on diverse benchmarks. It achieved state-of-the-art performance on molecular property prediction tasks, with an AUROC of 0.928 on BBBP and 0.768 on SIDER, and top-ranking RMSEs of 0.570 for ESOL and 1.818 for FreeSolv. Critically, fine-tuned models consistently and significantly outperformed models trained from scratch, confirming the value of pre-training.

Conclusions: These results validate that MoCL-GAT's dual-objective approach learns highly effective and transferable representations, enabling more accurate and data-efficient predictions for key cheminformatics challenges. The source code for MoCL-GAT is publicly available on Zenodo at https://doi.org/10.5281/zenodo.16927285.
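
The dual-objective idea described in the abstract (a local contrastive task on subgraphs plus a global predictive task on molecular descriptors) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the loss functions, so the InfoNCE contrastive term, the mean-squared-error descriptor term, and the names `info_nce`, `dual_objective_loss`, `global_weight`, and `temperature` are all assumptions chosen for illustration. The embeddings are plain Python lists here; in practice they would presumably come from the graph attention encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (assumed form of the local contrastive task).

    Row i of `positives` is the positive for anchor i; every other row
    in the batch serves as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]
    return total / len(anchors)


def mse(predicted, target):
    """Mean squared error over all descriptor entries (assumed global task)."""
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def dual_objective_loss(anchors, positives, predicted, target,
                        global_weight=1.0, temperature=0.1):
    """Weighted sum of the local and global terms (hypothetical weighting)."""
    local_term = info_nce(anchors, positives, temperature)
    global_term = mse(predicted, target)
    return local_term + global_weight * global_term
```

Under these assumptions, the contrastive term is small when each subgraph embedding is closest to its own positive view and large when the pairs are mismatched, while the descriptor term penalizes errors in the predicted global properties. The weight between the two terms is a free choice in this sketch.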