DTA-GNN: a toolkit for constructing target-specific drug–target affinity datasets and training graph neural networks


Özsari G., RİFAİOĞLU A. S., ACAR A. C., DOĞAN T., ATALAY M. V.

SoftwareX, vol.34, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 34
  • Publication Date: 2026
  • DOI: 10.1016/j.softx.2026.102671
  • Journal Name: SoftwareX
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: Cheminformatics, Data leakage, Drug–target binding affinity prediction, Graph neural networks, Reproducible research
  • Affiliated with Middle East Technical University: Yes

Abstract

Drug–target affinity (DTA) prediction is a key task in computational drug discovery, yet current research is often compromised by data leakage and non-reproducible preprocessing. We present DTA-GNN, an end-to-end Python toolkit that automates the rigorous construction of target-specific datasets and streamlines the training of graph neural network (GNN)-based DTA predictors. To address data validity, the toolkit’s dataset construction pipeline handles ChEMBL data ingestion and unit standardization, and implements scaffold- and temporal-splitting strategies to prevent overestimation of performance. Integrated leakage audits quantify split integrity prior to modeling. Following dataset construction, DTA-GNN provides a modular trainer that supports ten state-of-the-art GNN architectures and includes built-in hyperparameter optimization. In addition, DTA-GNN supports latent-space analysis, either by extracting learned molecular embeddings or by leveraging molecular fingerprints, and provides interactive visualizations to explore chemical space and interpret model behavior. By unifying robust dataset construction with accessible model training and latent-space analysis via a Python library, CLI, and web UI, DTA-GNN enables researchers to produce standardized, reproducible, and leakage-free DTA benchmarks.
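
The dataset-construction ideas named in the abstract (unit standardization, scaffold and temporal splitting, and a leakage audit) can be illustrated with a minimal standard-library sketch. This is not DTA-GNN's API: every function name, the record layout, and the example values below are hypothetical, and the scaffold strings are assumed to be precomputed (for instance, Bemis-Murcko scaffolds from a cheminformatics library).

```python
import math
from collections import defaultdict

# Conversion factors to molar; an assumed subset of units for illustration.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_affinity(value, unit):
    """Standardize a concentration (IC50, Ki, Kd) to -log10 of its molar value."""
    return -math.log10(value * TO_MOLAR[unit])


def scaffold_split(records, test_fraction=0.2):
    """Assign whole scaffold groups to train or test so no scaffold spans both."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["scaffold"]].append(rec)
    # Larger groups are considered first; groups that no longer fit go to test.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_cap = (1.0 - test_fraction) * len(records)
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def temporal_split(records, cutoff_year):
    """Train on records up to and including cutoff_year; test on later ones."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test


def leakage_audit(train, test):
    """Count scaffolds and molecules that appear on both sides of a split."""
    shared_scaffolds = {r["scaffold"] for r in train} & {r["scaffold"] for r in test}
    shared_molecules = {r["smiles"] for r in train} & {r["smiles"] for r in test}
    return {
        "shared_scaffolds": len(shared_scaffolds),
        "shared_molecules": len(shared_molecules),
    }


# Hypothetical toy records: 10 molecules over 3 scaffold groups (A: 5, B: 3, C: 2).
records = [
    {"smiles": f"mol{i}", "scaffold": scaffold, "year": 2010 + i,
     "value": 10.0 * (i + 1), "unit": "nM"}
    for i, scaffold in enumerate("ABCABAABCA")
]
for rec in records:
    rec["p_affinity"] = to_p_affinity(rec["value"], rec["unit"])

scaffold_train, scaffold_test = scaffold_split(records, test_fraction=0.2)
scaffold_report = leakage_audit(scaffold_train, scaffold_test)

time_train, time_test = temporal_split(records, cutoff_year=2016)
time_report = leakage_audit(time_train, time_test)

# A naive positional split, for contrast: scaffolds leak across the boundary.
naive_report = leakage_audit(records[:8], records[8:])
```

In this sketch the scaffold split keeps each scaffold group on one side by construction, so its audit reports no shared scaffolds. The temporal and naive splits carry no such guarantee, which is the kind of difference an audit run before modeling is meant to surface.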