Sustainable science mapping: benchmarking green AI against transformers for cross-disciplinary abstract classification using arXiv


Erkan M. A., Yozgatlıgil C.

SCIENTIFIC REPORTS, vol.16, pp.1-23, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16
  • Publication Date: 2026
  • DOI: 10.1038/s41598-026-48795-7
  • Journal Name: SCIENTIFIC REPORTS
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), BIOSIS, Chemical Abstracts Core, MEDLINE, Directory of Open Access Journals
  • Page Numbers: pp.1-23
  • Middle East Technical University Affiliated: Yes

Abstract

The exponential growth of scholarly literature necessitates automated, scalable systems for organizing knowledge domains. However, text classification of academic abstracts presents distinct challenges due to specialized terminology and diverse discourse structures across disciplines. This study proposes a resource-efficient deep learning methodology to categorize academic abstracts, scaling from coarse-grained domains (arXiv) to fine-grained disciplinary hierarchies (Web of Science). A systematic comparative analysis of Recurrent Neural Networks (Attention-GRU) and Transformer-based architectures (BERT, SciBERT) is conducted, focusing on the trade-off between predictive accuracy and computational efficiency. Extensive experiments on large-scale benchmarks, including the WOS-46985 dataset with 134 sub-disciplines, reveal a notable finding: the proposed attention-based GRU model using static GloVe embeddings achieved a Macro-F1 score of 0.920, outperforming leading domain-specific models such as SciBERT (F1: 0.867). Furthermore, this accuracy was achieved with over 3× faster training and a significantly lower estimated energy proxy than the Transformer variants. This research contributes to the field by providing a systematic evaluation of “Green AI” architectures, demonstrating that computationally efficient models can robustly handle the linguistic diversity of high-cardinality, fine-grained scientific taxonomies without the prohibitive estimated energy costs of Large Language Models.
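
The headline metric in the abstract, Macro-F1, averages the per-class F1 scores with equal weight, so rare sub-disciplines count as much as common ones. The following is a minimal sketch of that standard definition in plain Python. It is not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical, and averaging over the union of true and predicted labels is an assumption of this sketch.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard Macro-F1 definition).

    Assumption of this sketch: classes are the union of labels seen in
    y_true and y_pred, and a class with no TP, FP, or FN scores 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[true_label] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall for that class.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not data from the study.
y_true = ["cs", "cs", "math", "physics"]
y_pred = ["cs", "math", "math", "physics"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a model that ignores small classes is penalized more heavily under Macro-F1 than under accuracy, which is why the metric suits a 134-class taxonomy.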