AI-driven U.S. drought prediction using machine learning and deep learning

Tanrıverdi, İrem; Batmaz, İnci

doi:10.1007/s00382-025-07720-w

Climate Dynamics, vol. 63, no. 249, pp. 1-24, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 63 Issue: 249
Publication Date: 2025
DOI: 10.1007/s00382-025-07720-w
Journal Name: Climate Dynamics
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 1-24
Affiliated with Middle East Technical University: Yes

Abstract

The importance of predicting drought, a significant environmental and socio-economic challenge, cannot be overstated. This research undertakes an extensive examination of drought phenomena, integrating two intricate datasets that detail weather and soil conditions across various counties in the United States (U.S.). We analyze data across the U.S. from 2000 to 2020, comparing algorithms that account for spatio-temporal structures. Specifically, we evaluate Gradient Boosting Machines techniques such as XGBoost, LightGBM, and CatBoost. Additionally, we develop deep learning (DL) models, including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Transformer architectures. Hybrid models, such as CNN-LSTM and Attention-LSTM, are also explored. For ensemble methods, we implement stacking and voting classifiers to enhance robustness and performance. These models were chosen as they effectively capture spatio-temporal structures in the data, making them particularly suitable for drought prediction. We also perform clustering to divide the data into homogeneous regions and analyze feature importance using both Shapley Additive Explanations (SHAP) values and impurity-based methods derived from tree-based models (e.g., XGBoost, LightGBM, CatBoost), in order to identify the most predictive features of drought scores within each region. To the best of our knowledge and based on a comprehensive review of existing literature, our study is pioneering in utilizing these advanced methodologies for an in-depth analysis of drought score data in the U.S. Our findings indicate model accuracies ranging from 0.5938 to 0.9783, with the highest performance achieved by the Attention-LSTM Hybrid Model and the lowest by XGBoost, although this increased accuracy comes at the cost of significantly longer computation time for DL models.
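The abstract mentions voting classifiers as one of the ensemble methods. As a rough illustration of that general idea only, the sketch below combines the outputs of three hypothetical base models by hard (majority) voting and soft (probability-averaging) voting. The function names, the toy probabilities, and the three-class labels are all assumptions made for this example; they do not come from the paper and do not reproduce its models, data, or results.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Majority vote per sample; ties go to the smallest class label."""
    voted = []
    for sample_preds in zip(*predictions):  # one tuple of labels per sample
        counts = Counter(sample_preds)
        best = max(counts.values())
        voted.append(min(label for label, n in counts.items() if n == best))
    return voted


def soft_vote(probabilities: Sequence[Sequence[Sequence[float]]]) -> list[int]:
    """Average class probabilities across models, then take the argmax."""
    n_models = len(probabilities)
    voted = []
    for sample_probs in zip(*probabilities):  # one prob vector per model
        mean = [sum(col) / n_models for col in zip(*sample_probs)]
        voted.append(argmax(mean))
    return voted


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical class probabilities from three base models
# (4 samples, 3 classes). These numbers are made up for illustration.
probs_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5], [0.45, 0.35, 0.2]]
probs_b = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7], [0.2, 0.5, 0.3]]
probs_c = [[0.2, 0.5, 0.3], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.7, 0.2, 0.1]]
all_probs = [probs_a, probs_b, probs_c]

# Each model's own hard predictions are the argmax of its probabilities.
all_labels = [[argmax(p) for p in model] for model in all_probs]

y_true = [0, 1, 2, 1]  # hypothetical ground-truth labels
hard = hard_vote(all_labels)
soft = soft_vote(all_probs)
print("hard vote:", hard, "accuracy:", accuracy(y_true, hard))
print("soft vote:", soft, "accuracy:", accuracy(y_true, soft))
```

Hard voting needs only class labels from each model, while soft voting needs class probabilities and lets a confident model outweigh two uncertain ones. The abstract does not say which variant the authors used.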

Keywords: Clustering, LSTM, CNN, Transformer, Hybrid and ensemble methods, Feature importance