An explainable two-stage machine learning approach for precipitation forecast


Senocak A. U. G., YILMAZ M. T., KALKAN S., YÜCEL İ., Amjad M.

Journal of Hydrology, vol. 627, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 627
  • Publication Date: 2023
  • DOI: 10.1016/j.jhydrol.2023.130375
  • Journal Name: Journal of Hydrology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), Arctic & Antarctic Regions, BIOSIS, CAB Abstracts, Communication Abstracts, Compendex, Environment Index, INSPEC, Metadex, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: Hydrometeorological hyperparameter optimization, Instance-level model explanation, Model-wide parameter and feature importance, Precipitation intensity classification, Quantitative precipitation forecast, Two-stage ML-based NWP precipitation forecast merging
  • Affiliated with Middle East Technical University: Yes

Abstract

A common post-processing approach to improve precipitation forecasts is to use machine learning models such as artificial neural networks (more specifically, multi-layer perceptrons) as black-box systems. These models combine different sources of observations or predictors to generate a forecast that is improved in terms of the desired metrics. However, most existing studies employ a single-stage regression model without considering explainability. The few studies with two-stage models that combine classification and regression use binary classification and still lack explainable artificial intelligence. Therefore, this study proposes a precipitation prediction system which (i) is composed of two stages for better predictions, (ii) compares the utility of binary and multi-class classification as the stage preceding the regression, and (iii) is explainable, unlike prior studies, in that individual predictions of machine learning-based forecasts are interpretable by humans. In the first stage, the proposed model estimates the precipitation intensity category using binary or multi-class classification; in the second stage, a regression model uses this category information to obtain the daily precipitation magnitude. The approach is made human-interpretable (i.e., explainable) by providing insight into the model-wide importance of predictors and into how individual predictions are generated (instance-level explanation). The proposed two-stage approach is compared against single-stage and black-box approaches in terms of prediction quality and explainability, with daily station-based observations used as ground truth.
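The two-stage idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class thresholds, the nearest-centroid classifier standing in for the first stage, the per-class linear regression standing in for the second stage, the single forecast predictor, and the toy numbers. The study itself uses machine learning models and several predictors; this only shows how a predicted intensity category can steer the regression.

```python
from statistics import mean

# Hypothetical intensity thresholds in mm/day (assumed, not from the paper):
# below 1 -> class 0, 1 to 10 -> class 1, 10 and above -> class 2.
THRESHOLDS = [1.0, 10.0]

def intensity_class(precip_mm):
    """Map a daily precipitation amount to an intensity class index."""
    return sum(precip_mm >= t for t in THRESHOLDS)

def fit_centroids(forecasts, classes):
    """Stage 1 stand-in: mean forecast value for each observed class."""
    groups = {}
    for x, c in zip(forecasts, classes):
        groups.setdefault(c, []).append(x)
    return {c: mean(xs) for c, xs in groups.items()}

def predict_class(centroids, x):
    """Assign the class whose centroid is nearest to the forecast value."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # degenerate case: predict the class mean
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def fit_two_stage(forecasts, observed):
    """Fit the classifier, then one regression per predicted class."""
    classes = [intensity_class(y) for y in observed]
    centroids = fit_centroids(forecasts, classes)
    predicted = [predict_class(centroids, x) for x in forecasts]
    lines = {}
    for c in set(predicted):
        xs = [x for x, p in zip(forecasts, predicted) if p == c]
        ys = [y for y, p in zip(observed, predicted) if p == c]
        lines[c] = fit_line(xs, ys)
    return centroids, lines

def predict_two_stage(model, x):
    """Return (predicted class, precipitation estimate in mm/day)."""
    centroids, lines = model
    c = predict_class(centroids, x)
    # Fall back to the raw forecast if no regression was fitted for this class.
    a, b = lines.get(c, (1.0, 0.0))
    return c, max(0.0, a * x + b)  # precipitation cannot be negative

# Made-up toy data: one NWP forecast value and one station observation per day.
forecasts = [0.0, 0.5, 2.0, 5.0, 8.0, 15.0, 25.0, 30.0]
observed = [0.0, 0.2, 3.0, 6.0, 7.0, 18.0, 22.0, 35.0]
model = fit_two_stage(forecasts, observed)
heavy_class, heavy_estimate = predict_two_stage(model, 20.0)
```

Fitting one regression per predicted class is only one way to pass category information to the second stage; feeding the category to a single regressor as an extra input feature is another.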
Experiments show that the proposed two-stage approach yields significant improvement (on average, RMSE reduced by 10.50%, and the correlation between numerical precipitation estimates and observed precipitation values increased by 7.5%) compared to the best-performing physical predictor (ECMWF). Analysis of explainability provides insights into the decisions of our two-stage approach, e.g., the usefulness of seasonality-related parameters, multi-class precipitation intensity classification as a first stage, and the predictors for each task (regression or classification).
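The two reported metrics can be computed as in the following minimal sketch. It assumes that the percentages in the abstract are relative changes with respect to the ECMWF baseline, which the abstract does not state explicitly; the function names are illustrative.

```python
import math

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

def relative_change_percent(new, baseline):
    """Relative change of a metric with respect to a baseline, in percent."""
    return 100.0 * (new - baseline) / baseline
```

Under this reading, a model whose RMSE is 0.895 times the baseline RMSE gives `relative_change_percent(0.895, 1.0)` of about -10.5, that is, a 10.5% reduction.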