An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models

TAORMINA, RICCARDO; GALELLI, STEFANO; KARAKAYA, GÜLŞAH; Ahipasaoglu, S.

doi:10.1016/j.jhydrol.2016.07.045

TAORMINA R., GALELLI S., KARAKAYA G., Ahipasaoglu S. D.

JOURNAL OF HYDROLOGY, vol.542, pp.18-34, 2016 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 542
Publication Date: 2016
DOI: 10.1016/j.jhydrol.2016.07.045
Journal Name: JOURNAL OF HYDROLOGY
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.18-34
Keywords: Input variable selection, Information theory, Data-driven models, Extreme learning machines, Neural networks, INPUT VARIABLE SELECTION, ARTIFICIAL NEURAL-NETWORK, EXTREME LEARNING-MACHINE, WATER-RESOURCES, MUTUAL INFORMATION, RUNOFF, RELEVANCE, ENTROPY, OPTIMIZATION, UNCERTAINTY
Affiliated with Middle East Technical University: Yes

Abstract

This work investigates the uncertainty associated with the presence of multiple subsets of predictors yielding data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model, and optimizes two information theoretic metrics of relevance and redundancy, which ensure that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors and associated models helps find a better trade-off between different measures of predictive accuracy commonly adopted for hydrological modelling problems. (C) 2016 Elsevier B.V. All rights reserved.
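
The four objectives named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact relevance and redundancy metrics or the optimizer, so the sketch assumes relevance is the sum of mutual information between each selected predictor and the target, and redundancy is the sum of pairwise mutual information within the subset. The toy `columns` and `target` data, the function names, and the `accuracy` argument (which in a wrapper would come from training a data-driven model on the subset) are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def relevance(subset, columns, target):
    """Assumed metric: summed mutual information between predictors and target."""
    return sum(mutual_information(columns[j], target) for j in subset)


def redundancy(subset, columns):
    """Assumed metric: summed pairwise mutual information inside the subset."""
    return sum(mutual_information(columns[a], columns[b])
               for a, b in combinations(subset, 2))


def objectives(subset, columns, target, accuracy):
    """Return the four objectives, each written as a quantity to minimize.

    `accuracy` is supplied by the caller; in a wrapper it would be the
    score of a model trained on the candidate subset.
    """
    return (len(subset),                          # fewer predictors
            -accuracy,                            # higher accuracy
            -relevance(subset, columns, target),  # higher relevance
            redundancy(subset, columns))          # lower redundancy


def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all minimized)."""
    return (all(a <= b for a, b in zip(f, g))
            and any(a < b for a, b in zip(f, g)))


# Hypothetical toy data: predictor 1 duplicates predictor 0,
# predictor 2 is independent of the target.
columns = {0: [0, 0, 1, 1], 1: [0, 0, 1, 1], 2: [0, 1, 0, 1]}
target = [0, 0, 1, 1]

single = objectives((0,), columns, target, accuracy=0.9)
pair = objectives((0, 1), columns, target, accuracy=0.9)
```

In this toy case neither `single` nor `pair` dominates the other: the duplicated predictor raises the summed relevance but also raises cardinality and redundancy, which is the kind of trade-off a multi-objective search has to resolve.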