Comparison of missing value imputation methods for meteorological time series data


Tezin Türü: Yüksek Lisans

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Fen Edebiyat Fakültesi, İstatistik Bölümü, Türkiye

Tezin Onay Tarihi: 2010

Öğrenci: SİPAN ASLAN

Danışman: CEYLAN YOZGATLIGİL

Özet:

Dealing with missing data in spatio-temporal time series constitutes important branch of general missing data problem. Since the statistical properties of time-dependent data characterized by sequentiality of observations then any interruption of consecutiveness in time series will cause severe problems. In order to make reliable analyses in this case missing data must be handled cautiously without disturbing the series statistical properties, mainly as temporal and spatial dependencies. In this study we aimed to compare several imputation methods for the appropriate completion of missing values of the spatio-temporal meteorological time series. For this purpose, several missing imputation methods are assessed on their imputation performances for artificially created missing data in monthly total precipitation and monthly mean temperature series which are obtained from the climate stations of Turkish State Meteorological Service. Artificially created missing data are estimated by using six methods. Single Arithmetic Average (SAA), Normal Ratio (NR) and NR Weighted with Correlations (NRWC) are the three simple methods used in the study. On the other hand, we used two computational intensive methods for missing data imputation which are called Multi Layer Perceptron type Neural Network (MLPNN) and Monte Carlo Markov Chain based on Expectation-Maximization Algorithm (EM-MCMC). In addition to these, we propose a modification in the EM-MCMC method in which results of simple imputation methods are used as auxiliary variables. Beside the using accuracy measure based on squared errors we proposed Correlation Dimension (CD) technique for appropriate evaluation of imputation performances which is also important subject of Nonlinear Dynamic Time Series Analysis.