Forecasting Performance of Machine Learning, Time Series and Hybrid Methods for Low and High Frequency Time Series

Özdemir O., Yozgatlıgil C.

STATISTICA NEERLANDICA, vol.78, no.2, pp.441-474, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 78 Issue: 2
  • ISSN: 0039-0402
  • Publication Date: 2024
  • Doi Number: 10.1111/stan.12326
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Business Source Elite, Business Source Premier, EconLit, zbMATH
  • Page Numbers: pp.441-474
  • Middle East Technical University Affiliated: Yes


One of the main objectives of time series analysis is forecasting, and both machine learning methods and statistical methods have been proposed in the literature for this purpose. In this study, we compare the forecasting performance of several of these approaches. In addition to traditional forecasting methods (the Naive and Seasonal Naive methods, S/ARIMA, Exponential Smoothing, TBATS, Bayesian Exponential Smoothing models with trend modifications, and STL decomposition), forecasts are obtained from seven machine learning methods (Random Forest, Support Vector Regression, XGBoost, BNN, RNN, LSTM, and FFNN) and from hybrids of statistical time series and machine learning methods. The data set is sampled proportionally from the various time frequencies in the M4 Competition data set. In this way, we aim to build a forecasting guide that accounts for different preprocessing approaches, methods, and data sets with various time frequencies. We then discuss the performance and impact of all methods. The results show that most of the best-performing models are machine learning methods. Moreover, forecasting performance is affected by both the time frequency and the forecast horizon. Finally, the study suggests that the hybrid approach does not always yield the best model. Hence, this study provides guidelines for understanding which methods perform better at different time series frequencies.
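As a reference point for the benchmark methods named above, the Naive and Seasonal Naive forecasts can be sketched in a few lines of plain Python, together with the symmetric MAPE (sMAPE), one of the official accuracy measures of the M4 Competition. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the abstract does not say which accuracy measures the study reports, so sMAPE is included only because the M4 Competition uses it.

```python
from typing import List, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Naive method: repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must be non-empty")
    return [float(history[-1])] * horizon


def seasonal_naive_forecast(
    history: Sequence[float], horizon: int, period: int
) -> List[float]:
    """Seasonal Naive method: reuse the value observed one season earlier."""
    if period < 1 or len(history) < period:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-period:])
    # Step h (1-based) reuses the observation at the same position
    # within the most recent full season, cycling for h > period.
    return [float(last_season[(h - 1) % period]) for h in range(1, horizon + 1)]


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent, following the M4 Competition definition."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumption: a step where both values are zero counts as zero error.
        total += 0.0 if denom == 0 else 2.0 * abs(a - f) / denom
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical quarterly series (period = 4), forecast 6 steps ahead.
    quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
    print(naive_forecast(quarterly, 6))              # six copies of 42.0
    print(seasonal_naive_forecast(quarterly, 6, 4))  # 12, 22, 32, 42, 12, 22
```

The seasonal period must match the series frequency (for example 4 for quarterly or 12 for monthly data), which is one reason forecast accuracy can shift with the time frequency and horizon.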