Rostami-Tabar, Bahman ORCID: https://orcid.org/0000-0002-3730-0045 and Mircetic, Dejan 2023. Exploring the association between time series features and forecasting by temporal aggregation using machine learning. Neurocomputing 548, 126376. 10.1016/j.neucom.2023.126376
PDF - Published Version. Available under License Creative Commons Attribution.
Abstract
When a forecast of the total value over several time periods ahead is required, forecasters are presented with two temporal aggregation (TA) approaches to produce the required forecasts: i) aggregate forecast (AF) or ii) aggregate data using non-overlapping temporal aggregation (AD). Often, the recommendation is to aggregate data to a frequency relevant to the decision the eventual forecast will support and then produce the forecast. However, this might not always be the best choice, and we argue that either approach may outperform the other in different situations. Moreover, there is a lack of evidence on what indicators may determine the superiority of each approach. We design and execute an empirical experiment framework to first explore the performance of these approaches using the monthly time series of the M4 competition dataset. We then turn the problem into a supervised classification problem by constructing a database consisting of the features of each time series as predictors and the model class, labelled as AF or AD, as the response. We then build machine learning algorithms to investigate the association between time series features and the performance of AF and AD. Our findings suggest that neither AF nor AD consistently generates accurate results for every individual series. AF is shown to be significantly better than AD for the monthly M4 time series, especially for longer horizons. We build several machine learning approaches using a set of extracted time series features as input to predict accurately whether AD or AF should be used. We find that Random Forest (RF) is the most accurate approach in correctly classifying the outcome, as assessed by both statistical measures, such as misclassification error, F-statistics, and area under the curve, and a utility measure.
The RF approach reveals that curvature, nonlinearity, seas_pacf, unitroot_pp, mean, ARCH.LM, coefficient of variation, stability, linearity, and max_level_shift are among the most important features in driving the predictions of the model. Our findings indicate that the strength of trend, ARCH.LM, hurst, autocorrelation lag 1, unitroot_pp, and seas_pacf may favour the AF approach, while lumpiness, entropy, nonlinearity, curvature, and strength of seasonality may increase the chance of AD performing better.
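To make the distinction between the two approaches concrete, the following is a minimal Python sketch of how AF and AD forecasts of a total over the next m periods could be produced and how a series could then be labelled for the classification step. It is an illustration under stated assumptions, not the authors' implementation: simple exponential smoothing stands in for whatever forecasting models the paper uses, absolute error stands in for its accuracy measure, and all function names (`ses_forecast`, `aggregate_non_overlapping`, `forecast_af`, `forecast_ad`, `label_series`) are hypothetical.

```python
from typing import List, Sequence


def ses_forecast(series: Sequence[float], alpha: float = 0.3) -> float:
    """Simple exponential smoothing: return the final level, which is the
    (flat) forecast for every future step. Stand-in model, an assumption."""
    if len(series) == 0:
        raise ValueError("series must contain at least one observation")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def aggregate_non_overlapping(series: Sequence[float], m: int) -> List[float]:
    """Sum consecutive non-overlapping blocks of length m. The oldest
    observations are dropped when the length is not a multiple of m."""
    start = len(series) % m
    return [sum(series[i:i + m]) for i in range(start, len(series), m)]


def forecast_af(series: Sequence[float], m: int, alpha: float = 0.3) -> float:
    """AF: forecast at the original frequency, then sum the m forecasts.
    SES forecasts are flat, so the sum is m times the final level."""
    return m * ses_forecast(series, alpha)


def forecast_ad(series: Sequence[float], m: int, alpha: float = 0.3) -> float:
    """AD: aggregate the data first, then forecast one step ahead
    at the aggregate level."""
    return ses_forecast(aggregate_non_overlapping(series, m), alpha)


def label_series(train: Sequence[float], actual_total: float, m: int,
                 alpha: float = 0.3) -> str:
    """Label a series 'AF' or 'AD' by whichever approach has the smaller
    absolute error on the held-out total (ties go to 'AF')."""
    err_af = abs(actual_total - forecast_af(train, m, alpha))
    err_ad = abs(actual_total - forecast_ad(train, m, alpha))
    return "AF" if err_af <= err_ad else "AD"
```

In a setup like the one the abstract describes, labels produced this way would form the response column, and the extracted time series features would form the predictors passed to a classifier such as a Random Forest.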
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Business (Including Economics) |
Subjects: | H Social Sciences > HA Statistics |
Publisher: | Elsevier |
ISSN: | 0925-2312 |
Date of First Compliant Deposit: | 27 June 2023 |
Date of Acceptance: | 22 May 2023 |
Last Modified: | 29 Jun 2023 19:46 |
URI: | https://orca.cardiff.ac.uk/id/eprint/160465 |