Exploring the association between time series features and forecasting by temporal aggregation using machine learning

and Mircetic, Dejan 2023. Exploring the association between time series features and forecasting by temporal aggregation using machine learning. Neurocomputing 548, 126376. 10.1016/j.neucom.2023.126376

PDF - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.1016/j.neucom.2023.126376

Abstract

When a forecast of the total value over several time periods ahead is required, forecasters can choose between two temporal aggregation (TA) approaches: i) aggregated forecast (AF) or ii) aggregate data using non-overlapping temporal aggregation (AD). Often, the recommendation is to aggregate data to a frequency relevant to the decision the eventual forecast will support and then produce the forecast. However, this might not always be the best choice, and we argue that either approach may outperform the other in different situations. Moreover, there is a lack of evidence on what indicators may determine the superiority of each approach. We design and execute an empirical experiment framework to first explore the performance of these approaches using the monthly time series of the M4 competition dataset. We then recast the problem as a supervised classification task by constructing a database in which the features of each time series are the predictors and the better-performing approach, labelled AF or AD, is the response. We then build machine learning algorithms to investigate the association between time series features and the performance of AF and AD. Our findings suggest that neither the AF nor the AD approach consistently generates accurate results for every individual series. AF is shown to be significantly better than AD for the monthly M4 time series, especially for longer horizons. We build several machine learning approaches using a set of extracted time series features as input to predict accurately whether AD or AF should be used. We find that Random Forest (RF) is the most accurate approach in correctly classifying the outcome, assessed by both statistical measures (misclassification error, F-statistics, and area under the curve) and a utility measure.
The RF approach reveals that curvature, nonlinearity, seas_pacf, unitroot_pp, mean, ARCH.LM, coefficient of variation, stability, linearity, and max_level_shift are among the most important features driving the model's predictions. Our findings indicate that the strength of trend, ARCH.LM, hurst, autocorrelation lag 1, unitroot_pp, and seas_pacf may favour the AF approach, while lumpiness, entropy, nonlinearity, curvature, and strength of seasonality may increase the chance of AD performing better.
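
The distinction between the two approaches, and the way a series could receive its AF/AD class label, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that AF means forecasting at the original frequency and summing the forecasts, and that AD means summing the data into non-overlapping buckets before forecasting. The naive last-value forecaster, the absolute-error comparison, and all function names are hypothetical placeholders chosen only to keep the example self-contained.

```python
def aggregate_non_overlapping(series, m):
    """Sum consecutive non-overlapping buckets of size m.

    The oldest observations are dropped when len(series) is not a multiple of m.
    """
    start = len(series) % m
    return [sum(series[i:i + m]) for i in range(start, len(series), m)]


def naive_forecast(series, h):
    """Placeholder forecaster (an assumption): repeat the last value h times."""
    return [series[-1]] * h


def forecast_af(series, m):
    """AF: forecast m steps at the original frequency, then sum the forecasts."""
    return sum(naive_forecast(series, m))


def forecast_ad(series, m):
    """AD: aggregate the data first, then forecast one aggregated period."""
    return naive_forecast(aggregate_non_overlapping(series, m), 1)[0]


def label_series(series, m):
    """Hold out the last m observations and label the series 'AF' or 'AD'.

    The label is whichever approach has the smaller absolute error on the
    held-out total; ties go to AF (an arbitrary choice for this sketch).
    """
    train, test = series[:-m], series[-m:]
    actual_total = sum(test)
    err_af = abs(forecast_af(train, m) - actual_total)
    err_ad = abs(forecast_ad(train, m) - actual_total)
    return "AF" if err_af <= err_ad else "AD"


if __name__ == "__main__":
    monthly = list(range(1, 13))  # toy upward-trending series
    print(label_series(monthly, 3))
```

In a fuller experiment, labels produced this way for many series would be paired with extracted time series features and passed to a classifier such as a Random Forest.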

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Business (Including Economics)
Subjects:	H Social Sciences > HA Statistics
Publisher:	Elsevier
ISSN:	0925-2312
Date of First Compliant Deposit:	27 June 2023
Date of Acceptance:	22 May 2023
Last Modified:	29 Jun 2023 19:46
URI:	https://orca.cardiff.ac.uk/id/eprint/160465
