Umer, Adnan, Mian, Adnan Noor and Rana, Omer ORCID: https://orcid.org/0000-0003-3597-2646 2023. Predicting machine behavior from Google cluster workload traces. Concurrency and Computation: Practice and Experience 35 (5), e7559. 10.1002/cpe.7559
PDF: Accepted Post-Print Version (623kB)
Abstract
Data centers today host a number of computational resources to support the increasing demand for computation and storage. Understanding how these physical and virtual machines transition between different states of operation (referred to as the machine lifecycle) enables more efficient data center operation management. Furthermore, it helps data center operators define policies on how new computational resources can be added or existing infrastructure decommissioned. Using version 3 of the Google cluster trace data set, collected from approximately 96k machines, we analyze machine failure and changes in machine lifecycle over time. We observed a 13% chance of another machine failing under the same network switch within 1 min of the previous machine failure. We propose a Markov chain-based model that can predict machine states at any given time. Using the model and estimated probabilities, we predicted the machine state over a span of several days with a high probability. Using the predicted machine state, we reconstructed the trend in active machines and compared it with the trend reported in the data set, observing an error of 1.76%.
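The Markov chain idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the state names (`active`, `removed`, `updating`), the transition probabilities in `P`, and the helper names `step`, `predict`, and `estimate_transitions` are all assumptions chosen for illustration. The sketch only shows the general technique of estimating transition probabilities from an observed state sequence and propagating a state distribution forward in time.

```python
# Hypothetical lifecycle states; the paper's actual state set may differ.
STATES = ["active", "removed", "updating"]

# Illustrative per-step transition probabilities (NOT taken from the paper).
# Row i is P(next state | current state = STATES[i]); each row sums to 1.
P = [
    [0.97, 0.02, 0.01],
    [0.30, 0.70, 0.00],
    [0.90, 0.00, 0.10],
]


def step(dist, matrix):
    """Advance a state distribution by one time step: dist' = dist * matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def predict(dist, matrix, steps):
    """Return the state distribution after `steps` transitions."""
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


def estimate_transitions(sequence, states):
    """Estimate a transition matrix by counting observed state changes."""
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current]][index[nxt]] += 1
    matrix = []
    for k, row in enumerate(counts):
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state with no observed exits is treated as absorbing.
            matrix.append([1.0 if j == k else 0.0 for j in range(len(states))])
    return matrix


# A machine known to be active now; forecast its state distribution 10 steps ahead.
start = [1.0, 0.0, 0.0]
forecast = predict(start, P, steps=10)
for name, prob in zip(STATES, forecast):
    print(f"{name}: {prob:.3f}")
```

Summing the predicted `active` probability over all machines at each time step would give an expected active-machine count, which is one plausible way to reconstruct a trend for comparison against observed data.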
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Publisher: | Wiley |
ISSN: | 1532-0626 |
Date of First Compliant Deposit: | 11 December 2022 |
Date of Acceptance: | 21 November 2022 |
Last Modified: | 10 Dec 2023 05:39 |
URI: | https://orca.cardiff.ac.uk/id/eprint/154798 |
Citation Data
Cited 3 times in Scopus.