Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance

Knight, Louise ORCID: https://orcid.org/0000-0003-1431-197X, Stefanic, Polona, Cigale, Matej, Jones, Andrew and Taylor, Ian ORCID: https://orcid.org/0000-0001-5040-0772 2019. Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance. Future Generation Computer Systems 100 , pp. 542-556. 10.1016/j.future.2019.05.039

[thumbnail of KnightL_manuscript.pdf]
Preview
PDF - Accepted Post-Print Version
Download (522kB) | Preview

Abstract

SWITCH (Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) allows for the development and deployment of real-time applications in the cloud, but it does not yet support instances backed by Graphics Processing Units (GPUs). Wanting to explore how SWITCH might support CUDA (a GPU architecture) in the future, we have undertaken a review of time-critical CUDA applications, discovering that run-time requirements (which we call ‘wall time’) are in many cases regarded as the most important. We have performed experiments to investigate which parameters have the greatest impact on wall time when running multiple Amazon Web Services GPU-backed instances. Although a maximum of 8 single-GPU instances can be launched in a single Amazon Region, launching just 2 instances rather than 1 gives a 42% decrease in wall time. Also, instances are often wasted doing nothing, and there is a moderately-strong relationship between how problems are distributed across instances and wall time. These findings can be used to enhance the SWITCH provision for specifying Non-Functional Requirements (NFRs); in the future, GPU-backed instances could be supported. These findings can also be used more generally, to optimise the balance between the computational resources needed and the resulting wall time to obtain results.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA76 Computer software
Publisher: Elsevier
ISSN: 0167-739X
Funders: European Union
Date of First Compliant Deposit: 17 June 2019
Date of Acceptance: 15 May 2019
Last Modified: 12 Nov 2024 04:30
URI: https://orca.cardiff.ac.uk/id/eprint/123498

Citation Data

Cited 1 time in Scopus. View in Scopus. Powered By Scopus® Data

Actions (repository staff only)

Edit Item Edit Item

Downloads

Downloads per month over past year

View more statistics