Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A case study into using common real-time workflow monitoring infrastructure for scientific workflows

Vahi, Karan, Harvey, Ian, Samak, Taghrid, Gunter, Daniel, Evans, Kieran ORCID: https://orcid.org/0000-0003-0414-0812, Rogers, David Mckendrick, Taylor, Ian James ORCID: https://orcid.org/0000-0001-5040-0772, Goode, Monte, Silva, Fabio, Al-Shakarchi, Eddie, Mehta, Gaurang, Deelman, Ewa and Jones, Andrew Clifford 2013. A case study into using common real-time workflow monitoring infrastructure for scientific workflows. Journal of Grid Computing 11 (3), pp. 381-406. 10.1007/s10723-013-9265-4

Full text not available from this repository.

Abstract

Scientific workflow systems support various workflow representations, operational modes, and configurations. Regardless of the system used, end users have common needs: to track the status of their workflows in real time, be notified of execution anomalies and failures automatically, perform troubleshooting, and automate the analysis of the workflow results. In this paper, we describe how the Stampede monitoring infrastructure was integrated with the Pegasus Workflow Management System and the Triana Workflow System in order to add generic real-time monitoring and troubleshooting capabilities across both systems. Stampede is an infrastructure that provides interoperable monitoring using a three-layer model: (1) a common data model to describe workflow and job executions; (2) high-performance tools to load workflow logs conforming to the data model into a data store; and (3) a common query interface. This paper describes the integration of the Stampede monitoring architecture with Pegasus and Triana and shows the new analysis capabilities that Stampede provides to these workflow systems. The successful integration of Stampede with these workflow engines demonstrates the generic nature of the Stampede monitoring infrastructure and its potential to provide a common platform for monitoring across scientific workflow engines.

Item Type: Article
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Publisher: Springer
ISSN: 1570-7873
Last Modified: 18 May 2024 17:30
URI: https://orca.cardiff.ac.uk/id/eprint/92090

Citation Data

Cited 15 times in Scopus.
