Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package

Griffiths, Emma J., Timme, Ruth E, Mendes, Catarina Inês, Page, Andrew J., Alikhan, Nabil-Fareed, Fornika, Dan, Maguire, Finlay, Campos, Josefina, Park, Daniel, Olawoye, Idowu B., Oluniyi, Paul E, Anderson, Dominique, Christoffels, Alan, da Silva, Anders Gonçalves, Cameron, Rhiannon, Dooley, Damion, Katz, Lee S., Black, Allison, Karsch-Mizrachi, Ilene, Barrett, Tanya, Johnston, Anjanette, Connor, Thomas R., Nicholls, Samuel M., Witney, Adam A, Tyson, Gregory H., Tausch, Simon H., Raphenya, Amogelang R., Alcock, Brian, Aanensen, David M., Hodcroft, Emma, Hsiao, William W. L., Vasconcelos, Ana Tereza R. and MacCannell, Duncan R. 2022. Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package. GigaScience 11, giac003. 10.1093/gigascience/giac003

PDF - Published Version
Available under License Creative Commons Attribution.



Background: The Public Health Alliance for Genomic Epidemiology (PHA4GE) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard.

Results: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories.

Conclusions: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI’s BioSample database.

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Biosciences
Additional Information: This is an Open Access article distributed under the terms of the Creative Commons Attribution License.
Publisher: Oxford University Press
ISSN: 2047-217X
Date of First Compliant Deposit: 28 February 2022
Date of Acceptance: 7 January 2022
Last Modified: 19 May 2023 04:24

