Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Machine learning as a service for DiSSCo’s digital specimen architecture

Grieb, Jonas, Weiland, Claus, Hardisty, Alex ORCID: https://orcid.org/0000-0002-0767-4310, Addink, Wouter, Islam, Sharif, Younis, Sohaib and Schmidt, Marco 2021. Machine learning as a service for DiSSCo’s digital specimen architecture. Presented at: TDWG 2021, Virtual, 18-22 Oct 2021. Biodiversity Information Science and Standards, vol. 5 (e75634). Pensoft. doi: 10.3897/biss.5.75634

PDF (BISS_article_75634.pdf) - Published Version
Available under License Creative Commons Attribution.


Abstract

International mass digitization efforts through infrastructures like the European Distributed System of Scientific Collections (DiSSCo), the US resource for Digitization of Biodiversity Collections (iDigBio), the National Specimen Information Infrastructure (NSII) of China, and Australia’s digitization of National Research Collections (NRCA Digital) make geo- and biodiversity specimen data freely, fully and directly accessible. Complementary, overarching infrastructure initiatives like the European Open Science Cloud (EOSC) were established to enable mutual integration, interoperability and reusability of multidisciplinary data streams including biodiversity, Earth system and life sciences. Natural Science Collections (NSC) are of particular importance for such multidisciplinary and internationally linked infrastructures, since they provide hard scientific evidence by allowing direct traceability of derived data (e.g., images, sequences, measurements) to physical specimens and material samples in NSC. To open up the large amounts of trait and habitat data and to link these data to digital resources like sequence databases (e.g., ENA), taxonomic infrastructures (e.g., GBIF) or environmental repositories (e.g., PANGAEA), proper annotation of specimen data with rich (meta)data early in the digitization process is required, alongside bridging technologies that facilitate the reuse of these data. We addressed this in recent studies, in which we employed computational image processing and artificial intelligence technologies (Deep Learning) for the classification and extraction of features like organs and morphological traits from digitized collection data (with a focus on herbarium sheets).
However, such applications of artificial intelligence, whether (sub-symbolic) machine learning or (symbolic) ontology-based annotation, are rarely integrated in the workflows of NSC’s management systems, which are the essential repositories for the aforementioned integration of data streams. This was the motivation for the development of a Deep Learning-based trait extraction and coherent Digital Specimen (DS) annotation service providing “Machine learning as a Service” (MLaaS) with a special focus on interoperability with the core services of DiSSCo, notably the DS Repository (nsidr.org) and the Specimen Data Refinery, as well as reusability within the data fabric of EOSC. Taking up the use case of detecting and classifying regions of interest (ROI) on herbarium scans, we demonstrate an MLaaS prototype for DiSSCo involving the digital object framework, Cordra, for the management of DS as well as instant annotation of digital objects with extracted trait features (and ROIs) based on the DS specification openDS.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Q Science > QH Natural history
Uncontrolled Keywords: FAIR Digital Object, Distributed System of Scientific Collections, plant organ detection, deep learning, region-based convolutional neural network, image annotation
Additional Information: This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0)
Publisher: Pensoft
Date of First Compliant Deposit: 20 October 2021
Date of Acceptance: 23 September 2021
Last Modified: 10 Dec 2022 02:26
URI: https://orca.cardiff.ac.uk/id/eprint/144389
