Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

SemLinker: automating big data integration for casual users

Alrehamy, Hassan and Walker, Coral ORCID: 2018. SemLinker: automating big data integration for casual users. Journal Of Big Data 5 , 14. 10.1186/s40537-018-0123-x

[thumbnail of Alrehamy_et_al-2018-Journal_of_Big_Data.pdf]
PDF - Published Version
Available under License Creative Commons Attribution.

Download (2MB) | Preview


A data integration approach combines data from different sources and builds a unified view for the users. Big data integration inherently is a complex task, and the existing approaches are either potentially limited or invariably rely on manual inputs and interposition from experts or skilled users. SemLinker, an ontology-based data integration system, is part of a metadata management framework for personal data lake (PDL), a personal store-everything architecture. PDL is for casual and unskilled users, therefore SemLinker adopts an automated data integration workflow to minimize manual input requirements. To support the flat architecture of a lake, SemLinker builds and maintains a schema metadata level without involving any physical transformation of data during integration, preserving the data in their native formats while, at the same time, allowing them to be queried and analyzed. Scalability, heterogeneity, and schema evolution are big data integration challenges that are addressed by SemLinker. Large and real-world datasets of substantial heterogeneities are used in evaluating SemLinker. The results demonstrate and confirm the integration efficiency and robustness of SemLinker, especially regarding its capability in the automatic handling of data heterogeneities and schema evolutions.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Additional Information: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Publisher: SpringerOpen
ISSN: 2196-1115
Funders: Cardiff University
Date of First Compliant Deposit: 16 April 2018
Date of Acceptance: 14 March 2018
Last Modified: 22 Oct 2023 10:31

Citation Data

Cited 5 times in Scopus. View in Scopus. Powered By Scopus® Data

Actions (repository staff only)

Edit Item Edit Item


Downloads per month over past year

View more statistics