Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A collective, probabilistic approach to schema mapping using diverse noisy evidence

Kimmig, Angelika (ORCID: https://orcid.org/0000-0002-6742-4057), Memory, Alex, Miller, Renee J. and Getoor, Lise. 2019. A collective, probabilistic approach to schema mapping using diverse noisy evidence. IEEE Transactions on Knowledge and Data Engineering 31 (8), pp. 1426-1439. doi: 10.1109/TKDE.2018.2865785

PDF - Accepted Post-Print Version (526 kB)

Abstract

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of schema mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings as well as inconsistencies and incompleteness in the input. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques. Our evaluation on a wide range of integration scenarios, including several real-world domains, demonstrates that CMD effectively combines data and metadata information to infer highly accurate mappings even with significant levels of noise.
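To make the idea of "schema mapping selection" concrete, here is a toy sketch, assuming each candidate mapping has already been reduced to the set of target tuples it would produce from the source side of a data example. It scores every subset of candidates with a simple objective (example tuples left unexplained, plus produced tuples the example does not contain, plus a weighted mapping size) and returns the cheapest subset. This is not CMD: the paper solves its optimization problem with probabilistic reasoning that scales, whereas this sketch uses exhaustive search, which is exponential, and it ignores metadata constraints entirely. The names `Candidate`, `cost`, `select_mappings`, `size_weight`, and the example tuples are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate mapping, summarized by its effect on the data example."""
    name: str
    produced: frozenset  # target tuples the mapping would generate from the source instance
    size: int            # number of atoms, used here as a crude complexity measure


def cost(selection, target, size_weight=0.5):
    """Toy objective: unexplained target tuples + unsupported tuples + weighted mapping size."""
    produced = set().union(*(c.produced for c in selection))
    unexplained = len(target - produced)  # incompleteness: example tuples no mapping explains
    errors = len(produced - target)       # tuples the selection creates that the example lacks
    complexity = size_weight * sum(c.size for c in selection)
    return unexplained + errors + complexity


def select_mappings(candidates, target, size_weight=0.5):
    """Exhaustively score every subset; exponential, so only usable on tiny inputs."""
    best, best_cost = (), cost((), target, size_weight)
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            c = cost(subset, target, size_weight)
            if c < best_cost:
                best, best_cost = subset, c
    return best, best_cost


# Hypothetical data example: target tuples observed in the example instance.
target = {("alice", "cardiff"), ("bob", "toronto"), ("carol", "maryland")}
candidates = [
    Candidate("m1", frozenset({("alice", "cardiff"), ("bob", "toronto")}), 2),
    Candidate("m2", frozenset({("carol", "maryland"), ("dave", "nowhere")}), 2),
    Candidate("m3", frozenset(target), 3),
]
best, best_cost = select_mappings(candidates, target)
print([c.name for c in best], best_cost)  # → ['m3'] 1.5
```

In this toy run, `m1` and `m2` together also explain every example tuple, but they introduce one unsupported tuple and are larger in total, so the single mapping `m3` wins. This illustrates, in miniature, why selection has to weigh the chosen mappings jointly rather than one at a time.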

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
ISSN: 1041-4347
Date of First Compliant Deposit: 20 August 2018
Date of Acceptance: 31 July 2018
Last Modified: 06 Nov 2023 18:33
URI: https://orca.cardiff.ac.uk/id/eprint/114258

Citation Data

Cited 5 times in Scopus.
