Embury, Suzanne Marie, Missier, Paolo, Sampaio, Sandra, Greenwood, R. Mark and Preece, Alun David ORCID: https://orcid.org/0000-0003-0349-9057 2009. Incorporating domain-specific information quality constraints into database queries. Journal of Data and Information Quality 1 (2) , 11. 10.1145/1577840.1577846 |
Abstract
The range of information now available in queryable repositories opens up a host of possibilities for new and valuable forms of data analysis. Database query languages such as SQL and XQuery offer a concise and high-level means by which such analyses can be implemented, facilitating the extraction of relevant data subsets into either generic or bespoke data analysis environments. Unfortunately, the quality of data in these repositories is often highly variable. The data is still useful, but only if the consumer is aware of the data quality problems and can work around them. Standard query languages offer little support for this aspect of data management. In principle, however, it should be possible to embed constraints describing the consumer’s data quality requirements into the query directly, so that the query evaluator can take over responsibility for enforcing them during query processing. Most previous attempts to incorporate information quality constraints into database queries have been based around a small number of highly generic quality measures, which are defined and computed by the information provider. This is a useful approach in some application areas but, in practice, quality criteria are more commonly determined by the user of the information not by the provider. In this article, we explore an approach to incorporating quality constraints into database queries where the definition of quality is set by the user and not the provider of the information. Our approach is based around the concept of a quality view, a configurable quality assessment component into which domain-specific notions of quality can be embedded. We examine how quality views can be incorporated into XQuery, and draw from this the language features that are required in general to embed quality views into any query language. We also propose some syntactic sugar on top of XQuery to simplify the process of querying with quality constraints.
Item Type: | Article |
---|---|
Status: | Published |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources |
Publisher: | Association for Computing Machinery |
ISSN: | 1936-1955 |
Last Modified: | 17 Oct 2022 10:08 |
URI: | https://orca.cardiff.ac.uk/id/eprint/6888 |
Citation Data
Cited 11 times in Scopus. View in Scopus. Powered By Scopus® Data
Actions (repository staff only)
Edit Item |