Cardiff University | Prifysgol Caerdydd
ORCA: Online Research @ Cardiff

OpenFlyData: An exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster

Miles, Alistair, Zhao, Jun, Klyne, Graham, White-Cooper, Helen and Shotton, David 2010. OpenFlyData: An exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. Journal of Biomedical Informatics 43 (5), pp. 752-761. doi:10.1016/j.jbi.2010.04.004

Full text not available from this repository.


Motivation: Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavyweight and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure.

Results: We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with lightweight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema, and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics.

Availability: The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at [URL not given]. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at [URL not given]. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at [URL not given]. SPARQLite, an implementation of the SPARQL protocol, is available at [URL not given]. All software is provided under the GPL version 3 open source license.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Biosciences
Subjects: Q Science > QH Natural history > QH426 Genetics; Q Science > QR Microbiology
Uncontrolled Keywords: chado; data integration; data web; Drosophila; gene expression; performance; RDF; SPARQL; triple store; user interface
Publisher: Elsevier
ISSN: 1532-0464
Last Modified: 19 Oct 2022 08:58

Citation Data

Cited 14 times in Scopus.
