Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

HBLAST: parallelised sequence similarity - A Hadoop MapReducable basic local alignment search tool

O'Driscoll, Aisling, Belogrudov, Vladislav, Carroll, John, Kropp, Kai, Walsh, Paul, Ghazal, Peter and Sleator, Roy D. 2015. HBLAST: parallelised sequence similarity - A Hadoop MapReducable basic local alignment search tool. Journal of Biomedical Informatics 54, pp. 58-64. 10.1016/j.jbi.2015.01.008

Full text not available from this repository.

Abstract

The recent exponential growth of genomic databases has made the common task of sequence alignment one of the major bottlenecks in computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC). Parallelised solutions have therefore been proposed, but many exhibit scalability limitations and cannot effectively process “Big Data”, the name given to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprising distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The “divide and conquer” parallelisation strategy for alignment algorithms can be applied to both databases and input query sequences. However, scalability remains an issue because of memory constraints or large databases, and segmenting very large databases leads to a further decline in performance. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using “virtual partitioning”. HBlast offers improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
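
The “divide and conquer” strategy described in the abstract, in which both the database and the input queries are partitioned, can be illustrated with a minimal Python sketch. This is not the HBlast implementation and does not use Hadoop or BLAST: the function names (`toy_score`, `ranges`, `map_task`, `reduce_hits`, `search`) are hypothetical, the shared k-mer count is only a stand-in for real alignment scoring, and representing partitions as index ranges rather than separate files is an illustrative assumption, not the paper's definition of “virtual partitioning”.

```python
from collections import defaultdict
from itertools import product


def kmers(seq, k=3):
    """Return the set of k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_score(query, subject, k=3):
    """Toy similarity: number of shared k-mers (a stand-in, not BLAST scoring)."""
    return len(kmers(query, k) & kmers(subject, k))


def ranges(n_items, n_parts):
    """Split range(n_items) into at most n_parts contiguous (start, end) ranges."""
    if n_items == 0:
        return []
    size = -(-n_items // n_parts)  # ceiling division
    return [(s, min(s + size, n_items)) for s in range(0, n_items, size)]


def map_task(queries, database, q_range, db_range):
    """Map step: score each query in q_range against each subject in db_range."""
    for qi in range(*q_range):
        for di in range(*db_range):
            score = toy_score(queries[qi], database[di])
            if score > 0:
                yield qi, (score, di)


def reduce_hits(pairs, top_n=2):
    """Reduce step: group hits by query index and keep the top_n best subjects."""
    grouped = defaultdict(list)
    for qi, hit in pairs:
        grouped[qi].append(hit)
    # Sort by descending score, breaking ties by database index.
    return {qi: sorted(hits, key=lambda h: (-h[0], h[1]))[:top_n]
            for qi, hits in grouped.items()}


def search(queries, database, q_parts=2, db_parts=2, top_n=2):
    """Run every (query range, database range) work unit, then merge the hits."""
    units = product(ranges(len(queries), q_parts),
                    ranges(len(database), db_parts))
    pairs = [p for q_r, db_r in units
             for p in map_task(queries, database, q_r, db_r)]
    return reduce_hits(pairs, top_n)
```

In this sketch each (query range, database range) pair is an independent work unit, so the units could in principle be scored on separate workers, and the merged result does not depend on how many partitions are chosen.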

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Medicine
Uncontrolled Keywords: Cloud computing, Sequence alignment, Bioinformatics, Big data, Genomics, Hadoop
Publisher: Elsevier
ISSN: 1532-0464
Date of Acceptance: 19 January 2015
Last Modified: 21 Mar 2018 08:21
URI: https://orca.cardiff.ac.uk/id/eprint/110064

Citation Data

Cited 34 times in Scopus.
