Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data

Zhang, Peng, Boisson, Bertrand, Stenson, Peter D., Cooper, David N., Casanova, Jean-Laurent, Abel, Laurent and Itan, Yuval 2019. SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data. Nucleic Acids Research 47 (W1), W623-W631. 10.1093/nar/gkz326

PDF - Published Version
Available under License Creative Commons Attribution Non-commercial.



Human whole-genome sequencing reveals about 4 000 000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is no existing webserver capable of extracting DNA/protein sequences for genomic variants from VCF files in a user-friendly and efficient manner. We developed the SeqTailor webserver to bridge this gap, by enabling rapid extraction of (i) DNA sequences around genomic variants, with customizable window sizes and options to annotate the splice sites closest to the variants and to consider the neighboring variants within the window; and (ii) protein sequences encoded by the DNA sequences around genomic variants, with a built-in SnpEff annotator and customizable window sizes. SeqTailor supports 11 species, including human (GRCh37/GRCh38), chimpanzee, mouse, rat, cow, chicken, lizard, zebrafish, fruitfly, Arabidopsis and rice. Standalone programs are provided for command-line-based needs. SeqTailor streamlines the sequence extraction process and accelerates the analysis of genomic variants with software requiring DNA/protein sequences. It will facilitate the study of genomic variation, by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available at
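
To make the idea of extracting a DNA sequence around a variant with a customizable window size concrete, the following is a minimal Python sketch. It is not SeqTailor's implementation: the function name `extract_window`, its parameters, and the toy reference string are all hypothetical, and it assumes the chromosome sequence is already held in memory as a string and that the position is 1-based, as in VCF.

```python
def extract_window(reference, pos, ref, alt, window):
    """Return (reference_seq, variant_seq) with `window` flanking bases per side.

    Hypothetical helper for illustration only, not SeqTailor's code.
    reference: chromosome sequence as a plain string
    pos:       1-based position of the first REF base (VCF convention)
    ref, alt:  REF and ALT alleles from the VCF record
    window:    number of flanking bases to keep on each side
    """
    start = pos - 1              # 0-based index of the first REF base
    end = start + len(ref)       # index just past the last REF base
    if reference[start:end].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    # Flanks are truncated at the sequence boundaries rather than padded.
    left = reference[max(0, start - window):start]
    right = reference[end:end + window]
    return left + ref + right, left + alt + right


# Toy reference and a made-up single-nucleotide variant (T>G at position 8).
toy_reference = "ACGTACGTACGTACGT"
ref_seq, alt_seq = extract_window(toy_reference, pos=8, ref="T", alt="G", window=3)
print(ref_seq)  # ACGTACG
print(alt_seq)  # ACGGACG
```

The same function handles insertions and deletions, because the flanks are taken relative to the full REF allele rather than to a single base. Annotating nearby splice sites or applying neighboring variants within the window, as the abstract describes, would need gene annotation and additional VCF records and is outside this sketch.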

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
Publisher: Oxford University Press (OUP)
ISSN: 0305-1048
Date of First Compliant Deposit: 30 May 2019
Date of Acceptance: 23 April 2019
Last Modified: 04 Nov 2022 12:25

Citation Data

Cited 6 times in Scopus.
