Folkman, L., Yang, Y., Li, Z., Stantic, B., Sattar, A., Mort, Matthew, Cooper, David Neil ORCID: https://orcid.org/0000-0002-8943-8484, Liu, Y. and Zhou, Y. 2015. DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 31 (10), pp. 1599-1606. 10.1093/bioinformatics/btu862
Abstract
Motivation: Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem.

Results: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and yields a robust performance by 10-fold cross-validation and independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to make a significant improvement over existing techniques.

Availability and implementation: The DDIG-in web-server for predicting NS variants, FS indels, and non-frameshifting (NFS) indels is available at http://sparks-lab.org/ddig.
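
For readers unfamiliar with the evaluation metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, and the Matthews correlation coefficient are computed from a binary confusion matrix. The function name and the counts are hypothetical: the counts are chosen only so that sensitivity and specificity match the reported percentages under an assumed balanced test set, and they are not the counts from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    tp/fn: disease-causing variants predicted correctly / incorrectly.
    tn/fp: neutral variants predicted correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts (NOT from the paper): 100 disease-causing and
# 100 neutral variants, i.e. an assumed balanced test set.
sens, spec, mcc = classification_metrics(tp=86, fn=14, tn=72, fp=28)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Unlike sensitivity and specificity, the MCC depends on the ratio of disease-causing to neutral variants, so the same two percentages can give a different MCC on an unbalanced dataset.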
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Medicine |
Subjects: | Q Science > QH Natural history > QH426 Genetics; R Medicine > R Medicine (General) |
Publisher: | Oxford University Press |
ISSN: | 1367-4803 |
Date of Acceptance: | 23 December 2014 |
Last Modified: | 17 Jun 2023 19:36 |
URI: | https://orca.cardiff.ac.uk/id/eprint/84065 |
Citation Data
Cited 47 times in Scopus.