Birgmeier, Johannes, Tierno, Andrew, Stenson, Peter, Deisseroth, Cole, Jagadeesh, Karthik, Cooper, David (ORCID: https://orcid.org/0000-0002-8943-8484), Bernstein, Jonathan, Haeussler, Maximilian and Bejerano, Gill 2018. AVADA improves automated genetic variant database construction directly from full-text literature. [Online]. BioRxiv. Available at: http://dx.doi.org/10.1101/461269
PDF - Submitted Pre-Print Version (1MB)
Abstract
Purpose: The primary literature on human genetic diseases includes descriptions of pathogenic variants that are essential for clinical diagnosis. Variant databases such as ClinVar and HGMD collect pathogenic variants by manual curation. We aimed to automatically construct a freely accessible database of pathogenic variants directly from full-text articles about genetic disease.
Methods: AVADA (Automatically curated VAriant DAtabase) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic variants and genes in the full text of primary literature and converts them to genomic coordinates for rapid downstream use.
Results: AVADA automatically curated almost 60% of the pathogenic variants deposited in HGMD, a 4.4-fold improvement over the current state of the art in automated variant extraction. AVADA also contains more than 60,000 pathogenic variants that are in HGMD but not in ClinVar. In a cohort of 245 diagnosed patients, AVADA correctly annotated 38 previously described diagnostic variants, compared with 43 using HGMD, 20 using ClinVar, and only 13 (wholly subsumed by the AVADA and ClinVar results) using the best automated approach based on abstracts only.
Conclusion: AVADA is the first machine learning tool that automatically curates a variant database directly from full-text literature. AVADA is available upon publication at http://bejerano.stanford.edu/AVADA.
| Field | Value |
|---|---|
| Item Type | Website Content |
| Date Type | Published Online |
| Status | Unpublished |
| Schools | Medicine |
| Publisher | BioRxiv |
| Date of First Compliant Deposit | 12 March 2019 |
| Date of Acceptance | 4 November 2018 |
| Last Modified | 25 Oct 2022 13:42 |
| URI | https://orca.cardiff.ac.uk/id/eprint/120557 |