Leong, Hui Sun and Kipling, David Glyn 2009. Text-based over-representation analysis of microarray gene lists with annotation bias. Nucleic Acids Research 37 (11), e79. 10.1093/nar/gkp310
Abstract
A major challenge in microarray data analysis is the functional interpretation of gene lists. A common approach to address this is over-representation analysis (ORA), which uses the hypergeometric test (or its variants) to evaluate whether a particular functionally defined group of genes is represented more than expected by chance within a gene list. Existing applications of ORA have been largely limited to pre-defined terminologies such as GO and KEGG. We report our explorations of whether ORA can be applied to a wider mining of free-text. We found that a hitherto underappreciated feature of experimentally derived gene lists is that the constituents have substantially more annotation associated with them, as they have been researched upon for a longer period of time. This bias, a result of patterns of research activity within the biomedical community, is a major problem for classical hypergeometric test-based ORA approaches, which cannot account for such bias. We have therefore developed three approaches to overcome this bias, and demonstrate their usability in a wide range of published datasets covering different species. A comparison with existing tools that use GO terms suggests that mining PubMed abstracts can reveal additional biological insight that may not be possible by mining pre-defined ontologies alone.
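As a rough illustration of the classical hypergeometric ORA described in the abstract, the sketch below computes the one-sided p-value for seeing at least a given number of annotated genes in a gene list. It is not the authors' implementation and does not include any of their three bias-correcting approaches; the function name, parameter names and example counts are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_ora_pvalue(population_size, annotated_in_population,
                         list_size, annotated_in_list):
    """One-sided hypergeometric p-value, P(X >= annotated_in_list).

    X is the number of annotated genes found when list_size genes are drawn
    without replacement from population_size genes, of which
    annotated_in_population carry the annotation (a GO term, a KEGG pathway
    or a free-text term). All names here are illustrative assumptions.
    """
    N, K = population_size, annotated_in_population
    n, k = list_size, annotated_in_list
    upper = min(K, n)      # the overlap cannot exceed either set
    total = comb(N, n)     # number of possible gene lists of size n
    # Sum the upper tail; comb() returns 0 for impossible configurations.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 20,000 genes on the array, 400 carrying the term,
# a 200-gene list in which 12 genes carry the term.
p = hypergeom_ora_pvalue(20000, 400, 200, 12)
print(f"ORA p-value: {p:.3g}")
```

This test treats every gene as equally likely to carry an annotation, which is the assumption the abstract identifies as problematic when genes in experimentally derived lists have systematically more annotation.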
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Schools > Medicine |
Subjects: | Q Science > QH Natural history > QH426 Genetics; R Medicine > R Medicine (General) |
Publisher: | Oxford University Press |
ISSN: | 0305-1048 |
Last Modified: | 04 Jun 2017 03:50 |
URI: | https://orca.cardiff.ac.uk/id/eprint/27745 |
Citation Data
Cited 28 times in Scopus.