Muralidaran, Vignesh, Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885 and Knight, Dawn ORCID: https://orcid.org/0000-0002-4745-6502 2021. A systematic review of unsupervised approaches to grammar induction. Natural Language Engineering 27 (6), pp. 647-689. 10.1017/S1351324920000327
Abstract
This study systematically reviews existing approaches to unsupervised grammar induction in terms of their theoretical underpinnings, practical implementations and evaluation. Our motivation is to identify the influence of functional-cognitive schools of grammar on language processing models in computational linguistics. This is an effort to fill any gap between the theoretical school and the computational processing models of grammar induction. Specifically, the review aims to answer the following research questions: Which types of grammar theories have been the subjects of grammar induction? Which methods have been employed to support grammar induction? Which features have been used by these methods for learning? How were these methods evaluated? Finally, in terms of performance, how do these methods compare to one another? Forty-three studies were identified for systematic review, of which 33 described original implementations of grammar induction, three provided surveys and seven focused on theories and experiments related to the acquisition and processing of grammar in humans. The data extracted from the 33 implementations were stratified into seven aspects of analysis: theory of grammar; output representation; how grammatical productivity is processed; how grammatical productivity is represented; features used for learning; evaluation strategy; and implementation methodology. In most of the implementations considered, grammar was treated as a generative-formal system, autonomous and independent of meaning. Parser decoding was done in a non-incremental, head-driven fashion, assuming that all words are available to the parsing model, and the output representation of the learnt grammar was hierarchical, typically a dependency or constituency tree.
However, the theoretical and experimental studies considered suggest that a usage-based, incremental, sequential system of grammar is more appropriate than the formal, non-incremental, hierarchical view of grammar. This gap between the theoretical and experimental studies on the one hand and the computational implementations on the other should be addressed to enable further progress in computational grammar induction research.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | English, Communication and Philosophy; Computer Science & Informatics; Data Innovation Research Institute (DIURI) |
Publisher: | Cambridge University Press (CUP) |
ISSN: | 1351-3249 |
Date of Acceptance: | 1 May 2020 |
Last Modified: | 11 Mar 2023 02:22 |
URI: | https://orca.cardiff.ac.uk/id/eprint/137035 |
Citation Data
Cited 1 time in Scopus.