Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885, Owen, David ORCID: https://orcid.org/0000-0002-4028-0591, Knight, Dawn ORCID: https://orcid.org/0000-0002-4745-6502 and Artemiou, Andreas ORCID: https://orcid.org/0000-0002-7501-4090 2019. Unsupervised multi-word term recognition in Welsh. Presented at: Celtic Language Technology Workshop 2019, Dublin, Ireland, 19 August 2019. Published in: Lynn, Teresa, Prys, Delyth, Batchelor, Colin and Tyers, Francis eds. Proceedings of the Celtic Language Technology Workshop. European Association for Machine Translation,
PDF - Published Version (261kB)
Available under License Creative Commons Attribution.
Abstract
This paper investigates the adaptation to Welsh of an existing multi-word term recognition system originally developed for English. We give an overview of the modifications required, with a special focus on an important difference between the two languages, which represent the Germanic and Celtic language families respectively: the directionality of noun phrases. We successfully modelled this difference by means of lexico-syntactic patterns, which are parameters of the system and therefore required no re-implementation of the core algorithm. To compare the performance of the Welsh version against that of the English version, we assembled three parallel domain-specific corpora. The results were compared in terms of precision and recall. Comparable performance was achieved across the three domains in terms of the two measures (P = 68.9%, R = 55.7%), as well as in the ranking of automatically extracted terms, measured by the weighted kappa coefficient (κ = 0.7758). These early results indicate that our approach to term recognition can provide a basis for machine translation of multi-word terms.
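For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how precision and recall can be computed for a set of automatically extracted terms against a gold standard. It assumes exact string matching between terms, which may differ from the matching criteria used in the paper. The function name `precision_recall` and the toy term lists are hypothetical and are not taken from the paper's corpora.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted terms against a gold-standard set.

    Assumption: a term counts as correct only on an exact string match.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted terms that are in the gold standard.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard terms that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data for illustration only.
extracted_terms = {"blood pressure", "heart rate", "side effect", "last week"}
gold_terms = {"blood pressure", "heart rate", "side effect",
              "body mass index", "immune system"}

p, r = precision_recall(extracted_terms, gold_terms)
print(f"P = {p:.1%}, R = {r:.1%}")  # prints "P = 75.0%, R = 60.0%"
```

In this toy case three of the four extracted terms appear in the gold standard (precision 3/4) and three of the five gold-standard terms are recovered (recall 3/5).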
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Published Online |
Status: | Published |
Schools: | Mathematics; English, Communication and Philosophy; Computer Science & Informatics; Data Innovation Research Institute (DIURI) |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Publisher: | European Association for Machine Translation |
Date of First Compliant Deposit: | 8 October 2019 |
Last Modified: | 25 Nov 2022 11:23 |
URI: | https://orca.cardiff.ac.uk/id/eprint/125820 |