Neale, Steven, Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885, Needs, Jennifer, Watkins, Gareth, Morris, Steve, Fitzpatrick, Teresa, Marshall, L and Knight, Dawn ORCID: https://orcid.org/0000-0002-4745-6502 2017. The CorCenCC crowdsourcing app: a bespoke tool for the user-driven creation of the national corpus of contemporary Welsh. Presented at: The 9th International Corpus Linguistics Conference, Birmingham, UK, 24-28 July 2017.
Abstract
The CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes, or National Corpus of Contemporary Welsh in English; www.corcencc.org) aims to assemble a 10-million-word corpus of the Welsh language across a range of contemporary contexts from spoken, written and e-language sources. In keeping with its contemporary aspect, a key innovation of the project is to facilitate crowdsourced contributions to the corpus, giving Welsh speakers the opportunity to involve themselves directly in its creation. This is of vital importance in the Welsh context, where community pride is strong. An open linguistic resource that properly represents the constantly evolving landscape of contemporary Welsh speakers, and the way their language is used, is expected to have a wide-reaching impact on the way publishers, policy-makers, the education sector, academic researchers and many others work with Welsh in the future. This presentation introduces the CorCenCC Crowdsourcing App, a mobile application designed to facilitate direct contribution of spoken language data to the corpus. Spoken language data will comprise 400,000 words of the 10-million-word corpus (alongside 400,000 words of written data and 200,000 words of electronic language such as blogs and emails), and app users can contribute directly to this number by recording their Welsh-language narratives (Figures 1 and 2), attaching and editing appropriate metadata to describe the recorded conversations, and uploading them for inclusion in the final corpus. The metadata attached to the recorded conversations includes details about where the recording was made, who else was involved in the recording, and tags that future corpus tools will be able to use to search the data in the final corpus.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | English, Communication and Philosophy; Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Date of Acceptance: | 1 June 2017 |
Last Modified: | 28 Aug 2024 15:00 |
URI: | https://orca.cardiff.ac.uk/id/eprint/99261 |