Methods and applications of computational identification of formulaicity – an overview

Buerki, Andreas

2017. Methods and applications of computational identification of formulaicity – an overview. Presented at: Research Symposium on Methods and Applications of Computational (and other) Identification of Formulaicity, Cardiff University, Cardiff, UK, 27 February 2017.

Full text not available from this repository.

Abstract

In this presentation, I start by suggesting an understanding of formulaicity in language as consisting of 'linguistic patterns that manifest common ways of phrasing things in a speech community' and then discuss a range of computational approaches to identifying such patterns in corpus materials. These include 'broad' formulaicity extractions (such as average bigram and trigram attraction scores of complete texts, e.g. Zimmerer et al. 2016, or the identification of all recurrent word sequences as formulaic, e.g. Altenberg & Eeg-Olofsson 1990), 'mechanical' extractions (e.g. Biber's lexical bundles, which are defined only via frequency and length), extractions making use of syntactic information (as in Seretan 2011), extractions using various statistical measures of mutual attraction of words, and mixed approaches (e.g. Buerki 2012). In light of this, I argue that the main benefits of computational identification procedures are their ability to process large amounts of data and their consistency in applying chosen criteria, neither of which applies to identification methods that rely on the case-by-case decisions of researchers. Drawbacks include the observation that, in many cases, the output of computational procedures is found lacking in various respects when compared to judgements by linguists (as diverse as the latter might be). Finally, a number of applications of methods of computational identification of formulaicity from my own research are showcased. These include applications in the fields of historical linguistics, language and culture, authorship attribution and research into linguistic markers of the risk of Alzheimer's disease. These applications show both the potential of methods of computational identification of formulaicity and their indispensability in key areas of linguistic research.
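To make two of the approaches named in the abstract more concrete, the following is a minimal Python sketch, not the procedure used in any of the cited studies. It assumes pre-tokenised input and illustrates (a) a frequency-and-length extraction of recurrent word sequences, in the spirit of lexical bundles, and (b) a bigram attraction score, here pointwise mutual information (PMI), averaged over a text. The function names, the `min_freq` threshold and the choice of PMI as the attraction measure are all illustrative assumptions.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return every contiguous n-word sequence in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_sequences(tokens, n, min_freq=2):
    """Frequency-and-length extraction: keep n-grams seen at least min_freq times.

    min_freq is an illustrative threshold, not a value taken from the literature.
    """
    counts = Counter(ngrams(tokens, n))
    return {seq: freq for seq, freq in counts.items() if freq >= min_freq}


def bigram_pmi(tokens):
    """Score each distinct bigram by pointwise mutual information (log base 2).

    PMI is only one of many possible attraction measures; it is an assumption here.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    n_tokens = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), freq in bigram_counts.items():
        p_pair = freq / n_bigrams
        p_w1 = unigram_counts[w1] / n_tokens
        p_w2 = unigram_counts[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def mean_bigram_attraction(tokens):
    """Average PMI over the distinct bigrams of a text (a deliberate simplification)."""
    scores = bigram_pmi(tokens)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy usage on a twelve-word text.
tokens = "at the end of the day at the end of the week".split()
bundles = recurrent_sequences(tokens, n=4, min_freq=2)
text_score = mean_bigram_attraction(tokens)
```

On the toy text, `bundles` holds the two four-word sequences that occur twice, while `text_score` gives a single text-level figure. Real applications would work on large corpora and would typically add normalisation, dispersion criteria or other filters that this sketch leaves out.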

Item Type:	Conference or Workshop Item - published (Paper)
Date Type:	Completion
Status:	Unpublished
Schools:	Schools > English, Communication and Philosophy
Subjects:	P Language and Literature > P Philology. Linguistics
Last Modified:	21 Oct 2022 06:54
URI:	https://orca.cardiff.ac.uk/id/eprint/98716
