Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Phraseological fingerprints: using habitual wordings to aid authorship attribution

Buerki, Andreas. 2015. Phraseological fingerprints: using habitual wordings to aid authorship attribution. Presented at: Corpus Linguistics in the South (CLS) 10: Corpus approaches to public and professional discourse, Cardiff University, Cardiff, UK, 28 November 2015.

Full text not available from this repository.


Research into formulaic language suggests that habitual ways of putting things mark speakers and writers out as belonging to particular speech (sub-)communities, such as academic vs. non-academic communities, L1 vs. L2 speaker communities, or speech communities at different points in time. It has further been shown that individuals' phraseological habits can be distinctive enough to mark them out individually: the expression "I entirely understand", for example, is or was characteristic of Tony Blair (Mollin 2009). Interest has recently grown in bringing this observation to bear on the task of authorship attribution for disputed texts in forensic settings (Johnson and Wright 2014; Larner 2014), but the limited number of texts typically available, and their often short length, have proven a significant hurdle. This study presents and evaluates a number of approaches to exploiting phraseological choices to aid authorship attribution, ranging from the identification of distinctive phraseological sequences through linguistically informed close reading to automatic, n-gram-based techniques derived from work on information retrieval. Based on a corpus of multiple short texts (each shorter than 280 words) by each of a small sample of individuals, it is shown how phraseological indicators, on their own and in conjunction with other authorship markers, can be used to identify authors successfully even when only a limited number of short texts is available.

References

Johnson, A., & Wright, D. (2014). Identifying idiolect in forensic authorship attribution: An n-gram textbite approach. Language and Law, 1(1), 37-69.
Larner, S. (2014). A preliminary investigation into the use of fixed formulaic sequences as a marker of authorship. International Journal of Speech, Language and the Law, 21(1).
Mollin, S. (2009). "I entirely understand" is a Blairism: The methodology of identifying idiolectal collocations. International Journal of Corpus Linguistics, 14(3), 367-392.
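The abstract mentions automatic, n-gram-based techniques derived from information retrieval. The full text is not available, so the exact procedures used in the talk are not recorded here. What follows is only a minimal sketch of one such technique, not the author's method: it assumes word bigrams as features and Jaccard similarity (shared n-grams divided by all distinct n-grams) as the overlap measure. The function names and the toy candidate texts are hypothetical.

```python
import re
from collections.abc import Iterable


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lower-cased word n-grams in a text (punctuation dropped)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; by convention here, two empty sets score 0.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attribute(disputed: str, known: dict[str, Iterable[str]], n: int = 2) -> list[tuple[str, float]]:
    """Rank candidate authors by n-gram overlap with a disputed text.

    `known` maps each (hypothetical) candidate to that person's known texts;
    the texts are pooled into one n-gram set per candidate before comparison.
    """
    query = word_ngrams(disputed, n)
    scores = []
    for author, texts in known.items():
        pooled = set().union(*(word_ngrams(t, n) for t in texts))
        scores.append((author, jaccard(query, pooled)))
    # Highest overlap first: the top entry is the suggested author.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy usage with invented texts for two hypothetical candidates.
known = {
    "author_A": ["I entirely understand the concern.", "I entirely understand why."],
    "author_B": ["To be honest, that is fine.", "To be honest, I disagree."],
}
ranking = attribute("I entirely understand your point.", known)
print(ranking[0][0])  # → author_A
```

Pooling each candidate's texts into one feature set is one way to compensate for individual texts being short; with texts under 280 words, longer n-grams would rarely recur, which is why bigrams are assumed here. Any real forensic use would need validation on known-author data.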

Item Type: Conference or Workshop Item (Paper)
Date Type: Completion
Status: Unpublished
Schools: English, Communication and Philosophy
Subjects: P Language and Literature > P Philology. Linguistics
Last Modified: 31 Oct 2022 09:49
