Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Multi-word sequences in motion

Buerki, Andreas. 2009. Multi-word sequences in motion. Presented at: Corpus Linguistics 2009, Liverpool, 22 July 2009.

Full text not available from this repository.


A growing number of researchers see word sequences that follow Sinclair's idiom principle (1991: 110) as far more central to language than was previously thought. While a considerable amount of work has now been done on describing, classifying and quantifying such multi-word sequences (MWS), the bewildering diversity of terminology in the field is one indication that MWS research is still in its early days. The project introduced in this paper analyses German data spanning the 20th century in search of changes in recurring MWS over that period. The results promise to shed light not only on the proportion of words in written German that are part of MWS, which has not yet been investigated broadly, but also, and in particular, on the rate and type of change these sequences are undergoing. If language change and change in sociocultural settings can be related at all, MWS, by virtue of their centrality to human language, will be able to provide meaningful and novel insights into sociocultural change in a speech community. The paper will present preliminary results from the first phase of the project, based on the Swiss Text Corpus, a 20-million-word corpus of the Swiss variety of written Standard German recently completed at the University of Basel. For the investigation, the corpus was divided into four sub-corpora representing different periods of the 20th century. MWS were extracted using the Ngram Statistics Package (Banerjee and Pedersen 2003) and subsequently analysed and compared. The work is among the very first investigations to use the Swiss Text Corpus, which is itself the only Swiss Standard German corpus not principally composed of newspaper texts but balanced across four genre types.

References:
Banerjee, S., & Pedersen, T. (2003). The design, implementation and use of the Ngram Statistics Package. In Proceedings of the 4th International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Swiss Text Corpus (Schweizer Text Korpus).
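The extraction-and-comparison step summarised in the abstract (split the corpus into period sub-corpora, extract recurring word sequences, compare them across periods, and estimate the share of words inside such sequences) can be illustrated with a minimal Python sketch. This is not the Ngram Statistics Package used in the paper, and it is not the author's procedure. It assumes whitespace tokenisation, a fixed sequence length n, and a hypothetical threshold (min_freq=2) for what counts as "recurring"; all function names and period labels are illustrative only.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield contiguous n-grams (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def recurring_ngrams(tokens, n, min_freq=2):
    """Count n-grams and keep those occurring at least min_freq times.
    The frequency threshold is an assumption, not taken from the paper."""
    counts = Counter(ngrams(tokens, n))
    return {g: c for g, c in counts.items() if c >= min_freq}

def mws_coverage(tokens, n, min_freq=2):
    """Proportion of tokens lying inside at least one recurring n-gram.
    Overlapping n-grams are handled by marking each token at most once."""
    recurring = recurring_ngrams(tokens, n, min_freq)
    covered = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in recurring:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens) if tokens else 0.0

def compare_periods(subcorpora, n, min_freq=2):
    """For each pair of consecutive periods (sorted by label), report
    recurring n-grams that appear ("new") and disappear ("lost")."""
    periods = sorted(subcorpora)
    sets = {p: set(recurring_ngrams(subcorpora[p], n, min_freq)) for p in periods}
    changes = {}
    for earlier, later in zip(periods, periods[1:]):
        changes[(earlier, later)] = {
            "new": sets[later] - sets[earlier],
            "lost": sets[earlier] - sets[later],
        }
    return changes

if __name__ == "__main__":
    # Toy token lists, not data from the Swiss Text Corpus.
    toy = "a b c a b d".split()
    print(mws_coverage(toy, 2))  # 4 of the 6 tokens lie inside the recurring bigram ("a", "b")
    print(compare_periods({"period_1": "a b a b".split(),
                           "period_2": "c d c d".split()}, 2))
```

A raw frequency cut-off is the simplest possible criterion for "recurring"; the Ngram Statistics Package can additionally rank n-grams by statistical association measures, which this sketch omits.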

Item Type: Conference or Workshop Item (Paper)
Date Type: Completion
Status: Unpublished
Schools: English, Communication and Philosophy
Subjects: P Language and Literature > P Philology. Linguistics
Last Modified: 28 Oct 2022 10:21
