Buerki, Andreas ORCID: https://orcid.org/0000-0003-2151-3246 2009. Multi-word sequences in motion. Presented at: Corpus Linguistics 2009, Liverpool, 22 July 2009.
Abstract
A growing number of researchers see word sequences following Sinclair’s idiom principle (1991:110) as far more central to language than was previously thought. While a considerable amount of work has now been done on describing, classifying and quantifying such multi-word sequences, the bewildering diversity of terminology in the field is one indicator that multi-word sequence research is still in its early days. The project introduced in the present paper analyses German data spanning the 20th century in search of changes in recurring multi-word sequences over the period. The results promise to shed light not only on the proportion of words that are part of multi-word sequences (MWS) in written German, which has not so far been investigated broadly, but particularly on the rate and type of change they are undergoing. If language change and change in sociocultural settings can be related at all, multi-word sequences, by virtue of their centrality to human language, will be able to provide meaningful and novel insights into sociocultural change in a speech community. The paper will present preliminary results of a first phase of the project, based on the Swiss Text Corpus, a 20-million-word corpus of the Swiss variety of written Standard German recently completed at the University of Basel. For the investigation, the corpus was divided into four sub-corpora representing different periods of the 20th century. MWS were extracted using the Ngram Statistics Package (Banerjee and Pedersen 2003) and subsequently analysed and compared. The work is among the very first investigations using the Swiss Text Corpus, which itself represents the only Swiss Standard German corpus not principally composed of newspaper texts, but balanced across four genre types.
References:
Banerjee, S., & Pedersen, T. (2003). The design, implementation and use of the Ngram Statistics Package. In Proceedings of the 4th International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Swiss Text Corpus (Schweizer Text Korpus): www.dwds.ch
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Completion |
Status: | Unpublished |
Schools: | English, Communication and Philosophy |
Subjects: | P Language and Literature > P Philology. Linguistics |
Last Modified: | 28 Oct 2022 10:21 |
URI: | https://orca.cardiff.ac.uk/id/eprint/77970 |