Collocation
From Wikipedia, the free encyclopedia
Within the area of corpus linguistics, collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance.
Collocation comprises the restrictions on how words can be used together, for example which prepositions are used with particular verbs, or which verbs and nouns are used together. Collocations are examples of lexical units. Collocations should not be confused with idioms.
Collocation extraction is a task in computational linguistics in which collocations are extracted automatically from a corpus using a computer.
Common features
- Non-substitutability: We cannot substitute a word in a collocation with a related word. For example, we cannot say 'yellow wine' instead of 'white wine', although both yellow and white are the names of colours.
- Non-modifiability: We cannot modify a collocation or apply syntactic transformations to it.
Expanded definition
If the expression is heard often, transmitting itself memetically, the words become 'glued' together in our minds. 'Crystal clear', 'middle management', 'nuclear family', and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots' or 'motor cyclist'.
Collocations can be in a syntactic relation (such as verb-object: 'make' and 'decision'), lexical relation (such as antonymy), or they can be in no linguistically defined relation. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as 'awkward' if collocational preferences are violated. This makes collocation an interesting area for language teaching.
Corpus linguists specify a key word in context (KWIC) and identify the words immediately surrounding it. This gives an idea of the way words are used.
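A KWIC listing can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `kwic`, the `window` parameter, the regex tokenizer, and the sample text are all hypothetical choices, not part of any standard concordancing tool.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, node, right context) for each occurrence of keyword."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Made-up sample text, used only to show the output format.
sample = ("She had to make a decision quickly. "
          "The committee will make a final decision tomorrow.")

for left, node, right in kwic(sample, "decision", window=2):
    print(f"{left:>15} [{node}] {right}")
```

Because the tokenizer discards punctuation, the context window here can cross sentence boundaries; a real concordancer would usually let the user choose whether that is acceptable.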
The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether the co-occurrence is purely by chance or statistically significant. Due to the non-random nature of language, most collocations are classed as significant, and the association scores are simply used to rank the results. Commonly used measures of association include mutual information, t scores, and log-likelihood.
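The three measures named above can be computed from simple frequency counts. The sketch below assumes the commonly used textbook formulations: pointwise mutual information as the base-2 log of observed over expected frequency, the t-score approximation (observed minus expected, divided by the square root of observed), and the log-likelihood ratio G² over a 2×2 contingency table. The function name, argument names, and all counts are hypothetical and chosen only for illustration.

```python
import math

def association_scores(c12, c1, c2, n):
    """Score a word pair from counts.

    c12: co-occurrence count of the pair (must be > 0)
    c1, c2: individual counts of the two words
    n: total number of observations (e.g. bigrams) in the corpus
    """
    expected = c1 * c2 / n  # expected co-occurrences under independence
    pmi = math.log2(c12 / expected)
    t_score = (c12 - expected) / math.sqrt(c12)

    # 2x2 contingency table of observed frequencies.
    observed = [[c12, c1 - c12],
                [c2 - c12, n - c1 - c2 + c12]]
    row_totals = [c1, n - c1]
    col_totals = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / e)
    g2 *= 2
    return pmi, t_score, g2

# Hypothetical counts, invented for this example: pair -> (c12, c1, c2).
counts = {
    ("crystal", "clear"): (20, 100, 50),
    ("very", "clear"): (5, 2000, 50),
}
N = 10_000

# Rank candidate pairs by log-likelihood, highest first.
ranked = sorted(counts, key=lambda p: association_scores(*counts[p], N)[2], reverse=True)
for pair in ranked:
    pmi, t, g2 = association_scores(*counts[pair], N)
    print(pair, f"PMI={pmi:.2f} t={t:.2f} G2={g2:.2f}")
```

As the text notes, the scores are used here for ranking candidates rather than as a strict significance test; the measures can disagree on the ordering, which is one reason several are often reported together.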
Rather than select a single definition, Gledhill[1] proposes that collocation involves at least three different perspectives: (i) co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates[2][3][4]; (ii) construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern[5], or as a relation between a base and its collocative partners[6]; and (iii) expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form[7][8]. These perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally, collocation is explained in terms of all three perspectives at once, in a continuum:
- ‘Free Combination’ ↔ ‘Bound Collocation’ ↔ ‘Frozen Idiom’
See also
- Cliché
- Collocational restriction
- Collostructional analysis
- Compound noun, adjective and verb
- Phrasal verb
- Phraseology
- Siamese twins (English language)
References
- [1] Gledhill C. (2000): Collocations in Science Writing. Tübingen: Narr.
- [2] Firth J. R. (1957): Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
- [3] Sinclair J. (1996): “The Search for Units of Meaning”, in Textus, IX, 75–106.
- [4] Smadja F. A. & McKeown K. R. (1990): “Automatically extracting and representing collocations for language generation”, Proceedings of ACL’90, 252–259, Pittsburgh, Pennsylvania.
- [5] Hunston S. & Francis G. (2000): Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam: John Benjamins.
- [6] Hausmann F. J. (1989): Le dictionnaire de collocations. In Hausmann F. J., Reichmann O., Wiegand H. E., Zgusta L. (eds), Wörterbücher: ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New York: De Gruyter. 1010–1019.
- [7] Moon R. (1998): Fixed Expressions and Idioms: A Corpus-Based Approach. Oxford: Oxford University Press.
- [8] Frath P. & Gledhill C. (2005): “Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units”, in Recherches anglaises et Nord-américaines, vol. 38: 25–43.