# Level of measurement

The term "levels of measurement" typically refers to the theory of scale types developed by the psychologist Stanley Smith Stevens. Stevens proposed his theory in a 1946 article titled "On the theory of scales of measurement"[1], in which he claimed that all measurement in science was conducted using four different types of numerical scales, which he called "nominal", "ordinal", "interval" and "ratio".

## The theory of scale types

Stevens (1946, 1951) proposed that measurements can be classified into four different types of scales. These were:

• nominal
• ordinal
• interval
• ratio

| Scale type | Permissible statistics | Admissible scale transformation | Mathematical structure |
|---|---|---|---|
| nominal (also denoted as categorical or discrete) | mode, chi-square | One-to-one (equality, =) | standard set structure (unordered) |
| ordinal | median, percentile | Monotonic increasing (order, <) | totally ordered set |
| interval | mean, standard deviation, correlation, regression, analysis of variance | Positive linear (affine) | affine line |
| ratio | All statistics permitted for interval scales plus the following: geometric mean, harmonic mean, coefficient of variation, logarithms | Positive similarities (multiplication) | field |
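
Stevens treated permissible statistics as cumulative: each scale admits everything allowed on the scales below it, plus its own additions. The following is a minimal Python sketch of that hierarchy. The `Level` enum, `FIRST_PERMITTED` table and `is_permissible` helper are hypothetical illustrations, not the interface of any statistics package.

```python
from __future__ import annotations

from enum import IntEnum


class Level(IntEnum):
    # Ordered so that a higher level inherits every statistic of the lower ones
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Statistics first permitted at each level, following Stevens' table
FIRST_PERMITTED = {
    Level.NOMINAL: {"mode", "chi-square"},
    Level.ORDINAL: {"median", "percentile"},
    Level.INTERVAL: {"mean", "standard deviation", "correlation",
                     "regression", "analysis of variance"},
    Level.RATIO: {"geometric mean", "harmonic mean",
                  "coefficient of variation", "logarithms"},
}


def permissible(level: Level) -> set[str]:
    """Every statistic permitted at `level`, including those of lower levels."""
    return set().union(*(FIRST_PERMITTED[lvl] for lvl in Level if lvl <= level))


def is_permissible(statistic: str, level: Level) -> bool:
    return statistic in permissible(level)


print(is_permissible("median", Level.INTERVAL))  # True
print(is_permissible("mean", Level.ORDINAL))     # False
```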

### Nominal scale

Nominal scales are mere codes assigned to objects as labels; they are not measurements. For example, rocks can be broadly categorized as (1) igneous, (2) sedimentary and (3) metamorphic. A code of "3" given to a particular stone does not mean that the stone possesses more "rockness" than a stone coded "1", any more than a person with red hair possesses more "hairness" than a person with blonde hair.

Stevens (1946, p. 679) must have anticipated that claiming nominal scales measure obviously non-quantitative things would attract criticism, so he invoked his theory of measurement to justify nominal scales as measurement:

> ...the use of numerals as names for classes is an example of the assignment of numerals according to rule. The rule is: Do not assign the same numeral to different classes or different numerals to the same class. Beyond that, anything goes with the nominal scale.

The mode is the only measure of central tendency that remains invariant under one-to-one transformations; the median and mean cannot be defined.
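
The invariance claim can be sketched using the rock codes above and an arbitrary one-to-one relabelling. Both the observations and the mapping are hypothetical. The mode identifies the same category before and after recoding, whereas the mean of the codes is not even a valid code.

```python
from statistics import mean, mode

# Hypothetical observations: 1 = igneous, 2 = sedimentary, 3 = metamorphic
codes = [1, 3, 3, 2, 3, 1]

# Any one-to-one relabelling is admissible on a nominal scale (hypothetical mapping)
relabel = {1: 30, 2: 10, 3: 20}
recoded = [relabel[c] for c in codes]

# The mode picks out the same category before and after relabelling
mode_invariant = relabel[mode(codes)] == mode(recoded)

# The mean of the codes (about 2.17) is not a valid code, so it names no category
mean_is_a_category = mean(codes) in relabel

print(mode_invariant)       # True
print(mean_is_a_category)   # False
```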

### Ordinal scale

In this scale type, the numbers assigned to objects or events represent the rank order (1st, 2nd, 3rd, etc.) of the entities assessed. An example of ordinal measurement is the result of a horse race, which says only which horses arrived first, second, third, etc., but includes no information about times. Another is the Mohs scale of mineral hardness, which characterizes the hardness of various minerals through the ability of a harder material to scratch a softer one, saying nothing about the actual hardness of any of them. Stevens' writings betrayed a critical view of psychometrics, as he argued:

> As a matter of fact, most of the scales used widely and effectively by psychologists are ordinal scales. In the strictest propriety the ordinary statistics involving means and standard deviations ought not to be used with these scales, for these statistics imply a knowledge of something more than the relative rank order of data (1946, p.679).

Psychometricians often theorise that psychometric tests produce interval-scale measures of cognitive abilities (e.g. Lord & Novick, 1968; von Eye, 2005), but there is little prima facie evidence to suggest that such attributes are anything more than ordinal (Cliff & Keats, 2000; Michell, 2008).

The central tendency of an ordinal attribute can be represented by its mode or its median, but the mean cannot be defined.
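
This sketch shows why the median, but not the mean, suits ordinal data. The scores and the `compare` helper are hypothetical. Cubing the scores is a strictly increasing, and therefore admissible, transformation. It leaves the ordering of the two group medians intact but reverses the ordering of the group means.

```python
from statistics import mean, median


def cube(x):
    # Strictly increasing, hence an admissible transformation of an ordinal scale
    return x ** 3


group_a = [3, 3, 3]  # hypothetical ordinal scores
group_b = [1, 1, 6]


def compare(stat, transform=lambda x: x):
    """Return True if group_a ranks above group_b on the given statistic."""
    a = stat([transform(x) for x in group_a])
    b = stat([transform(x) for x in group_b])
    return a > b


print(compare(median), compare(median, cube))  # True True   (order preserved)
print(compare(mean), compare(mean, cube))      # True False  (order reversed)
```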

### Interval scale

Quantitative attributes can all be measured on interval scales, because any difference between levels of an attribute can be multiplied by some real number so that it equals or exceeds another difference. A highly familiar example of interval-scale measurement is temperature on the Celsius scale. On this scale, the unit of measurement is 1/100 of the difference between the melting temperature and the boiling temperature of water at atmospheric pressure. The "zero point" on an interval scale is arbitrary, and negative values can be used. The formal mathematical term is an affine space (in this case an affine line). Variables measured at the interval level are called "interval variables" or sometimes "scaled variables", as they have units of measurement.

Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly. But ratios of differences can be expressed; for example, one difference can be twice another.
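
This sketch shows why ratios of values are not meaningful on an interval scale while ratios of differences are. Converting Celsius to Fahrenheit is a positive affine transformation, F = 1.8C + 32. It changes the ratio of two readings but preserves the ratio of two differences. The readings and the helper name `c_to_f` are illustrative.

```python
def c_to_f(celsius):
    # Positive affine transformation: F = 1.8 * C + 32
    return 1.8 * celsius + 32


t1, t2, t3 = 10.0, 20.0, 40.0  # hypothetical Celsius readings

# Ratio of values: 2.0 in Celsius but about 1.36 in Fahrenheit,
# so whether 20 degrees is "twice" 10 degrees depends on the arbitrary origin
ratio_celsius = t2 / t1
ratio_fahrenheit = c_to_f(t2) / c_to_f(t1)

# Ratio of differences: (40 - 20) / (20 - 10) is 2.0 on both scales
diff_ratio_celsius = (t3 - t2) / (t2 - t1)
diff_ratio_fahrenheit = (c_to_f(t3) - c_to_f(t2)) / (c_to_f(t2) - c_to_f(t1))

print(ratio_celsius, round(ratio_fahrenheit, 2))
print(diff_ratio_celsius, round(diff_ratio_fahrenheit, 10))
```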

The central tendency of a variable measured at the interval level can be represented by its mode, its median, or its arithmetic mean. Statistical dispersion can be measured in most of the usual ways that involve only differences or averaging, such as range, interquartile range, and standard deviation. Since values themselves cannot be meaningfully divided, one cannot define measures that require a ratio of values. More subtly, while one can define moments about the origin, only central moments are useful, since the choice of origin is arbitrary and not meaningful. One can define standardized moments, since ratios of differences are meaningful, but one cannot define the coefficient of variation, since the mean is a moment about the origin, unlike the standard deviation, which is (the square root of) a central moment.
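
The sketch below applies the Celsius-to-Fahrenheit transformation to a small sample. The readings are hypothetical and the helper names are illustrative. The coefficient of variation divides by the mean, so it changes with the arbitrary choice of origin. Standardized scores are ratios of differences, so they do not change.

```python
from statistics import mean, pstdev


def c_to_f(celsius):
    # Positive affine change of scale: F = 1.8 * C + 32
    return 1.8 * celsius + 32


celsius = [5.0, 10.0, 15.0, 30.0]  # hypothetical readings
fahrenheit = [c_to_f(c) for c in celsius]


def coefficient_of_variation(xs):
    # Standard deviation divided by the mean, a moment about the origin
    return pstdev(xs) / mean(xs)


def z_scores(xs):
    # Deviations from the mean in standard-deviation units (ratios of differences)
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


cv_celsius = coefficient_of_variation(celsius)        # roughly 0.62
cv_fahrenheit = coefficient_of_variation(fahrenheit)  # roughly 0.29
z_celsius = z_scores(celsius)
z_fahrenheit = z_scores(fahrenheit)                   # same values as z_celsius
```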

### Ratio measurement

Most measurement in the physical sciences and engineering is done on ratio scales. Mass, length, time, plane angle, energy and electric charge are examples of physical measures that are ratio scales. The scale type takes its name from the fact that measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind (Michell, 1997, 1999). Informally, the distinguishing feature of a ratio scale is the possession of a non-arbitrary zero value. For example, the Kelvin temperature scale has a non-arbitrary zero point, absolute zero, which is denoted 0 K and is equal to -273.15 degrees Celsius. This zero point is non-arbitrary because it is the lowest possible thermodynamic temperature, at which the particles that make up matter have minimal kinetic energy.

Examples of ratio scale measurement in the behavioural sciences are all but non-existent. Luce (2000) argues that an example of ratio scale measurement in psychology can be found in rank and sign dependent expected utility theory.

All statistical measures can be used for a variable measured at the ratio level, as all the necessary mathematical operations are defined. In addition to the mode, median and arithmetic mean, the central tendency of a ratio variable can be represented by its geometric mean or harmonic mean. Beyond the measures of statistical dispersion defined for interval variables, such as range and standard deviation, measures that require a ratio of values, such as the coefficient of variation, can also be defined for ratio variables.
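
For comparison with the interval case, the following sketch converts hypothetical masses from kilograms to pounds. This is a similarity transformation of the form x → kx. Ratios of values and the coefficient of variation are unchanged, and the geometric and harmonic means rescale by the same factor k. The data and helper name are illustrative. The conversion factor uses the definition of the international pound as exactly 0.45359237 kg.

```python
import math
from statistics import geometric_mean, harmonic_mean, mean, pstdev

LB_PER_KG = 1 / 0.45359237  # the international pound is defined as 0.45359237 kg

masses_kg = [2.0, 4.0, 8.0]  # hypothetical ratio-scale data
masses_lb = [LB_PER_KG * m for m in masses_kg]


def coefficient_of_variation(xs):
    return pstdev(xs) / mean(xs)


# A change of unit (x -> k * x) preserves ratios of values ...
same_ratio = math.isclose(masses_kg[2] / masses_kg[0], masses_lb[2] / masses_lb[0])
# ... and the coefficient of variation ...
same_cv = math.isclose(coefficient_of_variation(masses_kg),
                       coefficient_of_variation(masses_lb))
# ... while the geometric and harmonic means simply rescale by the same factor
gm_rescales = math.isclose(geometric_mean(masses_lb), LB_PER_KG * geometric_mean(masses_kg))
hm_rescales = math.isclose(harmonic_mean(masses_lb), LB_PER_KG * harmonic_mean(masses_kg))

print(same_ratio, same_cv, gm_rescales, hm_rescales)
```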

## Debate on classification scheme

There has been, and continues to be, debate about the merits of the classifications, particularly in the cases of the nominal and ordinal classifications (Michell, 1986). Thus, while Stevens' classification is widely adopted, it is by no means universally accepted (for example, Velleman & Wilkinson, 1993). [1]

Duncan (1986) observed that Stevens' classification of nominal measurement is contrary to his own definition of measurement. Stevens (1975) said of his own definition of measurement that "the assignment can be any consistent rule. The only rule not allowed would be random assignment, for randomness amounts in effect to a nonrule". However, so-called nominal measurement involves arbitrary assignment, and the "permissible transformation" is any number for any other. This is one of the points made in Lord's (1953) satirical paper "On the Statistical Treatment of Football Numbers".

Among those who accept the classification scheme, there is also some controversy in the behavioural sciences over whether the mean is meaningful for ordinal measurement. In terms of measurement theory, it is not, because the arithmetic operations are not performed on numbers that are measurements in units, and so the results of the computations do not give numbers in units. However, many behavioural scientists use means for ordinal data anyway. This is often justified on the basis that ordinal scales in behavioural science are really somewhere between true ordinal and interval scales; although the interval difference between two ordinal ranks is not constant, it is often of the same order of magnitude. For example, applications of measurement models in educational contexts often indicate that total scores have a fairly linear relationship with measurements across a range of an assessment. Thus, some argue that so long as the unknown interval difference between ordinal scale ranks is not too variable, interval-scale statistics such as means can meaningfully be used on ordinal variables. Statistical analysis software such as PSPP lets the user record the appropriate measurement level for each variable, which can help prevent users from inadvertently performing meaningless analyses (for example, correlation analysis with a variable at the nominal level).

L. L. Thurstone made progress toward a justification for obtaining interval-level measurements based on the law of comparative judgment. Further progress was made by Georg Rasch (1960), who developed the probabilistic Rasch model, which provides a theoretical basis and justification for obtaining interval-level measurements from counts of observations such as total scores on assessments.
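
The dichotomous Rasch model can be sketched in a few lines. In the model, the probability of a correct response depends only on the difference between a person's ability and an item's difficulty, both expressed in logits. As a consequence, the difference in log-odds between two persons is the same on every item. This invariance is part of why the model is taken to support interval-level measurement. The parameter values and helper names below are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability that a person of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical person abilities and item difficulties (in logits)
abilities = {"person_n": 1.5, "person_m": 0.25}
difficulties = [-1.0, 0.0, 2.0]

# For every item, the difference in log-odds between the two persons
# equals the difference in their abilities, whatever the item's difficulty
differences = [
    logit(rasch_probability(abilities["person_n"], b))
    - logit(rasch_probability(abilities["person_m"], b))
    for b in difficulties
]
print([round(d, 6) for d in differences])  # [1.25, 1.25, 1.25]
```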

Another issue derives from Nicholas R. Chrisman's article "Rethinking Levels of Measurement for Cartography"[2], in which he introduces an expanded list of levels of measurement to account for measurements that do not fit the traditional notion of levels of measurement. Measurements that are bounded to a range and repeat (such as degrees on a circle or time of day), graded membership categories, and other types of measurement do not fit Stevens' original scheme. This led Chrisman to introduce six new levels, giving ten in all: (1) Nominal, (2) Graded membership, (3) Ordinal, (4) Interval, (5) Log-Interval, (6) Extensive Ratio, (7) Cyclical Ratio, (8) Derived Ratio, (9) Counts and (10) Absolute. The extended levels of measurement are rarely used outside academic geography.
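
Cyclical data show concretely why such measurements do not fit Stevens' scheme. In the sketch below, the bearings are hypothetical and the helper name is illustrative. The arithmetic mean of two bearings on either side of north points due south. The circular mean points north. Here it is computed as the direction of the mean of the corresponding unit vectors, which is one common approach and not necessarily Chrisman's own treatment.

```python
import math


def circular_mean_degrees(angles):
    """Mean direction of angles in degrees, via the sum of unit vectors.
    The result lies in the range [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


bearings = [350.0, 10.0]  # hypothetical compass bearings either side of north

arithmetic = sum(bearings) / len(bearings)  # 180.0: points due south
circular = circular_mean_degrees(bearings)  # approximately 0.0: points north

print(arithmetic, round(circular, 6))
```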

## Scale types and Stevens' "operational theory of measurement"

The theory of scale types is the intellectual handmaiden to Stevens' "operational theory of measurement," which was to become definitive within psychology and the behavioral sciences, despite being quite at odds with the understanding of measurement held in the natural sciences (Michell, 1999). Essentially, the operational theory of measurement was a reaction to the conclusions of a committee established in 1932 by the British Association for the Advancement of Science to investigate the possibility of genuine scientific measurement in the psychological and behavioral sciences. This committee, which became known as the Ferguson committee, published a Final Report (Ferguson et al., 1940, p. 245) in which Stevens' sone scale (Stevens & Davis, 1938) was an object of criticism:

> ...any law purporting to express a quantitative relation between sensation intensity and stimulus intensity is not merely false but is in fact meaningless unless and until a meaning can be given to the concept of addition as applied to sensation.

That is, if Stevens' sone scale was genuinely measuring the intensity of auditory sensations, then evidence that such sensations are quantitative attributes had to be produced. The evidence needed was the presence of additive structure, a concept treated comprehensively by the German mathematician Otto Hölder (Hölder, 1901). Given that the physicist and measurement theorist Norman Robert Campbell dominated the Ferguson committee's deliberations, the committee concluded that real measurement in the social sciences was impossible due to the lack of concatenation operations. This conclusion was later shown to be false by the discovery of the theory of conjoint measurement by Debreu (1960) and, independently, by Luce & Tukey (1964). However, Stevens' reaction was not to conduct experiments to test for the presence of additive structure in sensations, but instead to render the conclusions of the Ferguson committee null and void by proposing a new theory of measurement:

> Paraphrasing N.R. Campbell (Final Report, p.340), we may say that measurement is, in the broadest sense, defined as the assignment of numerals to objects and events according to rule (Stevens, 1946, p. 677).
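
The additive structure that the Ferguson committee demanded is empirically testable, and conjoint measurement exploits this without any concatenation operation. One of its testable conditions is independence (single cancellation). It can be checked on a table of observed values indexed by the levels of two factors, as in this minimal sketch. The function name and the two tables are hypothetical. Independence is only a necessary condition: a full conjoint analysis also tests further axioms such as double cancellation.

```python
from itertools import combinations


def sign(u, v):
    # -1, 0 or 1 according to whether u is below, equal to or above v
    return (u > v) - (u < v)


def satisfies_independence(table):
    """Single cancellation: rows must be ordered identically in every column,
    and columns identically in every row. Necessary, but not sufficient,
    for an additive conjoint representation."""
    n_rows, n_cols = len(table), len(table[0])
    for a, b in combinations(range(n_rows), 2):
        if len({sign(table[a][j], table[b][j]) for j in range(n_cols)}) > 1:
            return False
    for x, y in combinations(range(n_cols), 2):
        if len({sign(table[i][x], table[i][y]) for i in range(n_rows)}) > 1:
            return False
    return True


# Hypothetical data generated additively from row and column effects
row_effects, col_effects = [0, 1, 3], [0, 2, 5]
additive = [[r + c for c in col_effects] for r in row_effects]

# Hypothetical data with a crossover interaction: the row ordering reverses
crossover = [[1, 4], [2, 3]]

print(satisfies_independence(additive))   # True
print(satisfies_independence(crossover))  # False
```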

Stevens was greatly influenced by the ideas of another Harvard academic, the Nobel laureate physicist Percy Bridgman (1927), whose doctrine of operationism Stevens used to define measurement. In Stevens' definition, for example, it is the use of a tape measure that defines length (the object of measurement) as being measurable (and so, by implication, quantitative). However, this is the critical logical flaw within operationism: the mistake of confusing what is known (length) with how it is known (the use of a tape measure). In more formal terms, operationism confuses the relations between two objects or events with properties of one of those objects or events (Hardcastle, 1995; Michell, 1999; Moyer, 1981a,b; Rogers, 1989). Despite this fatal logical flaw and the resulting discrediting of Stevens' theory of measurement, the theory has remained entrenched in the behavioural sciences to such an extent that Stevens (1975) began to complain that his idea was not being sufficiently credited to him.

The Canadian measurement theorist William Rozeboom (1966) was an early and trenchant critic of Stevens' theory of scale types. But it was not until much later, with the work of the mathematical psychologists Theodore Alper (1985, 1987), Louis Narens (1981a, b) and R. Duncan Luce (1986, 1987, 2001), that the concept of scale types received the mathematical rigour it had lacked at its inception. As Luce (1997, p. 395) bluntly stated:

> S.S. Stevens (1946, 1951, 1975) claimed that what counted was having an interval or ratio scale. Subsequent research has given meaning to this assertion, but given his attempts to invoke scale type ideas it is doubtful if he understood it himself...no measurement theorist I know accepts Stevens' broad definition of measurement...in our view, the only sensible meaning for 'rule' is empirically testable laws about the attribute.

## References

1. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677-680. PMID 17750512.
2. Chrisman, N. R. (1998). Rethinking levels of measurement for cartography. Cartography and Geographic Information Science, 25(4), 231-242.
• Alper, T. M. (1985). A note on real measurement structures of scale type (m, m + 1). Journal of Mathematical Psychology, 29, 73-81.
• Alper, T.M. (1987). A classification of all order-preserving homeomorphism groups of the reals that satisfy finite uniqueness. Journal of Mathematical Psychology, 31, 135-154.
• Babbie, E. (2004). The Practice of Social Research, 10th edition. Wadsworth, Thomson Learning Inc. ISBN 0-534-62029-9
• Briand, L., El Emam, K., & Morasca, S. (1995). On the Application of Measurement Theory in Software Engineering. Empirical Software Engineering, 1, 61-88. [On line] http://www2.umassd.edu/swpi/ISERN/isern-95-04.pdf
• Lord, F.M. (1953). On the Statistical Treatment of Football Numbers. Reprint in Readings in Statistics, Ch. 3, (Haber, A., Runyon, R.P., and Badia, P.) Reading, Mass: Addison-Wesley, 1970.
• Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
• Luce, R.D. (1986). Uniqueness and homogeneity of ordered relational structures. Journal of Mathematical Psychology, 30, 391-415.
• Luce, R.D. (1987). Measurement structures with Archimedean ordered translation groups. Order, 4, 165-189.
• Luce, R.D. (1997). Quantification and symmetry: commentary on Michell 'Quantitative science and the definition of measurement in psychology'. British Journal of Psychology, 88, 395-398.
• Luce, R.D. (2000). Utility of uncertain gains and losses: measurement theoretic and experimental approaches. Mahwah, N.J.: Lawrence Erlbaum.
• Luce, R.D. (2001). Conditions equivalent to unit representations of ordered relational structures. Journal of Mathematical Psychology, 45, 81-98.
• Luce, R.D. & Tukey, J.W. (1964). Simultaneous conjoint measurement: a new scale type of fundamental measurement. Journal of Mathematical Psychology, 1, 1-27.
• Michell, J. (1986). Measurement scales and statistics: a clash of paradigms. Psychological Bulletin, 100(3), 398-407.
• Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.
• Michell, J. (1999). Measurement in Psychology - A critical history of a methodological concept. Cambridge: Cambridge University Press.
• Michell, J. (2008). Is psychometrics pathological science? Measurement - Interdisciplinary Research & Perspectives, 6, 7-24.
• Narens, L. (1981a). A general theory of ratio scalability with remarks about the measurement-theoretic concept of meaningfulness. Theory and Decision, 13, 1-70.
• Narens, L. (1981b). On the scales of measurement. Journal of Mathematical Psychology, 24, 249-275.
• Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
• Rozeboom, W.W. (1966). Scaling theory and the nature of measurement. Synthese, 16, 170-233.
• Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
• Stevens, S.S. (1951). Mathematics, measurement and psychophysics. In S.S. Stevens (Ed.), Handbook of experimental psychology (pp. 1-49). New York: Wiley.
• Stevens, S.S. (1975). Psychophysics. New York: Wiley.
• Velleman, P. F. & Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician, 47(1), 65-72. [On line] http://www.spss.com/research/wilkinson/Publications/Stevens.pdf
• von Eye, A. (2005). Review of Cliff and Keats, Ordinal measurement in the behavioral sciences. Applied Psychological Measurement, 29, 401-403.