From Wikipedia, the free encyclopedia

Jump to: navigation, search

In computer science, the compare-and-swap CPU instruction ("CAS") (or the Compare & Exchange - CMPXCHG instruction in the x86 and Itanium architectures) is a special instruction that atomically compares the contents of a memory location to a given value and, if they are the same, modifies the contents of that memory location to a given new value. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response (this variant is often called compare-and-set), or by returning the value read from the memory location (not the value written to it).

CAS is used to implement synchronization primitives like semaphores and mutexes, as well as more sophisticated lock-free and wait-free algorithms. Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add, and that, assuming a fairly large[clarification needed] amount of memory, it can implement all of them [1].

Algorithms built around CAS typically read some key memory location and remember the old value. Based on that old value, they compute some new value. Then they try to swap in the new value using CAS, where the comparison checks for the location still being equal to the old value. If CAS indicates that the attempt has failed, it has to be repeated from the beginning: the location is re-read, a new value is computed and the CAS is tried again.

Some of these algorithms are affected by and must handle the problem of a false positive match, or the ABA problem. It's possible that between the time the old value is read and the time CAS is attempted, some other processors or threads change the memory location two or more times such that it acquires a bit pattern which matches the old value. The problem arises if this new bit pattern, which looks exactly like the old value, has a different meaning: for instance, it could be a recycled address, or a wrapped version counter.

A general solution to this is to use a double-length CAS (e.g. on a 32 bit system, a 64 bit CAS). The second half is used to hold a counter. The compare part of the operation compares the previously read value of the pointer *and* the counter, to the current pointer and counter. If they match, the swap occurs - the new value is written - but the new value has an incremented counter. This means that if ABA has occurred, although the pointer value will be the same, the counter is exceedingly unlikely to be the same (for a 32 bit value, exactly 2^32 operations would have had to occurred, causing the counter to wrap and at that moment, the pointer value would have to also by chance be the same).

On 64 bit systems with only single-length CAS, a 32 bit pointer and 32 bit counter can often be used, with an offset being added to the pointer when actual memory lookup occurs.

CAS, and other atomic instructions, are sometime thought to be unnecessary in uniprocessor systems, because the atomicity of any sequence of instructions can be achieved by disabling interrupts while executing it. However, disabling interrupts has numerous downsides. For example, code that is allowed to do so must be trusted not to be malicious and monopolize the CPU, as well as to be correct and not accidentally hang the machine in an infinite loop. Further, disabling interrupts is often deemed too expensive to be practical. Thus, even programs only intended to run on uniprocessor machines will benefit from atomic instructions, as in the case of Linux's futexes.

In multiprocessor systems, it is usually impossible to disable interrupts on all processors at the same time. Even if it were possible, two or more processors could be attempting to access the same semaphore's memory at the same time, and thus atomicity would not be achieved. The compare-and-swap instruction allows any processor to atomically test and modify a memory location, preventing such multiple-processor collisions.


[edit] Extensions

Since CAS operates on a single pointer-sized memory location, while most lock-free and wait-free algorithms need to modify multiple locations, several extensions have been implemented.

  • Double compare-and-swap compares two unrelated memory locations with two expected values, and if they're equal, sets both locations to new values. This operation only exists on Motorola 680x0 processors[2], but in early papers was often required.
  • Double-wide compare-and-swap operates on two adjacent pointer-sized locations (or, equivalently, one location twice as big as a pointer). On later x86 processors, the CMPXCHG8B and CMPXCHG16B instructions[3] serve this role, although early 64-bit AMD CPUs did not support CMPXCHG16B (modern AMD CPUs do).
  • Single compare, double swap compares one pointer but writes two. The Itanium's cmp8xchg16 instruction[4] implements this, where the two written pointers are adjacent.

[edit] See also

[edit] References

  1. ^ Herlihy, Maurice (January 1991). "Wait-free synchronization" (PDF). ACM Trans. Program. Lang. Syst. 13 (1): 124–149. doi:10.1145/114005.102808. Retrieved on 2007-05-20. 
  2. ^ "CAS2". Retrieved on 2007-12-15. 
  3. ^ "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M" (PDF). Retrieved on 2007-12-15. 
  4. ^ "Intel Itanium Architecture Software Developer’s Manual Volume 3: Instruction Set Reference" (PDF). Retrieved on 2007-12-15. 

[edit] External links

Personal tools