Unusual software bug


Unusual software bugs are a class of software bugs that are considered exceptionally difficult to understand and repair. Several kinds have been described, most of them named punningly after scientists associated with counterintuitive phenomena.

Heisenbug

A heisenbug (named after the Heisenberg Uncertainty Principle) is a computer bug that disappears or alters its characteristics when an attempt is made to study it.

One common example is a bug that occurs in a program compiled with an optimizing compiler, but not in the same program compiled without optimization (e.g. when generating a debug-mode version). Another example is a bug caused by a race condition. A heisenbug may also appear in a system that does not conform to the command-query separation design guideline, since a routine called more than once could return different values each time, producing hard-to-reproduce failures.
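
The race-condition case can be made concrete with a minimal sketch (not from the article; the shared counter, function name work and loop bounds are purely illustrative): two threads update shared state without synchronization, and the very act of observing them, by single-stepping or adding output inside the loop, slows and serializes the threads so the failure tends to disappear.

    #include <iostream>
    #include <thread>

    // Hypothetical sketch (not from the article): a race-condition heisenbug.
    // Two threads increment a shared counter without synchronization, so some
    // increments are lost. Single-stepping in a debugger, or printing inside
    // the loop, slows and effectively serializes the threads, and the lost
    // updates often stop appearing.
    int counter = 0;  // should be std::atomic<int>, or protected by a mutex

    void work() {
        for (int i = 0; i < 100000; ++i)
            ++counter;  // unsynchronized read-modify-write: the data race
    }

    int main() {
        std::thread a(work);
        std::thread b(work);
        a.join();
        b.join();
        std::cout << counter << '\n';  // frequently less than 200000
    }

Making counter a std::atomic<int>, or protecting it with a mutex, makes the final count deterministic regardless of how closely the program is observed.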

The name "heisenbug" is a pun on the "Heisenberg uncertainty principle," a quantum physics concept which is commonly (yet inaccurately) used to refer to the fact that in the Copenhagen Interpretation model of quantum mechanical behavior, observers affect what they are observing, by the mere act of observing it alone (this is actually the observer effect, and is commonly confused with the Heisenberg uncertainty principle).

One common reason for heisenbug-like behaviour is that executing a program in debug mode often cleans memory before the program starts and forces variables into stack locations instead of keeping them in registers. These differences in execution can alter the effect of bugs involving out-of-bounds memory accesses or incorrect assumptions about the initial contents of memory. Another reason is that debuggers commonly provide watches or other user interfaces that cause additional code (such as property accessors) to be executed, which can in turn change the state of the program. Yet another reason is a fandango on core, the effect of a pointer running out of bounds. In C++, many heisenbugs are caused by uninitialized variables.
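
A minimal sketch of such an uninitialized-variable heisenbug (an assumed example, not taken from the article; the function and variable names are illustrative):

    #include <iostream>

    // Hypothetical sketch (not from the article): an uninitialized local
    // variable. `sum` starts with whatever value happens to occupy its
    // storage. A debug build that zero-fills stack memory can hide the bug,
    // while an optimized build may keep `sum` in a register holding garbage,
    // so the observed result depends on how the program is built and run.
    int sum_first_n(const int* data, int n) {
        int sum;  // bug: should be `int sum = 0;`
        for (int i = 0; i < n; ++i)
            sum += data[i];
        return sum;
    }

    int main() {
        int values[] = {1, 2, 3, 4};
        std::cout << sum_first_n(values, 4) << '\n';  // may print 10, or garbage
    }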

In an interview, Bruce Lindsay tells of being present when the term was first used, and says it was coined because Heisenberg said, "The more closely you look at one thing, the less closely can you see something else."[1]

This claim of origin is almost certainly wrong, as the term has been in use for over two decades. For example, the earliest Google-archived mention is from the mailing list (later Usenet newsgroup) comp.risks, moderated by Peter G. Neumann. In RISKS Digest Volume 4, Issue 34, dated 23 December 1986,[2] Zhahai Stewart contributed an item titled "Another heisenbug", noting that many such contributions had appeared in recent issues of RISKS Digest. The term, and especially the heisenbug/bohrbug distinction, was already mentioned in 1985 by Jim Gray in an often-cited paper about software failures (http://citeseer.ist.psu.edu/gray85why.html).

Bohrbug

A Bohr bug or bohrbug (named after the Bohr atom model) is a bug that makes itself manifest consistently under a well-defined (but possibly unknown) set of conditions. Thus, in contrast with heisenbugs, a bohrbug does not disappear or alter its characteristics when it is researched. These include the easiest bugs to fix (where the nature of the problem is obvious), but also bugs that are hard to find and fix and remain in the software during the operational phase. Sometimes an error might occur only when a unique data set is entered, or unique circumstances are encountered. These kinds of bugs are often present in parts of source code that are not invoked very often and thus might remain undetected for an extended period of time, and are sometimes termed a ghost in the code.

For example, an overflow bug in a by-the-book binary search algorithm may exhibit itself only when the data array under search is very large and the item to be searched for is located near the end of the array. Because programmers tend to test their work using small arrays of data, and only recently have there existed machines with enough memory to hold a sufficiently large array, such a bug may go undetected for several years.[3]
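
A minimal C++ sketch of that overflow (illustrative, not taken verbatim from the cited source):

    #include <vector>

    // Sketch of the overflow: with more than about 2^30 elements and a key
    // near the end of the array, `low + high` exceeds INT_MAX, wraps to a
    // negative value, and `a[mid]` indexes outside the array.
    int binary_search(const std::vector<int>& a, int key) {
        int low = 0;
        int high = static_cast<int>(a.size()) - 1;
        while (low <= high) {
            int mid = (low + high) / 2;  // bug: the sum may overflow a 32-bit int
            // overflow-safe form: int mid = low + (high - low) / 2;
            if (a[mid] < key)
                low = mid + 1;
            else if (a[mid] > key)
                high = mid - 1;
            else
                return mid;  // found
        }
        return -1;  // not found
    }

On small test arrays the sum never comes close to INT_MAX, which is exactly why the defect can go unnoticed for years.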

Mandelbug

A mandelbug (named after fractal innovator Benoît Mandelbrot) is a computer bug whose causes are so complex that its behavior appears chaotic. This word also implies that the speaker thinks it is a bohrbug rather than a heisenbug.

Some use mandelbug to describe a bug whose behavior does not appear chaotic, but whose causes are so complex that there is no practical solution. An example of this is a bug caused by a flaw in the fundamental design of the entire system.

In the literature there are inconsistent statements about the relationships between bohrbugs, heisenbugs, and mandelbugs: according to the definition above, mandelbugs are bohrbugs, while heisenbug and bohrbug are considered antonyms; moreover, it is claimed that all heisenbugs are mandelbugs.

In a recent column in IEEE Computer,[4] mandelbug is considered the complementary antonym to bohrbug; i.e., a software bug is either a bohrbug or a mandelbug. The apparently complex behavior of a mandelbug is assumed to be caused either by long delays between fault activation and the failure occurrence, or by influences of other software system elements (hardware, operating system, other applications) on the fault's behavior. Heisenbugs (whose behavior is influenced by a debugger, or other means of investigating the fault) are mandelbugs.

Schroedinbug

A schroedinbug is a bug that manifests only after someone reading source code or using the program in an unusual way notices that it never should have worked in the first place, at which point the program promptly stops working for everybody until fixed.

The Jargon File adds: "Though... this sounds impossible, it happens; some programs have harbored latent schroedinbugs for years."

The name schroedinbug is derived from the Schrödinger's cat thought experiment. A well-written program executing in a reliable computing environment is expected to be deterministic; that being so, the kind of observer effect posited by Schrödinger (killing the cat by opening the box) should not be able to affect the operation of a program (breaking it merely by reading its source code).

Repairing an obviously defective piece of code is often more important than determining what arcane set of circumstances caused it to work (or appear to work) in the first place, and why it then stopped. Because of this, many of these bugs are never fully understood. When bugs of this type are examined in enough detail, they can usually be reclassified as a bohrbug, heisenbug, or mandelbug.

Phase of the Moon bug

The "phase of the moon" is sometimes spouted as a silly parameter on which a bug might depend, such as when exasperated after trying to isolate the true cause. The Jargon File documents two rare instances in which data processing problems were actually caused by phase-of-the-moon timing.[5]

In general, programs that exhibit time-dependent behavior are vulnerable to time-dependent failures. These could occur during a certain part of a scheduled process, or at special times, such as on leap days or when a process crosses a day, month, year, or century boundary (as with the Year 2000 bug).
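
A minimal sketch of a boundary-crossing failure of the Year 2000 kind (the two-digit-year check below is a hypothetical example, not from the article):

    #include <iostream>

    // Hypothetical sketch (not from the article): a century-boundary failure
    // of the Year 2000 kind. Only the last two digits of the year are stored,
    // so "00" compares as earlier than "99" and anything dated 2000 appears
    // to precede anything dated 1999.
    bool not_expired(int expiry_yy, int current_yy) {
        return expiry_yy >= current_yy;  // bug: two-digit years wrap at the century
    }

    int main() {
        // A card expiring in 2001 ("01") checked during 1999 ("99"):
        std::cout << std::boolalpha << not_expired(1, 99) << '\n';  // prints false
    }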

Statistical bug

Statistical bugs can be detected only in aggregate, not in single runs of a section of code; they usually affect code that is supposed to produce random or pseudo-random output. For example, code intended to generate points uniformly distributed on the surface of a sphere might instead produce significantly more points in the northern hemisphere than in the southern one. Tracing in detail through a single run of the point generator can completely fail to shed light on such a bug, because no individual output can be identified as wrong; after all, it is intended to be random. Only when many points are generated does the problem become apparent. Popular debugging techniques such as checking pre- and postconditions can do little to help. Similar problems can also occur in numerical algorithms in which each individual operation is accurate to within a given tolerance, but numerical errors accumulate only over a large number of runs, especially if the errors have a systematic bias.
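
The sphere example can be illustrated with a sketch in which the flaw is injected deliberately (the narrowed lower bound on z below stands in for whatever subtle defect real code might contain; nothing here is taken from a cited source):

    #include <cmath>
    #include <iostream>
    #include <random>

    // Hypothetical sketch: a sphere-point generator whose flaw is injected
    // deliberately for illustration. The intent is z uniform on [-1, 1]; the
    // buggy lower bound of -0.98 quietly removes the southernmost cap, so the
    // northern hemisphere receives slightly more points. No single point looks
    // wrong; only an aggregate count exposes the bias.
    struct Point { double x, y, z; };

    Point sphere_point(std::mt19937& rng) {
        const double pi = std::acos(-1.0);
        std::uniform_real_distribution<double> z_dist(-0.98, 1.0);  // bug: should be -1.0
        std::uniform_real_distribution<double> phi_dist(0.0, 2.0 * pi);
        double z = z_dist(rng);
        double phi = phi_dist(rng);
        double r = std::sqrt(1.0 - z * z);
        return {r * std::cos(phi), r * std::sin(phi), z};
    }

    int main() {
        std::mt19937 rng(42);
        long north = 0, south = 0;
        for (long i = 0; i < 1000000; ++i) {
            if (sphere_point(rng).z >= 0.0) ++north; else ++south;
        }
        // A correct generator gives counts close to 500000/500000; the skew
        // here (about 505000 vs 495000) is only visible in the aggregate.
        std::cout << "north: " << north << "  south: " << south << '\n';
    }

Inspecting any single returned point reveals nothing unusual; only counting points per hemisphere over many samples makes the bias visible.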

References

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
