Memory leak

From Wikipedia, the free encyclopedia

In computer science, a memory leak is a particular type of unintentional memory consumption in which a computer program fails to release memory that it no longer needs. This condition is normally the result of a bug in the program.

This term has the potential to be confusing, since memory is not physically lost from the computer. Rather, memory is allocated to a program, and that program subsequently loses the ability to access it due to program logic flaws.

A memory leak has symptoms similar to a number of other problems (see below) and generally can only be diagnosed by a programmer with access to the program source code; however, many people refer to any unwanted increase in memory usage as a memory leak, even if this is not strictly accurate.

Consequences

A memory leak can diminish the performance of the computer by reducing the amount of available memory. Eventually, in the worst case, too much of the available memory may become allocated and all or part of the system or device stops working correctly, the application fails, or the system slows down unacceptably due to thrashing.

Memory leaks may not be serious or even detectable by normal means. In modern operating systems, normal memory used by an application is released when the application terminates. This means that a memory leak in a program that only runs for a short time is rarely serious.

Cases where leaks are much more serious include:

  • where the program is left running and consumes more and more memory over time (such as background tasks and server processes, and especially embedded devices, which may be left running for many years);
  • where new memory is allocated frequently, such as when rendering the frames of a computer game or animated video;
  • where the program is able to request memory (e.g. shared memory) that is not released, even when the program terminates;
  • where the leak is happening inside the operating system;
  • where the leak is happening in a system-critical driver;
  • where memory is very limited, e.g. in an embedded system or portable device;
  • where the program runs on an operating system (such as AmigaOS) in which memory may not be automatically released on termination, and if lost can only be reclaimed by a reboot.

A layman's example

This example, written in pseudocode, is intended to show how a memory leak can come about, and its effects, without needing any programming knowledge.

The program in this case is part of some very simple software designed to control a lift (elevator). This part of the program is run whenever anyone inside the lift presses the button for a floor.

When a button is pressed:
  Get some memory, which will be used to remember the floor number
  Put the floor number into the memory
  Are we already on the target floor?
    If so, we have nothing to do: finished
    Otherwise:
      Wait until the lift is idle
      Go to the required floor
      Release the memory we used to remember the floor number

The memory leak occurs when the floor requested is the floor the lift is already on: the program finishes early and skips the step that releases the memory. Each time this case occurs, a little more memory is leaked.
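For readers who do program, the same logic can be sketched in C. This is only an illustration under stated assumptions: the names current_floor, outstanding_blocks, remember_floor, forget_floor, button_pressed_leaky and button_pressed_fixed are hypothetical, and assigning to current_floor stands in for actually moving the lift.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lift state; a real controller would drive hardware instead. */
static int current_floor = 0;
/* Bookkeeping for illustration only: request blocks allocated but not freed. */
static int outstanding_blocks = 0;

static int *remember_floor(int floor)
{
    int *request = malloc(sizeof *request);   /* get some memory */
    if (request != NULL) {
        *request = floor;                     /* put the floor number into it */
        outstanding_blocks++;
    }
    return request;
}

static void forget_floor(int *request)
{
    assert(outstanding_blocks > 0);           /* bookkeeping invariant */
    free(request);                            /* release the memory */
    outstanding_blocks--;
}

/* Leaky version: follows the pseudocode literally. */
void button_pressed_leaky(int floor)
{
    int *request = remember_floor(floor);
    if (request == NULL)
        return;                               /* no memory: ignore the press */
    if (*request == current_floor)
        return;                               /* BUG: request is never freed */
    current_floor = *request;                 /* stands in for moving the lift */
    forget_floor(request);
}

/* Fixed version: the memory is released on every path. */
void button_pressed_fixed(int floor)
{
    int *request = remember_floor(floor);
    if (request == NULL)
        return;
    if (*request != current_floor)
        current_floor = *request;
    forget_floor(request);
}
```

In the leaky version, every press of the button for the current floor leaves one block allocated; the fixed version releases the block whichever branch is taken.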

Cases like this wouldn't usually have any immediate effects. People do not often press the button for the floor they are already on, and in any case, the lift might have enough spare memory that this could happen a hundred or a thousand times. However, the lift will eventually run out of memory. This could take months or years, so it might never be discovered by even relatively thorough testing.

The consequences in this case would be unpleasant; at the very least, the lift would stop responding to requests to move to another floor. If the program needs memory to open the lift door, then someone may also be trapped inside, since there is no memory available for that door to open.

The memory leak would only last as long as the program was running. For example, if the lift's power were turned off, the program would stop running. When power was turned on again, the program would restart with all of its memory available, and the slow process of leaking would begin again.

Programming issues

Memory leaks are a common error in programming, especially when using languages that have no built-in automatic garbage collection, such as C and C++. Typically, a memory leak occurs because dynamically allocated memory has become unreachable. The prevalence of memory leak bugs has led to the development of a number of debugging tools to detect unreachable memory. IBM Rational Purify, BoundsChecker, Valgrind, Insure++ and memwatch are some of the more popular memory debuggers for C and C++ programs. Garbage collection can also be added to languages that lack it as a built-in feature, and libraries for doing this are available for C and C++ programs.

Languages that provide automatic memory management, like Java, C#, VB.NET or LISP, are not immune to memory leaks. For example, a program could continue to add entries to a list, but then forget to remove them when done. The memory manager would not know if the entry will be referenced again, unless the program does something to indicate it is no longer needed. Normally, this is done by removing any reference to the item in question. This is similar to people placing items on a pile or in a drawer, and then forgetting about them.

Although the memory manager can recover memory that has become unreachable and therefore logically useless, it cannot free memory that is still reachable and therefore potentially still useful. Modern memory managers therefore provide techniques for programmers to semantically mark memory with varying levels of usefulness, which correspond to varying levels of reachability. The memory manager does not free an object that is strongly reachable. An object is strongly reachable if it is reachable either directly by a strong reference or indirectly by a chain of strong references. (A strong reference is a reference that, unlike a weak reference, prevents an object from being garbage collected.) To prevent this type of memory leak, the developer is responsible for cleaning up references after use, typically by setting the reference to null once it is no longer needed and, if necessary, by unregistering any event listeners that maintain strong references to the object.

In general, automatic memory management is more robust and convenient for developers, as they don't need to implement freeing routines or worry about the sequence in which cleanup is performed or be concerned about whether or not an object is still referenced. It is easier for a programmer to know when a reference is no longer needed than to know when an object is no longer referenced. However, automatic memory management can impose a performance overhead, and it does not eliminate all of the programming errors that cause memory leaks.

RAII

RAII, short for Resource Acquisition Is Initialization, is an approach to the problem commonly taken in C++, D, and Ada. It involves associating scoped objects with the acquired resources, and automatically releasing the resources once the objects are out of scope. Compare the following C and C++ examples:

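/* C version */
#include <stdlib.h>

void f(void)
{
    int *array = calloc(1024, sizeof(int));
    if (array == NULL)
        return; /* allocation failed; nothing to free */
    /* Do some work with array here */
    free(array); /* must be released explicitly */
}

// C++ version
#include <vector>

void f()
{
    std::vector<int> array(1024);
    // Do some work with array here
} // array's destructor frees the storage here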

The C version, as implemented in the example, requires explicit deallocation: the array is allocated from the heap and continues to exist until explicitly freed. This is only one way to write it, however. C also has automatic storage duration, so the array could instead be declared as a local array of integers, which would be deallocated automatically on leaving the function.
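The automatic-storage alternative might look like the following minimal sketch. The function name f_automatic, the ARRAY_LEN constant and the summing loop are assumptions added for illustration; they are not part of the original example.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN 1024

/* Sums the integers 0 .. n-1 using a scratch array with automatic storage. */
long f_automatic(size_t n)
{
    int array[ARRAY_LEN];        /* automatic storage: no malloc(), no free() */
    long sum = 0;

    assert(n <= ARRAY_LEN);      /* the local array has a fixed size */

    for (size_t i = 0; i < n; i++)
        array[i] = (int)i;       /* do some work with array */
    for (size_t i = 0; i < n; i++)
        sum += array[i];

    return sum;                  /* array ceases to exist when f_automatic returns */
}
```

Because the array exists only for the duration of the call, a pointer to it must not be returned or stored for later use, and a very large local array may exceed the space available for automatic storage.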

The C++ version requires no explicit deallocation; it will always occur automatically as soon as the object array goes out of scope. This avoids the overhead of garbage collection schemes, and can even be applied to resources other than memory such as file handles, which mark-and-sweep garbage collection does not handle as gracefully. However, using RAII correctly is not as easy as garbage collection and it has its own pitfalls. For instance, in C++, if one is not careful, it is possible to create dangling pointers (or references) by returning data by reference, only to have that data be deleted when its containing object goes out of scope.

D uses a combination of RAII and garbage collection, employing automatic destruction when it is clear that an object cannot be accessed outside its original scope, and garbage collection otherwise.

Reference counting and cyclic references

Many modern garbage collection schemes are based on a notion of reachability: if the program holds no usable reference to a block of memory, that memory can be collected. Other schemes are based on reference counting, where each object keeps track of how many references point to it. When the count drops to zero, the object is expected to release itself and allow its memory to be reclaimed. The flaw in this model is that it does not cope with cyclic references, which is one reason the more costly mark-and-sweep style of collector is widely accepted.

The following code illustrates the canonical reference-counting memory leak.

Dim A, B
Set A = CreateObject("Some.Thing")
Set B = CreateObject("Some.Thing")
' At this point, the two objects each have one reference.
Set A.member = B
Set B.member = A
' Now they each have two references. 
Set A = Nothing
' You could still get out of it...
Set B = Nothing
' You now have a memory leak.

In practice, this trivial example would be spotted straight away and fixed. In most real examples, the cycle of references spans more than two objects, and is more difficult to detect.
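The same cycle can be reproduced in C with a hand-written reference count. This is a minimal sketch rather than a production design: struct object, object_create, object_set_member, object_release, demo_cycle and the live_objects counter are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct object {
    int refs;                /* number of references to this object */
    struct object *member;   /* strong reference to another object, or NULL */
};

static int live_objects = 0; /* objects created and not yet destroyed */

void object_release(struct object *o);

struct object *object_create(void)
{
    struct object *o = calloc(1, sizeof *o);
    if (o != NULL) {
        o->refs = 1;         /* the caller holds the first reference */
        live_objects++;
    }
    return o;
}

/* Store a strong reference to target (may be NULL) in o->member. */
void object_set_member(struct object *o, struct object *target)
{
    if (target != NULL)
        target->refs++;
    object_release(o->member);   /* drop the previous member, if any */
    o->member = target;
}

void object_release(struct object *o)
{
    if (o == NULL)
        return;
    assert(o->refs > 0);
    if (--o->refs > 0)
        return;                  /* someone still refers to this object */
    object_release(o->member);   /* count reached zero: release what we hold */
    free(o);
    live_objects--;
}

/* Returns how many of the two objects are still alive after the caller
 * drops both of its references, or -1 if allocation failed. */
int demo_cycle(int break_cycle_first)
{
    int before = live_objects;
    struct object *a = object_create();
    struct object *b = object_create();
    if (a == NULL || b == NULL) {
        object_release(a);
        object_release(b);
        return -1;
    }
    object_set_member(a, b);     /* b now has two references */
    object_set_member(b, a);     /* a now has two references */

    if (break_cycle_first)
        object_set_member(a, NULL);  /* break the cycle while we still can */

    object_release(a);
    object_release(b);
    return live_objects - before;
}
```

Calling demo_cycle(0) leaves both objects alive with a count of one each, held only by each other. Calling demo_cycle(1) clears one member before the external references are dropped, which breaks the cycle and lets both objects be freed.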

A well-known example of this kind of leak came to prominence with the rise of AJAX programming techniques in web browsers. JavaScript code that associated a DOM element with an event handler, and failed to remove the reference before exiting, would leak memory. (AJAX web pages keep a given DOM alive for much longer than traditional web pages, so this leak was much more apparent.)

Effects

If a program has a memory leak and its memory usage is steadily increasing, there will not usually be an immediate symptom. Every physical system has a finite amount of memory, and if the memory leak is not contained (for example, by restarting the program with the leak) it will sooner or later start to cause problems.

Most modern consumer desktop operating systems manage both main memory, which is physically housed in RAM microchips, and secondary storage such as a hard drive. Memory allocation is dynamic: each process gets as much memory as it requests. Active pages are transferred into main memory for fast access; inactive pages are pushed out to secondary storage to make room, as needed. When a single process starts consuming a large amount of memory, it usually occupies more and more of main memory, pushing other programs out to secondary storage and usually slowing the system significantly. Even if the leaking program is terminated, it may take some time for other programs to swap back into main memory and for performance to return to normal.

When all the memory on a system is exhausted (whether there is virtual memory or only main memory, such as on an embedded system) any attempt to allocate more memory will fail. This usually causes the program attempting to allocate the memory to terminate itself, or to generate a segmentation fault. Some programs are designed to recover from this situation (possibly by falling back on pre-reserved memory). The first program to experience the out-of-memory may or may not be the program that has the memory leak.
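The idea of falling back on pre-reserved memory could be sketched in C roughly as follows. This is one possible arrangement under stated assumptions, not a standard API: emergency_reserve, reserve_init and alloc_with_fallback are hypothetical names, and the reserve size is chosen by the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical emergency reserve, set aside while memory is still plentiful. */
static void *emergency_reserve = NULL;

/* Returns 1 if the reserve was obtained, 0 otherwise. */
int reserve_init(size_t size)
{
    assert(emergency_reserve == NULL);   /* initialise only once */
    emergency_reserve = malloc(size);
    return emergency_reserve != NULL;
}

/* Like malloc(), but gives up the reserve and retries once on failure. */
void *alloc_with_fallback(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && emergency_reserve != NULL) {
        free(emergency_reserve);         /* hand the reserve back to the allocator */
        emergency_reserve = NULL;
        p = malloc(size);                /* second and final attempt */
    }
    return p;                            /* may still be NULL: callers must check */
}
```

A program using this pattern would typically treat the loss of the reserve as a signal to save its state and shut down cleanly, since the retry is not guaranteed to succeed and no further reserve remains.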

Some multi-tasking operating systems have special mechanisms to deal with an out-of-memory condition, such as killing processes at random (which may affect "innocent" processes), or killing the largest process in memory (which presumably is the one causing the problem). Some operating systems have a per-process memory limit, to prevent any one program from hogging all of the memory on the system. The disadvantage to this arrangement is that the operating system sometimes must be re-configured to allow proper operation of programs that legitimately require large amounts of memory, such as those dealing with graphics, video, or scientific calculations.

If the memory leak is in the kernel, the operating system itself will likely fail. Computers without sophisticated memory management, such as embedded systems, may also completely fail from a persistent memory leak.

Publicly accessible systems such as web servers or routers are prone to denial-of-service attacks if an attacker discovers a sequence of operations which can trigger a leak. Such a sequence is known as an exploit.

Other memory consumers

Note that constantly increasing memory usage is not necessarily evidence of a memory leak. Some applications will store ever increasing amounts of information in memory (e.g. as a cache). If the cache can grow so large as to cause problems, this may be a programming or design error, but is not a memory leak as the information remains nominally in use. In other cases, programs may require an unreasonably large amount of memory because the programmer has assumed memory is always sufficient for a particular task; for example, a graphics file processor might start by reading the entire contents of an image file and storing it all into memory, something that is not viable where a very large image exceeds available memory.

To put it another way, a memory leak arises from a particular kind of programming error, and without access to the program code, someone seeing symptoms can only guess that there might be a memory leak. It would be better to use terms such as "constantly increasing memory use" where no such inside knowledge exists.

Simple C example

The following C function deliberately leaks memory by losing the pointer to the allocated memory. Since the program loops forever calling the defective function, malloc() will eventually fail (returning NULL) when no more memory is available to the program. The address of each allocation is stored in a local variable that only exists inside the function; this address is lost when the function returns, so it is impossible to free any of the previously allocated blocks.

#include <stdio.h>
#include <stdlib.h>
 
void f(void)
{
     void* s;
     s = malloc(50); /* get memory */
     return;         /* memory leak - see note below */ 
     /* 
      * Memory was available and pointed to by s, but not saved.
      * After this function returns, the pointer is destroyed, 
      * and the allocated memory becomes unreachable.
      *
      * To "fix" this code, either the f() function itself
      * needs to add "free(s)" somewhere or the s needs
      * to be returned from the f() and the caller of f() needs
      * to do the free().
      */
}
 
int main(void)
{
     /* this is an infinite loop calling the above function */
     while (1) f(); /* Malloc will return NULL sooner or later, due to lack of memory */
     return 0;
}
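
The two fixes described in the comment above could be written as follows. This is a minimal sketch: the names f_frees, f_returns and run_fixed are hypothetical, chosen here to keep the two variants apart.

```c
#include <assert.h>
#include <stdlib.h>

/* Fix 1: the function releases the block itself before returning. */
void f_frees(void)
{
    void *s = malloc(50);
    if (s == NULL)
        return;          /* allocation failed; nothing to free */
    /* ... use s here ... */
    free(s);
}

/* Fix 2: the function hands ownership to the caller, who must call free(). */
void *f_returns(void)
{
    return malloc(50);   /* may be NULL; the caller must check */
}

/* Calls both fixed versions n times; returns how many allocations failed. */
int run_fixed(int n)
{
    int failures = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        f_frees();
        void *s = f_returns();
        if (s == NULL)
            failures++;
        free(s);         /* free(NULL) is a no-op */
    }
    return failures;
}
```

Because every block is released before the next iteration, the loop can run indefinitely without the program's memory use growing.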
