Abstract data type

From Wikipedia, the free encyclopedia

In computing, an abstract data type (ADT) is a specification of a set of data and the set of operations that can be performed on the data. Such a data type is abstract in the sense that it is independent of various concrete implementations. The definition can be mathematical, or it can be programmed as an interface. A first class ADT supports the creation of multiple instances of the ADT, and the interface normally provides a constructor, which returns an abstract handle to new data, and several operations, which are functions accepting the abstract handle as an argument.^[1]

The main contribution of the abstract data type theory (and its evolution, the design by contract) is that it (1) formalizes a definition of type (which was only intuitively hinted on procedural programming) (2) on the basis of the information hiding principle and (3) in a way that such formalization can be explicitly represented in programming language notations and semantics. This important advance in computer science theory (motivated by software engineering challenges in procedural programming) led to the emergence of languages and methodological principles of object-oriented programming.

[edit] Examples

Abstract data types (ADT) typically seen in textbooks and implemented in programming languages (or their libraries) include:

[edit] Separation of interface and implementation

When realized in a computer program, the ADT is represented by an interface, which shields a corresponding implementation. Users of an ADT are concerned with the interface, but not the implementation, as the implementation can change in the future. (This supports the principle of information hiding, or protecting the program from design decisions that are subject to change.)

The strength of an ADT is that the implementation is hidden from the user. Only the interface is published. This means that the ADT can be implemented in various ways, but as long as it adheres to the interface, user programs are unaffected.

There is a distinction, although sometimes subtle, between the abstract data type and the data structure used in its implementation. For example, a List ADT can be represented using an array-based implementation or a linked-list implementation. A List is an abstract data type with well-defined operations (add element, remove element, etc.) while a linked-list is a pointer-based data structure that can be used to create a representation of a List. The linked-list implementation is so commonly used to represent a List ADT that the terms are interchanged in common use.

Similarly, a Binary Search Tree ADT can be represented in several ways: binary tree, AVL tree, red-black tree, array, etc. Regardless of the implementation, the Binary Search Tree always has the same operations (insert, remove, find, etc.)

Separating the interface from the implementation doesn't always mean the user is unaware of the implementation method, but rather that they can't depend on any of the implementation details. For example, an ADT can be created using a scripting language or one that can be decompiled (like C). Even though the user can discover the implementation method, the construct can still be called an ADT as long as any client program that conforms to the interface is unaffected if the implementation changes.

In object-oriented parlance, an ADT is a class; an instance of an ADT or class is an object. Some languages include a constructor for declaring ADTs or classes. For example, C++ and Java provide a class constructor for this purpose.

[edit] Abstract data structure

An abstract data structure is an abstract storage for data defined in terms of the set of operations to be performed on data and computational complexity for performing these operations, regardless of the implementation in a concrete data structure.

Selection of an abstract data structure is crucial in the design of efficient algorithms and in estimating their computational complexity, while selection of concrete data structures is important for efficient implementation of algorithms.

This notion is very close to that of an abstract data type, used in the theory of programming languages. The names of many abstract data structures (and abstract data types) match the names of concrete data structures.

[edit] Built-in abstract data types

Because some ADTs are so common and useful in computer programs, some programming languages build implementations of ADTs into the language as native types or add them into their standard libraries. For instance, Perl arrays can be thought of as an implementation of the List or Deque ADTs and Perl hashes can be thought of in terms of Map or Table ADTs. The C++ Standard Library and Java libraries provide classes that implement the List, Stack, Queue, Map, Priority Queue, and String ADTs.

[edit] Concrete examples

[edit] Rational numbers as an abstract data type

Since most computers only have built-in circuitry for whole-number and floating-point operations, general rational numbers cannot be represented natively. A set of computer instruction codes would be needed to specify operations on rational numbers in terms of the native integer operations. An ADT "Rational" would typically specify the following aspects, among others, of the data type. (Rational numbers are integers and fractions like a/b where a and b are integers.)

Construction: Create an instance using two integers, a and b, where a represents the numerator and b represents the denominator.

Operations: addition, subtraction, multiplication, division, exponentiation, comparison, simplification, conversion to real (floating point) numbers.

To be a complete specification, any operation should be defined in terms of the data. For example, when multiplying two rational numbers a/b and c/d, the result is defined as (ac)/(bd). Typically, inputs, outputs, preconditions, postconditions, and assumptions for the ADT are specified as well.

The mathematical concept of rational numbers includes numbers with arbitrarily large numerator and denominator. Computer implementations are generally limited in various ways, and there are often tradeoffs between large capacities and high efficiency. An ADT will normally specify some minimum capacities and minimum efficiencies (maximum complexities).

With such an ADT, some programmers could create computer programs that use rational numbers, assuming that an implementation will be available, while other programmers could make available such an implementation. In this way, the ADT serves as a contract between programmers, specifying what each can expect from the others. An implementation would include the computer codes to actually perform the computations, and would, for instance have to include code for detecting when the products in ac/bd became too large for the chosen method of representing the parts.

To continue with the example, extracting the numerator and denominator could be, or not be, defined operations in such an ADT. If defined, it is conceivable that an implementation, after an instance has been created with the numbers 4 and 12, produces the numbers 1 and 3 respectively as the numerator and denominator. If the particular numbers returned by the implementation are to be predictable in specific ways, the ADT must say so. Things that are not specified in the ADT are left to the discretion of the implementor, and this may allow the implementor to find more efficient methods. The user of the implementation, on the other hand, must write his codes so that his program works correctly independently of such decisions by the implementor. The users code should be provably correct based on those properties of the implementation that are specified in the ADT. If later it is found that a particular implementation is incorrect, or works too slowly, another implementation of the same ADT could be subsituted without having to do a deep analysis of the user code.

[edit] Stack

[edit] Formal specification

[edit] Types:

E is the element type and T is the Stack type.

[edit] Functions:

T new (void)
T push (E,T)
E top(T)
T pop(T)
Boolean empty (T)

[edit] Axioms:

empty(new())
top(push(e,t)) = e
pop(push(e,t)) = t
not empty(push(e,t))

[edit] Preconditions:

.. top (T) requires not empty (T)
.. pop (T) requires not empty (T)

[edit] C-style interface and usage

The interface for a Stack ADT, written in C-style notation, might be:

long stack_create();                     /* create new instance of a stack */
void stack_push(long stack, void *item); /* push an item on the stack      */
void *stack_pop(long stack);             /* get item from top of stack     */
void stack_delete(long stack);           /* delete the stack               */

This ADT could be used in the following manner:

long stack;
struct foo *f;
 
stack = stack_create(); /* create a stack */
 
stack_push(stack, f); /* add foo structure to stack */
 
f = stack_pop(stack); /* get top structure from stack */

[edit] Implementation variants

The above stack ADT could be initially implemented using an array, and then later changed to a linked list, without affecting any user code. The number of ways a given ADT can be implemented depends on the programming language. For example, the above example could be written in C using a struct and an accompanying set of data structures using arrays or linked lists to store the entries; however, since the constructor function returns an abstract handle, the actual implementation is hidden from the user.

[edit] Notes

^ Robert Sedgewick (1998). Algorithms in C. Addison/Wesley. ISBN 0-201-31452-5. , definition 4.4. An alternative is to create an ADT that assumes it is the only instance. This means that a constructor is not required (although a routine to initialize the ADT may be), and individual functions need not specify an ADT handle.

[edit] See also

[edit] External links

Abstract data type in NIST Dictionary of Algorithms and Data Structures

[0] Robert Sedgewick (1998). Algorithms in C. Addison/Wesley. ISBN 0-201-31452-5. , definition 4.4. An alternative is to create an ADT that assumes it is the only instance. This means that a constructor is not required (although a routine to initialize the ADT may be), and individual functions need not specify an ADT handle.

[1]