Structured programming

From Wikipedia, the free encyclopedia

Jump to: navigation, search

Structured programming can be seen as a subset or subdiscipline of procedural programming, one of the major programming paradigms. It is most famous for removing or reducing reliance on the GOTO statement.

Historically, several different structuring techniques or methodologies have been developed for writing structured programs. The most common are:

  1. Edsger Dijkstra's structured programming, where the logic of a program is a structure composed of similar sub-structures in a limited number of ways. This reduces understanding a program to understanding each structure on its own, and in relation to that containing it, a useful separation of concerns.
  2. A view derived from Dijkstra's which also advocates splitting programs into sub-sections with a single point of entry, but is strongly opposed to the concept of a single point of exit.
  3. Data Structured Programming or Jackson Structured Programming, which is based on aligning data structures with program structures. This approach applied the fundamental structures proposed by Dijkstra, but as constructs that used the high-level structure of a program to be modeled on the underlying data structures being processed. There are at least 3 major approaches to data structured program design proposed by Jean-Dominique Warnier, Michael A. Jackson, and Ken Orr.

The two latter meanings for the term "structured programming" are more common, and that is what this article will discuss. Years after Dijkstra (1969), object-oriented programming (OOP) was developed to handle very large or complex programs (see below: Object-oriented comparison).

Contents

[edit] Low-level structure

At a low level, structured programs are often composed of simple, hierarchical program flow structures. These are sequence, selection, and repetition:

  • "Sequence" refers to an ordered execution of statements.
  • In "selection" one of a number of statements is executed depending on the state of the program. This is usually expressed with keywords such as if..then..else..endif, switch, or case.
  • In "repetition" a statement is executed until the program reaches a certain state or operations are applied to every element of a collection. This is usually expressed with keywords such as while, repeat, for or do..until. Often it is recommended that each loop should only have one entry point (and in the original structural programming, also only one exit point), and a few languages enforce this.

Some languages, such as Dijkstra's original Guarded Command Language, emphasise the unity of these structures with a syntax which completely encloses the structure, as in if..fi. In others, such as C, this is not the case, which increases the risk of misunderstanding and incorrect modification.

A language is described as "block-structured" when it has a syntax for enclosing structures between bracketed keywords, such as an if-statement bracketed by if..fi as in ALGOL 68, or a code section bracketed by BEGIN..END, as in PL/I. However, a language is described as "comb-structured" when it has a syntax for enclosing structures within an ordered series of keywords. A "comb-structured" language has multiple structure keywords to define separate sections within a block, analogous to the multiple teeth or prongs in a comb separating sections of the comb. For example, in Ada, a block is a 4-pronged comb with keywords DECLARE, BEGIN, EXCEPTION, END, and the if-statement in Ada is a 4-pronged comb with keywords IF, THEN, ELSE, END IF.

[edit] Design

Structured programming is often (but not always) associated with a "top-down" approach to design.

[edit] Structured programming languages

It is possible to do structured programming in any programming language, though it is preferable to use something like a procedural programming language. Since about 1970 when structured programming began to gain popularity as a technique, most new procedural programming languages have included features to encourage structured programming (and sometimes have left out features that would make unstructured programming easy). Some of the better known structured programming languages are ALGOL, Pascal, PL/I and Ada.


[edit] History

[edit] Theoretical foundation

The structured program theorem provides the theoretical basis of structured programming. It states that three ways of combining programs—sequencing, selection, and iteration—are sufficient to express any computable function. This observation did not originate with the structured programming movement; these structures are sufficient to describe the instruction cycle of a central processing unit, as well as the operation of a Turing machine. Therefore a processor is always executing a "structured program" in this sense, even if the instructions it reads from memory are not part of a structured program. However, authors usually credit the result to a 1966 paper by Böhm and Jacopini, possibly because Dijkstra cited this paper himself. The structured program theorem does not address how to write and analyze a usefully structured program. These issues were addressed during the late 1960s and early 1970s, with major contributions by Dijkstra, Robert W. Floyd, Tony Hoare, and David Gries.

[edit] Debate

P. J. Plauger, an early adopter of structured programming, described his reaction to the structured program theorem:

Us converts waved this interesting bit of news under the noses of the unreconstructed assembly-language programmers who kept trotting forth twisty bits of logic and saying, 'I betcha can't structure this.' Neither the proof by Böhm and Jacopini nor our repeated successes at writing structured code brought them around one day sooner than they were ready to convince themselves.

In 1967 a letter from Dijkstra appeared in Communications of the ACM with the heading "Goto statement considered harmful." The letter, which cited the Böhm and Jacopini proof, called for the abolishment of unconstrained GOTO from high-level languages in the interest of improving code quality. This letter is usually cited as the beginning of the structured programming debate.

Although, as Plauger mentioned, many programmers unfamiliar with the theorem doubted its claims, the more significant dispute in the ensuing years was whether structured programming could actually improve software's clarity, quality, and development time enough to justify training programmers in it. Dijkstra claimed that limiting the number of structures would help to focus a programmer's thinking, and would simplify the task of ensuring the program's correctness by dividing analysis into manageable steps. In his 1969 Notes on Structured Programming, Dijkstra wrote:

When we now take the position that it is not only the programmer's task to produce a correct program but also to demonstrate its correctness in a convincing manner, then the above remarks have a profound influence on the programmer's activity: the object he has to produce must be usefully structured.
…In what follows it will become apparent that program correctness is not my only concern, program adaptability or manageability will be another… 1

Donald Knuth accepted the principle that programs must be written with provability in mind, but he disagreed (and still disagrees[citation needed]) with abolishing the GOTO statement. In his 1974 paper, "Structured Programming with Goto Statements", he gave examples where he believed that a direct jump leads to clearer and more efficient code without sacrificing provability. Knuth proposed a looser structural constraint: It should be possible to draw a program's flow chart with all forward branches on the left, all backward branches on the right, and no branches crossing each other. Many of those knowledgeable in compilers and graph theory have advocated allowing only reducible flow graphs.

Structured programming theorists gained a major ally in the 1970s after IBM researcher Harlan Mills applied his interpretation of structured programming theory to the development of an indexing system for the New York Times research file. The project was a great engineering success, and managers at other companies cited it in support of adopting structured programming, although Dijkstra criticized the ways that Mills's interpretation differed from the published work.

As late as 1987 it was still possible to raise the question of structured programming in a computer science journal. Frank Rubin did so in that year with a letter, "'GOTO considered harmful' considered harmful." Numerous objections followed, including a response from Dijkstra that sharply criticized both Rubin and the concessions other writers made when responding to him.

[edit] Outcome

By the end of the 20th century nearly all computer scientists were convinced that it is useful to learn and apply the concepts of structured programming. High-level programming languages that originally lacked programming structures, such as FORTRAN, COBOL, and BASIC, now have them.

[edit] Common deviations

[edit] Exception handling

Although there is almost never a reason to have multiple points of entry to a subprogram, multiple exits are often used to reflect that a subprogram may have no more work to do, or may have encountered circumstances that prevent it from continuing.

A typical example of a simple procedure would be reading data from a file and processing it:

open file;
while (reading not finished) {
  read some data;
  if (error) {
    stop the subprogram and inform rest of the program about the error;
  }
}
process read data;
finish the subprogram;

The "stop and inform" may be achieved by throwing an exception, second return from the procedure, labelled loop break, or even a goto. As the procedure has 2 exit points, it breaks the rules of Dijkstra's structured programming. Coding it in accordance with single point of exit rule would be very cumbersome. If there were more possible error conditions, with different cleanup rules, single exit point procedure would be extremely hard to read and understand, very likely even more so than an unstructured one with control handled by goto statements. On the other hand, structural programming without such a rule would result in very clean and readable code.

Most languages have adopted the multiple points of exit form of structural programming. C allows multiple paths to a structure's exit (such as "continue", "break", and "return"), newer languages have also "labelled breaks" (similar to the former, but allowing breaking out of more than just the innermost loop) and exceptions.

[edit] State machines

Some programs, particularly parsers and communications protocols, have a number of states that follow each other in a way that is not easily reduced to the basic structures. It is possible to structure these systems by making each state-change a separate subprogram and using a variable to indicate the active state (see trampoline). However, some programmers (including Knuth[citation needed]) prefer to implement the state-changes with a jump to the new state.

[edit] Object-oriented comparison

In the 1960s, language design was often based on textbook examples of programs, which were generally small (due to the size of a textbook); however, when programs became very large, the focus changed. In small programs, the most common statement is generally the assignment statement; however, in large programs (over 10,000 lines), the most common statement is typically the procedure-call to a subprogram. Ensuring parameters are correctly passed to the correct subprogram becomes a major issue.

Many small programs can be handled by coding a hierarchy of structures; however, in large programs, the organization is more a network of structures, and insistence on hierarchical structuring for data and procedures can produce cumbersome code with large amounts of "tramp data." For example, a text-display program that allows dynamically changing the font-size of the entire screen would be very cumbersome if coded by passing font-size data through a hierarchy. Instead, a subsystem could be used to control the font data through a set of accessor functions that set or retrieve data from a common area controlled by that font-data subsystem. Databases are a common way around tramping.

The FORTRAN language has used labelled COMMON-blocks to separate global program data into subsystems (no longer global) to allow program-wide, network-style access to data, such as font-size, but only by specifying the particular COMMON-block name. Confusion could occur in FORTRAN by coding alias names and changing data-types when referencing the same labelled COMMON-block yet mapping alternate variables to overlay the same area of memory. Regardless, the labelled-COMMON concept was very valuable in organizing massive software systems and lead to the use of object-oriented programming to define subsystems of centralized data controlled by accessor functions. Changing data into other data-types was performed by explicitly converting, or casting, data from the original variables.

Global subprogram names were recognized as just as dangerous (or even more dangerous) than global variables or blank COMMON, and subsystems were limited to isolated groups of subprogram names, such as naming with unique prefixes or using Java package names.

Although structuring a program into a hierarchy might help to clarify some types of software, even for some special types of large programs, a small change, such as requesting a user-chosen new option (text font-color) could cause a massive ripple-effect with changing multiple subprograms to propagate the new data into the program's hierarchy. The object-oriented approach is allegedly more flexible, by separating a program into a network of subsystems, with each controlling their own data, algorithms, or devices across the entire program, but only accessible by first specifying named access to the subsystem object-class, not just by accidentally coding a similar global variable name. Rather than relying on a structured-programming hierarchy chart, object-oriented programming needs a call-reference index to trace which subsystems or classes are accessed from other locations.

Modern structured systems have tended away from deep hierarchies found in the 1970s and tend toward "event driven" architectures, where various procedural events are designed as relatively independent tasks.

Structured programming, as a forerunner to object-oriented programming, noted some crucial issues, such as emphasizing the need for a single exit-point in some types of applications, as in a long-running program with a procedure that allocates memory and should deallocate that memory before exiting and returning to the calling procedure. Memory leaks that cause a program to consume vast amounts of memory could be traced to a failure to observe a single exit-point in a subprogram needing memory deallocation.

Similarly, structured programming, in warning of the rampant use of goto-statements, led to a recognition of top-down discipline in branching, typified by Ada's GOTO that cannot branch to a statement-label inside another code block. However, "GOTO WrapUp" became a balanced approach to handling a severe anomaly without losing control of the major exit-point to ensure wrap-up (for deallocating memory, deleting temporary files, and such), when a severe issue interrupts complex, multi-level processing and wrap-up code must be performed before exiting.

The various concepts behind structured programming can help to understand the many facets of object-oriented programming.

[edit] See also

[edit] References

  1. Edsger Dijkstra, Notes on Structured Programming, pg. 6
  2. Böhm, C. and Jacopini, G.: Flow diagrams, Turing machines and languages with only two formation rules, CACM 9(5), 1966.
  3. Michael A. Jackson, Principles of Program Design, Academic Press, London, 1975.
  4. O.-J. Dahl, E. W. Dijkstra, C. A. R. Hoare Structured Programming, Academic Press, London, 1972 ISBN 0-12-200550-3
    • this volume includes an expanded version of the Notes on Structured Programming, above, including an extended example of using the structured approach to develop a backtracking algorithm to solve the 8 Queens problem.
    • a pdf version is in the ACM Classic Books Series
    • Note that the third chapter of this book, by Dahl, describes an approach which is easily recognized as Object Oriented Programming. It can be seen as another way to "usefully structure" a program to aid in showing that is is correct.

[edit] External links

Personal tools