Literate programming

From Wikipedia, the free encyclopedia

Literate programming is an approach to programming which was introduced by Donald Knuth. Knuth conceived literate programming as an alternative to the structured programming paradigm of the 1970s.[1]

The literate programming paradigm, as conceived by Knuth, represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts.[2] Literate programs are written as an uninterrupted exposition of logic in an ordinary human language, much like the text of an essay, in which macros that hide abstractions are included alongside traditional source code. Literate programming tools are used both to tangle a literate program into a form suitable for further compilation or execution and to weave it into formatted documentation. While the first generation of literate programming tools was specific to a particular programming language, later ones are language-agnostic and exist above the programming languages.


Concept

A literate program is an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe the abstractions created while solving the programming problem and that hide chunks of code or lower-level macros. These macros are similar to the algorithms in pseudocode typically used in teaching computer science. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a meta-language on top of the underlying programming language.

A preprocessor is used to substitute arbitrary hierarchies, or rather "interconnected 'webs' of macros",[3] to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.
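
The "tangle" step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any actual tool: chunks are assumed to be already collected into a dictionary, a reference is assumed to occupy a whole line, and the names `tangle`, `REFERENCE` and the "Say hello" chunk are hypothetical.

```python
import re

# Assumption: a chunk reference occupies a whole line, as in "  <<name>>".
REFERENCE = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(chunks, root="*", _active=()):
    """Expand the chunk named `root` into plain source lines."""
    if root in _active:
        raise ValueError(f"chunk {root!r} refers to itself")
    if root not in chunks:
        raise KeyError(f"undefined chunk: {root!r}")
    output = []
    for line in chunks[root]:
        match = REFERENCE.match(line)
        if match is None:
            output.append(line)  # ordinary source code is copied through
            continue
        indent, name = match.groups()
        # Expand the referenced chunk, keeping the indentation of the reference.
        for expanded in tangle(chunks, name, _active + (root,)):
            output.append(indent + expanded if expanded else expanded)
    return output


# Hypothetical web of macros; only the chunk titles echo the wc example.
chunks = {
    "*": ["<<Header files to include>>", "<<The main program>>"],
    "Header files to include": ["#include <stdio.h>"],
    "The main program": ["int main(void) {", "  <<Say hello>>", "  return 0;", "}"],
    "Say hello": ['puts("hello");'],
}
print("\n".join(tangle(chunks)))
```

Because expansion starts from whichever name is passed as `root`, the same dictionary can yield several output files, and the order in which the chunks were defined has no effect on the result.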

Advantages of the method

According to Knuth,[4][5] literate programming provides for higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on but grows naturally in the process of expounding one's thoughts during a program's creation.[6] The resulting documentation allows authors to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking in general, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. The applicability of the concept to large-scale, commercial-grade programs is demonstrated by the publication of the TeX source code as a literate program.

Misconceptions

Literate programming is very often misunderstood[7] to refer only to formatted documentation produced from a common file with both source code and comments, or to voluminous commentaries included with code. This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation system, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide the ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.[7][8]

Example

A classic example of literate programming is the literate implementation of the standard Unix wc word counting program. Knuth presented a CWEB version of this example in Chapter 12 of his Literate Programming book. The same example was later rewritten for the noweb literate programming tool.[9] This example provides a good illustration of the basic elements of literate programming.

Creation of macros

The following snippet of the wc literate program[9] shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language and hide chunks of code or other macros. The mark-up notation consists of double angle brackets ("<<...>>"), which indicate macros, and the "@" symbol, which indicates the end of a code section in a noweb file. The "<<*>>" symbol stands for the "root", the topmost node from which the literate programming tool starts expanding the web of macros. In fact, the expanded source code can be written out from any section or subsection (i.e. a piece of code designated as "<<name of the chunk>>=", with the equal sign), so one literate program file can contain several files of machine source code.

The purpose of wc is to count lines, words, and/or characters in a list of files. The 
number of lines in a file is ......../more explanations/
Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
    <<*>>=
    <<Header files to include>>
    <<Global variables>>
    <<The main program>>
    @
We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
    <<Header files to include>>=
    #include <stdio.h>
    @

Note also that the unraveling of the chunks can be done in any place in the literate program text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text that envelops the whole program.

Program as a Web - Macros are not just section names

Macros are not the same as "section names" in standard documentation. Literate programming macros can hide any chunk of code behind themselves, and can be used inside any low-level construct of the programming language, often inside control statements such as "if", "while" or "case". This is illustrated by the following snippet of the wc literate program.[9]

The present chunk, which does the counting that is wc's raison d'etre, was actually one of
the simplest to write. We look at each character and change state if it begins or ends
a word.
    <<Scan file>>=
    while (1) {
      <<Fill buffer if it is empty; break at end of file>>
      c = *ptr++;
      if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
          word_count++;
          in_word = 1;
        }
        continue;
      }
      if (c == '\n') line_count++;
      else if (c != ' ' && c != '\t') continue;
      in_word = 0;
        /* c is newline, space, or tab */
    }
    @

In fact, macros can stand for any arbitrary chunk of code or other macros, and are thus more general than top-down or bottom-up "chunking", or than subsectioning. Knuth says that when he realized this, he began to think of a program as a web of various parts.[1]

Order of human logic, not that of the compiler

In a noweb literate program, besides the free order of their exposition, the chunks behind macros, once introduced with "<<...>>=", can be extended later at any place in the file by simply writing "<<name of the chunk>>+=" and adding more content, as the following snippet illustrates.[9]

The grand totals must be initialized to zero at the beginning of the program.
If we made these variables local to main, we would have to do this initialization
explicitly; however, C globals are automatically zeroed. (Or rather, ``statically
zeroed.'' (Get it?))
    <<Global variables>>+=
    long tot_word_count, tot_line_count,
         tot_char_count;
      /* total number of words, lines, chars */
    @
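
How a tool might gather such incrementally grown chunks can be sketched as follows. This is a simplified illustration, not noweb's actual parser: it assumes definitions start in the first column, treats "<<name>>=" and the "<<name>>+=" notation used above identically, and the function name `collect_chunks` and the first definition of "Global variables" are hypothetical.

```python
import re

# A definition line: "<<name>>=" starts a chunk, "<<name>>+=" extends one.
DEFINITION = re.compile(r"^<<(.+?)>>\+?=\s*$")


def collect_chunks(source):
    """Map each chunk name to its code lines, concatenating repeated definitions."""
    chunks = {}
    current = None  # lines of the chunk being read, or None while in prose
    for line in source.splitlines():
        match = DEFINITION.match(line)
        if match:
            # A repeated name reuses the same list, so later code is appended.
            current = chunks.setdefault(match.group(1), [])
        elif line.strip() == "@":
            current = None  # "@" ends the code section; prose follows
        elif current is not None:
            current.append(line)
    return chunks


# Hypothetical source text; the counters in the first chunk are an assumption.
source = """\
Counters for the current file come first.
<<Global variables>>=
long word_count, line_count, char_count;
@
The grand totals are introduced later, where the text discusses them.
<<Global variables>>+=
long tot_word_count, tot_line_count, tot_char_count;
@
"""
print(collect_chunks(source)["Global variables"])
```

The prose between the two definitions is ignored here; only the order in which the pieces of a chunk appear in the file determines the order of the collected code.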

Record of the train of thought creates superior documentation

The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code, a literate program contains the explanation of concepts on each level, with lower-level concepts deferred to their appropriate place, which allows for better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such an exposition of ideas creates a flow of thought like that of a literary work. Knuth famously wrote a "novel" that explains the code of a computer strategy game and remains perfectly readable.
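
The "weave" direction, which turns the same source file into formatted documentation, can be sketched along these lines. It is a rough illustration only: the HTML layout, the function name `weave` and the assumption that every code section starts in the first column and normally ends with "@" are choices made for this sketch, not the behaviour of WEB, CWEB or noweb.

```python
import html
import re

# A code section opens with "<<name>>=" (or "<<name>>+=") and closes with "@".
DEFINITION = re.compile(r"^<<(.+?)>>\+?=\s*$")


def weave(source):
    """Render prose as <p> paragraphs and each code section as a titled <pre>."""
    parts, prose, code = [], [], []
    name = None  # title of the code section being read, or None while in prose

    def flush_prose():
        text = " ".join(line.strip() for line in prose if line.strip())
        if text:
            parts.append(f"<p>{html.escape(text)}</p>")
        prose.clear()

    def flush_code(title):
        body = html.escape("\n".join(code))
        parts.append(f"<pre><b>{html.escape(title)}</b>\n{body}</pre>")
        code.clear()

    for line in source.splitlines():
        match = DEFINITION.match(line)
        if name is None and match:
            flush_prose()
            name = match.group(1)
        elif name is not None and line.strip() == "@":
            flush_code(name)
            name = None
        elif name is not None:
            code.append(line)
        else:
            prose.append(line)
    if name is not None:
        flush_code(name)  # tolerate a final section without a closing "@"
    flush_prose()
    return "\n".join(parts)


source = """\
We must include the standard I/O definitions, since we want to send
formatted output to stdout and stderr.
<<Header files to include>>=
#include <stdio.h>
@
"""
print(weave(source))
```

Unlike tangling, weaving keeps the author's order of exposition untouched: prose and code appear in the output exactly where they were written.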

Literate programming tools

The first published literate programming environment was WEB, introduced by Donald Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting the documentation. The complete commented TeX source code was published in Knuth's TeX: The Program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe.[10] The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++; it runs on most operating systems and can produce TeX and PDF documentation.

Other implementations of the literate programming concept are noweb and FunnelWeb, both of which are independent of the programming language of the source code. Noweb is well known for its simplicity: just two text markup conventions and two tool invocations are needed to use it, and it allows text to be formatted in HTML rather than going through the TeX system. FunnelWeb is another tool that does not depend on TeX and can produce HTML documentation output. Its markup is more complicated (with "@" escaping any FunnelWeb command), but it offers many more options.

The Leo text editor is an outlining editor that supports optional noweb and CWEB markup. Leo mixes two different approaches. First, it is an outlining editor, which helps with the management of large texts. Second, it incorporates some of the ideas of literate programming; in its pure form (i.e. the way it is used by Knuth's WEB tool or tools like "noweb"), literate programming in Leo is possible only with some degree of inventiveness and by using the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy, and in some ways similar to literate programming.[11]

The Haskell programming language has native support for semi-literate programming, inspired by CWEB but with a simpler implementation. When aiming for TeX output, one writes a plain LaTeX file in which source code is marked by a given surrounding environment; LaTeX can be set up to handle that environment, while the Haskell compiler looks for the right markers to identify the Haskell statements to compile, discarding the TeX documentation as if it were comments. However, as described above, this is not literate programming in the sense intended by Knuth. Haskell's functional, modular nature[12] makes literate programming directly in the language somewhat easier, but it is not nearly as powerful as the WEB tools, where "tangle" can reorganize the code in arbitrary ways.
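
The marker-based extraction described above can be illustrated with a short Python sketch. It assumes the LaTeX style in which code sits between \begin{code} and \end{code} lines; the function name `extract_code` and the sample document are hypothetical, and this is not how the Haskell compiler itself is implemented.

```python
def extract_code(latex_source):
    r"""Return the lines found inside \begin{code} ... \end{code} blocks."""
    code, inside = [], False
    for line in latex_source.splitlines():
        stripped = line.strip()
        if stripped == r"\begin{code}":
            inside = True
        elif stripped == r"\end{code}":
            inside = False
        elif inside:
            code.append(line)  # everything else is treated like a comment
    return code


# Hypothetical literate Haskell document in LaTeX style.
document = r"""
\section{Counting words}
The function below counts the words in a string.
\begin{code}
countWords :: String -> Int
countWords = length . words
\end{code}
"""
print("\n".join(extract_code(document)))
```

The code comes out in exactly the order in which it appears in the document, which is why this style cannot reorder chunks the way "tangle" does.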

See Also

Sweave - an example of the use of a noweb-like literate programming tool inside the R language to create dynamic statistical reports

References

  1. Knuth, Donald E. (1984). "Literate Programming" (PDF). The Computer Journal (British Computer Society) 27 (2): 97-111. doi:10.1093/comjnl/27.2.97. Retrieved on January 4, 2009.
  2.

    "I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for program creation. But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships. A complex piece of software consists of simple parts and simple relations between those parts; the programmer's task is to state those parts and those relationships, in whatever order is best for human comprehension not in some rigidly determined order like top-down or bottom-up."

    Donald E. Knuth, Literate Programming[1]

  3.

    "WEB's macros are allowed to have at most one parameter. Again, I did this in the interests of simplicity, because I noticed that most applications of multiple parameters could in fact be reduced to the one-parameter case. For example, suppose that you want to define something like... In other words, the name of one macro can usefully be a parameter to another macro. This particular trick makes it possible to..."

    Donald E. Knuth, Literate Programming[1]

  4. Knuth, Donald E.; Andrew Binstock (April 25, 2008). "Interview with Donald Knuth". Retrieved on January 4, 2009. "Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s - it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably. ... Literate programming is what you need to rise above the ordinary level of achievement."
  5.

    "Another surprising thing that I learned while using WEB was that traditional programming languages had been causing me to write inferior programs, although I hadn't realized what I was doing. My original idea was that WEB would be merely a tool for documentation, but I actually found that my WEB programs were better than the programs I had been writing in other languages."

    Donald E. Knuth, Literate Programming[1]

  6.

    "Thus the WEB language allows a person to express programs in a "stream of consciousness" order. TANGLE is able to scramble everything up into the arrangement that a PASCAL compiler demands. This feature of WEB is perhaps its greatest asset; it makes a WEB-written program much more readable than the same program written purely in PASCAL, even if the latter program is well commented. And the fact that there's no need to be hung up on the question of top-down versus bottom-up, since a programmer can now view a large program as a web, to be explored in a psychologically correct order is perhaps the greatest lesson I have learned from my recent experiences."

    Donald E. Knuth, Literate Programming[1]

  7. Dominus, Mark-Jason (March 20, 2000). "POD is not Literate Programming". Retrieved on January 3, 2009.
  8.

    "I chose the name WEB partly because it was one of the few three-letter words of English that hadn't already been applied to computers. But as time went on, I've become extremely pleased with the name, because I think that a complex piece of software is, indeed, best regarded as a web that has been delicately pieced together from simple materials. We understand a complicated system by understanding its simple parts, and by understanding the simple relations between those parts and their immediate neighbors. If we express a program as a web of ideas, we can emphasize its structural properties in a natural and satisfying way."

    Donald E. Knuth, Literate Programming[1]

  9. Ramsey, Norman (May 13, 2008). "An Example of noweb". Retrieved on January 4, 2009.
  10. de Marneffe, Pierre-Arnoul (December 1973). Holon Programming - Report PMAR 73-23. Université de Liège, Service d'Informatique.
  11. Ream, Edward K. (September 2, 2008). "Leo's Home Page". Retrieved on January 4, 2009.
  12. Hughes, John (January 9, 2002). Why Functional Programming Matters. Institutionen för Datavetenskap, Chalmers Tekniska Högskola. Retrieved on January 4, 2009.
