Programming style

From Wikipedia, the free encyclopedia

Programming style is a set of rules or guidelines used when writing the source code for a computer program. It is often claimed that following a particular programming style will help programmers to read and understand source code conforming to the style, and help to avoid introducing errors.

A classic work on the subject was The Elements of Programming Style, written in the 1970s, and illustrated with examples from the Fortran and PL/I languages prevalent at the time.

The programming style used in a particular program may be derived from the coding standards or code conventions of a company or other computing organization, as well as the preferences of the author of the code. Programming styles are often designed for a specific programming language (or language family): style considered good in C source code may not be appropriate for BASIC source code, and so on. However, some rules are commonly applied to many languages.

Elements of good style

Good style is a subjective matter, and is difficult to define. However, there are several elements common to a large number of programming styles. The issues usually considered as part of programming style include the layout of the source code, including indentation; the use of white space around operators and keywords; the capitalization or otherwise of keywords and variable names; the style and spelling of user-defined identifiers, such as function, procedure and variable names; the use and style of comments; and the use or avoidance of particular programming constructs (such as GOTO statements).

Code appearance

Programming styles commonly deal with the visual appearance of source code, with the goal of requiring less human cognitive effort to extract information about the program. Software has long been available that formats source code automatically, leaving coders to concentrate on naming, logic, and higher techniques. As a practical point, using a computer to format source code saves time, and it is possible to then enforce company-wide standards without debates.

Indenting

Indent styles assist in identifying control flow and blocks of code. In some programming languages indentation is used to delimit logical blocks of code; in these cases correct indentation is more than a matter of style. In other languages indentation and whitespace do not affect function, although logical and consistent indentation makes code more readable. Compare:

if (hours < 24 && minutes < 60 && seconds < 60)
{
    return true;
}
else
{
    return false;
}

or

if (hours < 24 && minutes < 60 && seconds < 60) {
    return true;
} else {
    return false;
}

with something like

if  (    hours<
24  && minutes<
60  && seconds<
60  )
{return    true
;}         else
{return   false
;}

The first two examples are probably much easier to read because they are indented in an established way (a "hanging paragraph" style). This indentation style is especially useful when dealing with multiple nested constructs.

Python uses indentation to indicate control structures, so correct indentation is required. This eliminates the need for bracketing with curly braces ({ and }). On the other hand, copying and pasting Python code can lead to problems, because the indentation level of the pasted code may not be the same as the indentation level of the current line. Such reformatting is tedious to do by hand, but some text editors and IDEs have features to do it automatically. Python code can also be rendered unusable when posted on a forum or web page that removes whitespace. On web pages, a good precaution is to enclose code in "<pre> ... </pre>" HTML tags for proper display.

Haskell similarly has the off-side rule which lets indentation define blocks; however, unlike in Python, indentation is not compulsory in Haskell — curly braces and semicolons can be (and occasionally are) used instead.
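
The re-indentation of pasted Python code described above can be approximated with the standard library's textwrap module. This is only a minimal sketch, not what any particular editor does; the helper name reindent and its parameters are hypothetical.

```python
import textwrap


def reindent(snippet, levels, width=4):
    """Remove the snippet's common leading whitespace, then indent every
    non-blank line by levels * width spaces. (Hypothetical helper.)"""
    return textwrap.indent(textwrap.dedent(snippet), " " * (levels * width))


# Code copied from two levels deep, to be pasted one level deep.
pasted = "        if seconds < 60:\n            return True\n"
print(reindent(pasted, levels=1))
```

The relative indentation inside the snippet is preserved; only the common margin changes, which is the property that matters for Python's block structure.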

Vertical alignment

It is often helpful to align similar elements vertically, to make typo-generated bugs more obvious. Compare:

$search = array('a', 'b', 'c', 'd', 'e');
$replacement = array('foo', 'bar', 'baz', 'quux');
 
// Another example:
 
$value = 0;
$anothervalue = 1;
$yetanothervalue = 2;

with:

$search      = array('a',   'b',   'c',   'd',   'e');
$replacement = array('foo', 'bar', 'baz', 'quux');
 
// Another example:
 
          $value = 0;
   $anothervalue = 1;
$yetanothervalue = 2;

The latter example makes two things intuitively clear that were not clear in the former:

  • the search and replace terms are related and match up: they are not discrete variables;
  • there is one more search term than there are replacement terms. If this is a bug, it is now more likely to be spotted.

Arguments against vertical alignment generally claim difficulty in maintaining the alignment. Such difficulty can be eliminated when using a source code editor that supports elastic tabstops.
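
Where elastic tabstops are not available, alignment can also be regenerated mechanically. The following Python sketch pads the left-hand side of each assignment so that the equals signs line up. The helper align_assignments is hypothetical, it assumes every line contains an assignment whose first "=" is the operator, and it pads on the right of the name (as in the first aligned example above) rather than on the left.

```python
def align_assignments(lines):
    """Pad each left-hand side so that every '=' lands in the same column.

    Assumes every line contains an '=' and that the first '=' is the
    assignment operator. (Hypothetical helper for illustration.)
    """
    pairs = [line.split("=", 1) for line in lines]
    width = max(len(lhs.strip()) for lhs, _ in pairs)
    return [f"{lhs.strip():<{width}} = {rhs.strip()}" for lhs, rhs in pairs]


unaligned = [
    "$value = 0;",
    "$anothervalue = 1;",
    "$yetanothervalue = 2;",
]
aligned = align_assignments(unaligned)
for line in aligned:
    print(line)
```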

Spaces

In those situations where some whitespace is required, the grammars of most free-format languages are unconcerned with the amount that appears. Style related to whitespace is commonly used to enhance readability.

For instance, compare the following syntactically equivalent examples of C code.

int i;
for(i=0;i<10;++i){
    printf("%d",i*i+i);
}

versus

int i;
for (i=0; i<10; ++i) {
    printf("%d", i*i+i);
}

or even

int i;
for ( i = 0; i < 10; ++i ) {
    printf( "%d", i*i + i );
}
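
The claim that such variants are syntactically equivalent can be checked mechanically. The sketch below uses Python rather than C, on the assumption that the same idea carries over: Python is not a free-format language (leading whitespace is significant), but spacing inside an expression is not. The variable names are illustrative only.

```python
import ast

compact = "print(i*i+i)"
spaced = "print( i*i + i )"

# ast.dump omits line and column information by default, so two sources
# that differ only in spacing produce identical dumps.
same_tree = ast.dump(ast.parse(compact)) == ast.dump(ast.parse(spaced))
print(same_tree)
```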

Tabs

The use of tabs to create white space presents particular issues because the location of the tabulation point can be different depending on the tools being used and even the preferences of the user.

As an example, one programmer prefers tab stops of four and has his toolset configured this way, and uses these to format his code.

int     ix;     // Index to scan array
long    sum;    // Accumulator for sum

Another programmer prefers tab stops of eight, and her toolset is configured this way. When she examines his code, she may well find it difficult to read.

int             ix;             // Index to scan array
long    sum;    // Accumulator for sum

Solutions to this issue may involve forbidding the use of tabs or rules on how tab stops must be set.
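
The effect can be reproduced with Python's str.expandtabs, which renders tab characters at a chosen tab-stop width. The exact tab characters below are an assumption (two tabs after "int" and "ix;", one after "long" and "sum;"); they were chosen because they reproduce both layouts shown above.

```python
# Assumed keystrokes of the first programmer, who works with 4-column tab stops.
line_int = "int\t\tix;\t\t// Index to scan array"
line_long = "long\tsum;\t// Accumulator for sum"

for tab_width in (4, 8):
    print(f"tab stops every {tab_width} columns:")
    print(line_int.expandtabs(tab_width))
    print(line_long.expandtabs(tab_width))
```

With a width of 4 the two declarations line up; with a width of 8 the first line drifts to the right while the second stays where it was.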

Naming, logic, and higher techniques

Appropriate variable names

Appropriate choices for variable names are seen as the keystone of good style. Poorly named variables make code harder to read and understand.

For example, consider the following pseudocode snippet:

get a b c 
 
if a < 24 and b < 60 and c < 60
  return true
else
  return false

Because of the choice of variable names, the function of the code is difficult to work out. However, if the variable names are made more descriptive:

get hours minutes seconds 
 
if hours < 24 and minutes < 60 and seconds < 60
  return true
else
  return false

The code's intent is easier to discern, namely, "Given a 24-hour time, true will be returned if it is a valid time and false otherwise".

In early programming languages, variable names were restricted to only a few characters, to conserve the small amount of computer memory available to interpreters and compilers. A later "advance" allowed longer variable names to be used for human comprehensibility, but only the first few characters were significant. In some versions of BASIC long names were allowed, but only the first two letters were significant; this led to hard-to-find bugs when variable names such as "VALUE" and "VAT" were used and intended to be distinct.
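
The "VALUE" and "VAT" collision can be simulated in a few lines of Python. This is an illustration of the idea only, not a model of any actual BASIC interpreter; the function significant_part and the stored numbers are hypothetical.

```python
def significant_part(name, length=2):
    """Return the portion of a name that the simulated interpreter keeps."""
    return name[:length].upper()


variables = {}
variables[significant_part("VALUE")] = 100.0
variables[significant_part("VAT")] = 17.5  # silently overwrites VALUE

print(variables)
```

Both names reduce to the same key, so the second assignment replaces the first without any diagnostic.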

Boolean values in decision structures

Some programmers suggest that structures such as the above, where the result of the decision is merely computation of a Boolean value, are overly verbose and even prone to error. They prefer to have the decision in the computation itself, like this:

return (hours < 24) && (minutes < 60) && (seconds < 60);

The difference is largely stylistic, because optimizing compilers typically produce identical object code for both forms. Programmers nevertheless disagree about which form is easier to read and maintain.

One argument in favor of the longer form is that many debuggers allow a programmer to step line by line. If a test also changes the variables being tested, and the programmer wants to examine the values of all variables after that test, only the longer form provides a line on which to stop. With the shorter form, the debugger never reaches a line "after the test" where those variables still exist.
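
For comparison, the two forms might look as follows in Python. The function names are hypothetical, and, like the examples above, neither version rejects negative values.

```python
def is_valid_time_verbose(hours, minutes, seconds):
    # Longer form: the decision selects which Boolean literal to return.
    if hours < 24 and minutes < 60 and seconds < 60:
        return True
    else:
        return False


def is_valid_time(hours, minutes, seconds):
    # Shorter form: the comparison itself is the result.
    return hours < 24 and minutes < 60 and seconds < 60


print(is_valid_time(23, 59, 59), is_valid_time(24, 0, 0))
```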

Left-hand comparisons

In languages which use one symbol (typically a single equals sign, =) for assignment and another (typically two equals signs, ==) for comparison (e.g. C/C++, Java, PHP, Perl in numeric context, and many more recent languages), and where assignments may be made within control structures, there is an advantage to adopting the left-hand comparison style: placing constants or expressions on the left of any comparison. [1]

Here are both left and right-hand comparison styles, applied to a line of Perl code. In both cases, this compares the value in the variable $a against 42, and if it matches, executes the code in the subsequent block.

if ( $a == 42 ) { ... }  # A right-hand comparison checking if $a equals 42.
if ( 42 == $a ) { ... }  # Recast, using the left-hand comparison style.

The difference occurs when a developer accidentally types = instead of == (see example below).

The first (right-hand) line now contains a potentially subtle flaw: rather than comparing, it sets the value of $a to 42 and then always runs the code in the following block, because the assigned value 42 is true. As this is syntactically legitimate, the error may go unnoticed by the programmer, and the software may ship with a bug.

The second (left-hand) line contains a semantic error, as numeric values cannot be assigned to. This will result in a diagnostic message being generated when the code is compiled, so the error cannot go unnoticed by the programmer.

if ( $a = 42 ) { ... }  # Inadvertent assignment which is often hard to debug
if ( 42 = $a ) { ... }  # Compile time error indicates source of problem

Some languages have built-in protections against inadvertent assignment. Java and C#, for example, do not support automatic conversion to boolean, so an accidental assignment of a non-boolean value inside a condition is rejected by the compiler.

The risk can also be mitigated by use of static analysis tools that can detect this issue (e.g. Lint).
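
Python is another language with such a protection: a plain assignment is a statement, not an expression, so it is rejected inside a condition. The sketch below checks this with the built-in compile function; the helper name compiles is hypothetical.

```python
def compiles(source):
    """Return True if the source text is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


comparison_ok = compiles("if a == 42:\n    pass\n")
assignment_ok = compiles("if a = 42:\n    pass\n")
print(comparison_ok, assignment_ok)
```

The left-hand comparison style therefore brings no safety benefit in Python, since the mistyped form fails to compile whichever side the constant is on.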

Looping and control structures

The use of logical control structures for looping adds to good programming style as well. It helps someone reading code to understand the program's sequence of execution (in imperative programming languages). For example, in pseudocode:

i = 0
 
while i < 5
  print i * 2
  i = i + 1
end while

print "Ended loop"

The above snippet obeys the naming and indentation style guidelines, but the following use of the "for" construct makes the code much easier to read:

for i = 0, i < 5, i=i+1
  print i * 2
 
print "Ended loop"

In many languages, the often used "for each element in a range" pattern can be shortened to:

for i = 0 to 4
  print i * 2
 
print "Ended loop"
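
In Python, the while loop and the range-based loop could be written as follows. This is a sketch for comparison; the list names are illustrative, and the results are collected in lists rather than printed so that the two forms can be compared directly. Note that range(5) yields 0 through 4, matching the i < 5 condition of the while loop.

```python
# while form: the counter is initialised, tested and advanced in three places.
doubled_while = []
i = 0
while i < 5:
    doubled_while.append(i * 2)
    i = i + 1

# for form: the whole iteration is described on one line.
doubled_for = []
for i in range(5):
    doubled_for.append(i * 2)

print(doubled_while == doubled_for)
print("Ended loop")
```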

In curly bracket programming languages, it has become common for style documents to require that even where optional, curly brackets be placed after all control flow constructs.

for (i = 0 to 4) {
  print i * 2;
}
 
print "Ended loop";

This helps prevent program-flow bugs which can be time-consuming to track down, such as where a terminating semicolon is introduced at the end of the construct (a common typo):

 for (i = 0; i < 5; ++i);
    printf("%d\n", i*2);    /* The incorrect indentation hides the fact that this line is not part of the loop body. */
 
 printf("Ended loop");

...or where another line is added before the first:

 for (i = 0; i < 5; ++i)
    fprintf(logfile, "loop reached %d\n", i);
    printf("%d\n", i*2);    /* The incorrect indentation hides the fact that this line is not part of the loop body. */
 
 printf("Ended loop");

Lists

Where items in a list are placed on separate lines, it is sometimes considered good practice to add the item separator after the final item, as well as between each item, at least in those languages where doing so is supported by the syntax (e.g., C):

 const char *array[] = {
     "item1",
     "item2",
     "item3",  /* still has the comma after it */
 };

This prevents syntax errors or subtle string-concatenation bugs when the list items are re-ordered or more items are added to the end, without the programmer's noticing the "missing" separator on the line which was previously last in the list. However, this technique can result in a syntax error (or misleading semantics) in some languages. Even for languages that do support trailing commas, not all list-like syntactical constructs in those languages may support it.
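
Python, which accepts trailing commas in list literals and concatenates adjacent string literals, can illustrate both the benefit and the caveat. The variable names below are illustrative only.

```python
with_trailing_comma = [
    "item1",
    "item2",
    "item3",  # trailing comma: appending "item4" only adds a new line
]

missing_comma = [
    "item1",
    "item2"   # comma forgotten after the items were re-ordered
    "item3",
]

# Misleading semantics: outside a bracketed list, a trailing comma
# turns a single value into a one-element tuple.
not_a_number = 1,

print(len(with_trailing_comma), missing_comma, not_a_number)
```

The second list silently has two elements instead of three, because "item2" and "item3" are joined into one string.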

References

  1. Sklar, David; Trachtenberg, Adam (2003). PHP Cookbook. O'Reilly. Recipe 5.1, "Avoiding == Versus = Confusion", p. 118.
