DocBook
Filename extension | .dbk, .xml |
---|---|
Internet media type | application/docbook+xml |
Developed by | OASIS |
Type of format | markup language |
Extended from | SGML, XML |
Standard(s) | 4.5 (June 2006), 5.0 (Public Review Draft 1, February 6, 2008) |
DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.
As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, PDF, man pages and HTML Help, without requiring users to make any changes to the source.
Overview
DocBook, in its current version 5.0, is an XML language. It is defined by a RELAX NG schema with integrated Schematron rules. W3C XML Schema+Schematron and Document Type Definition (DTD) versions of the schema are also available, but these are considered non-normative.
Because DocBook is a semantic language, its documents do not describe what their contents look like, but rather what those contents mean. For example, rather than explaining how the abstract for an article might be visually formatted, DocBook simply says that a particular section is an abstract. It is up to an external processing tool or application to decide where on a page the abstract should go and what it should look like.
DocBook provides a vast number of semantic element tags. They are divided into three broad categories: structural, block-level, and inline. Structural tags specify broad characteristics of their contents. The book element, for example, specifies that its child elements represent the parts of a book. This includes a title, chapters, glossaries, appendices, and so on. DocBook's structural tags include, but are not limited to:
- set: a titled collection of one or more books. Sets can be nested within other sets.
- book: a titled collection of chapters, articles, and/or parts, with optional glossaries, appendices, and so forth.
- part: a titled collection of one or more chapters. Parts can be nested within other parts, and a part may have special introductory text.
- article: a titled, unnumbered collection of block-level elements.
- chapter: a titled, numbered collection of block-level elements. DocBook does not require that chapters be explicitly given numbers; the semantics imply that the number of a chapter is the number of preceding chapter elements in the XML document plus 1.
- appendix: the contained text represents an appendix.
- dedication: the text represents the dedication of the containing structural element.
Structural elements can contain other structural elements. Structural elements are the only permitted top-level elements in a DocBook document.
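The implicit chapter numbering described in the list above can be illustrated with a short Python sketch. It is not part of DocBook or of any DocBook tool: the function name number_chapters and the sample book are assumptions made for this illustration, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"  # DocBook 5 namespace

# Hypothetical book used only for this illustration.
SOURCE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Example book</title>
  <chapter><title>Getting started</title><para>First.</para></chapter>
  <chapter><title>Going further</title><para>Second.</para></chapter>
  <appendix><title>Extras</title><para>Third.</para></appendix>
</book>"""


def number_chapters(xml_text):
    """Return (number, title) pairs, numbering chapters by document order."""
    root = ET.fromstring(xml_text)
    numbered = []
    # iter() walks the tree in document order, so a chapter's number is
    # the count of chapter elements seen before it, plus 1.
    for number, chapter in enumerate(root.iter(f"{{{DB_NS}}}chapter"), start=1):
        title = chapter.findtext(f"{{{DB_NS}}}title", default="")
        numbered.append((number, title))
    return numbered


if __name__ == "__main__":
    for number, title in number_chapters(SOURCE):
        print(f"Chapter {number}: {title}")
```

The appendix is skipped because only chapter elements take part in the count; no numbers are stored in the source itself.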
Block-level tags are elements like paragraphs, lists, and so forth. Not all of these elements can contain actual text directly. Sequential block-level elements are expected to be rendered one "below" another. What "below" means can differ depending on the language. In most Western languages, it means literally below: text paragraphs are printed down the page. In Japanese, by contrast, text is often printed in columns with paragraphs running from right to left, so "below" in that case would be to the left. DocBook semantics are entirely neutral to these kinds of language-based concepts.
Inline-level tags are elements like emphasis, hyperlinks, and so forth. They wrap text within a block-level element. These elements do not cause the text to break when rendered in a paragraph format, but they can give the text special typographical treatment by changing the font, size, or similar attributes. The specific kinds of change are not part of the DocBook specification. That is, a DocBook processor is not required to transform an emphasis tag into italics: a reader-based (text-to-speech) processor could increase the volume of the words, and a text-based processor could use bold instead of italics. The DocBook specification does say that it expects different typographical treatment, but it sets no specific requirements as to what this treatment must be.
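The freedom a processor has in rendering inline elements can be sketched in Python. The style names ("html", "plain", "caps") and the function render_para are hypothetical and do not come from the DocBook specification or from any real processor. The sketch handles only a single para element and omits output escaping.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
EMPHASIS = f"{{{DB_NS}}}emphasis"

# A single paragraph with one inline emphasis element.
PARA = (
    '<para xmlns="http://docbook.org/ns/docbook">I hope that your day is '
    "proceeding <emphasis>splendidly</emphasis>!</para>"
)

# Hypothetical output styles: each one treats emphasized text differently.
STYLES = {
    "html": lambda text: f"<em>{text}</em>",
    "plain": lambda text: f"*{text}*",
    "caps": str.upper,
}


def render_para(xml_text, style):
    """Render one para element, applying the chosen emphasis treatment."""
    para = ET.fromstring(xml_text)
    treat = STYLES[style]
    parts = [para.text or ""]
    for child in para:
        text = "".join(child.itertext())
        # Only emphasis gets special treatment; other inlines pass through.
        parts.append(treat(text) if child.tag == EMPHASIS else text)
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    for style in STYLES:
        print(f"{style}: {render_para(PARA, style)}")
```

The source paragraph is the same in every case; only the processor's choice of treatment changes.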
Sample document
<?xml version="1.0" encoding="UTF-8"?>
<book xml:id="simple_book" xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <chapter xml:id="chapter_1">
    <title>Chapter 1</title>
    <para>Hello world!</para>
    <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
  </chapter>
  <chapter xml:id="chapter_2">
    <title>Chapter 2</title>
    <para>Hello again, world!</para>
  </chapter>
</book>
Semantically, this document is a book, with a title, that contains two chapters with their own titles. Those chapters contain paragraphs that have text in them. The markup is fairly readable in English.
In more detail, the root element of the document is book. All DocBook elements are in an XML namespace, so the root element has an xmlns attribute to set the current namespace. Also, the root element of a DocBook document must have a version attribute that specifies the version of the format that the document is built on.
A book element must contain a title, or an info element containing a title. This must be before any child structural elements. Following the title are the structural children, in this case, two chapter elements. Each of these must have a title. They contain para block elements which can contain free text and other inline elements like the emphasis in the second paragraph of the first chapter.
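A few of these requirements can be checked mechanically. The following Python sketch is only an illustration and not a substitute for validating against the RELAX NG schema: the function name check_book and its messages are assumptions, and it does not check element order (for example, that the title precedes the chapters).

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
NS = {"db": DB_NS}  # prefix used only inside the search paths below

GOOD = """<book xml:id="simple_book" xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <chapter xml:id="chapter_1">
    <title>Chapter 1</title>
    <para>Hello world!</para>
  </chapter>
</book>"""


def has_title(element):
    """True if the element has a title child, or an info child with a title."""
    return (
        element.find("db:title", NS) is not None
        or element.find("db:info/db:title", NS) is not None
    )


def check_book(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != f"{{{DB_NS}}}book":
        problems.append("root element is not a book in the DocBook namespace")
    if root.get("version") is None:
        problems.append("root element has no version attribute")
    if not has_title(root):
        problems.append("book has no title")
    for chapter in root.findall("db:chapter", NS):
        if not has_title(chapter):
            problems.append("chapter has no title")
    return problems


if __name__ == "__main__":
    print(check_book(GOOD))
```

Removing the version attribute from the sample makes the function report exactly one problem, which mirrors the requirement stated above.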
DocBook Authoring
Because DocBook is XML, documents can be created and edited with any text editor. Any XML editor can also serve as a DocBook editor, and a number of XML editing suites come with DocBook schemas, including Emacs in nXML mode and XML Copy Editor. There are also WYSIWYG editors like XMLmind or the Oxygen XML Editor's Author mode, which displays the DocBook document with CSS-based visual formatting for the individual elements.
DocBook Processing
The easiest way to provide a more presentational format for a DocBook document is to use the DocBook XSL stylesheets. These are XSLT stylesheets that transform DocBook documents into a number of formats (HTML, XSL-FO for later conversion into PDF, etc.). With the proper parameters set, these stylesheets can generate tables of contents for books, sets, and any other structural element that could need one.
The DocBook XSL stylesheets are fine for simple documentation, but for more specific typography, the user can write their own XSLT stylesheet or even a full-fledged program to process the DocBook into an appropriate output format.
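As a rough idea of what a custom processing program might look like, the following Python sketch builds an indented table of contents from a DocBook 5 book. It is unrelated to the DocBook XSL stylesheets and covers only a handful of structural elements: the STRUCTURAL set, the function names, and the sample document are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {"db": "http://docbook.org/ns/docbook"}

# Structural elements this sketch lists in the table of contents.
STRUCTURAL = {"part", "chapter", "appendix", "article"}

# Hypothetical book used only for this illustration.
SOURCE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <part>
    <title>Basics</title>
    <chapter><title>Chapter 1</title><para>Hello world!</para></chapter>
  </part>
  <chapter><title>Chapter 2</title><para>Hello again, world!</para></chapter>
</book>"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def build_toc(element, depth=0):
    """Return indented title lines for the structural children of element."""
    lines = []
    for child in element:
        if local_name(child) in STRUCTURAL:
            title = child.findtext("db:title", default="(untitled)", namespaces=NS)
            lines.append("  " * depth + title)
            # Recurse so that nested structural elements are indented.
            lines.extend(build_toc(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(build_toc(ET.fromstring(SOURCE))))
```

A real processor would also have to handle titles placed inside info elements, sections, and the many other structural elements that DocBook defines.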
History
DocBook began in 1991 as a joint project of HAL Computer Systems and O'Reilly & Associates and eventually spawned its own maintenance organization (the Davenport Group) before moving in 1998 to the SGML Open consortium, which subsequently became OASIS. DocBook is currently maintained by the DocBook Technical Committee at OASIS. (More detail about the history of DocBook is presented in the "What is DocBook?" external link below.)
DocBook is available in both SGML and XML forms, as a DTD. RELAX NG and W3C XML Schema forms of the XML version are available. Starting with DocBook 5, the RELAX NG version is the "normative" form from which the other formats are generated.
DocBook started out as an SGML application, but an equivalent XML application was developed and has now replaced the SGML one for most uses. (The XML DTD continued the version numbering scheme of the SGML DTD, starting with version 4.) Initially, a key group of software companies used DocBook, since their representatives were involved in its initial design. Eventually, however, DocBook was adopted by the open source community, where it has become a standard for creating documentation for many projects, including FreeBSD, KDE, GNOME desktop documentation, the GTK+ API references, the Linux kernel documentation, and the work of the Linux Documentation Project.
DocBook's use outside of the open source community also continues to grow, and a variety of commercial documentation-authoring tools now ship with some form of "off the shelf" support for DocBook.
Norman Walsh and the DocBook Project development team maintain the key application for producing output from DocBook source documents: a set of XSL stylesheets (as well as a legacy set of DSSSL stylesheets) that can generate high-quality HTML and print (FO/PDF) output, as well as output in other formats, including RTF, man pages and HTML Help.
Walsh is also the principal author of the book DocBook: The Definitive Guide, the official documentation of DocBook. This book is available online under the GFDL, and also as a print publication.
Pre DocBook v5.0
The current version of DocBook, 5.0, is fairly recent. Prior versions have been and still are in widespread use, so this section gives an overview of how the older 4.x formats differ.
Until DocBook 5, DocBook was defined normatively by a Document Type Definition (DTD). Since DocBook was built originally as an application of SGML, the DTD was the only available schema language. DocBook 4.x formats can be SGML or XML, but the XML version does not have its own namespace.
Being defined by a DTD imposed restrictions on the DocBook 4.x formats. The most significant for the language is that an element name uniquely defines its possible contents. That is, an element named info must have the same content model no matter where it is in the DocBook file. As such, there are many kinds of info elements in DocBook 4.x: bookinfo, chapterinfo, etc. Each has a slightly different content model, though they share parts of it. Additionally, their names repeat context information. A book's info element belongs to the book because it is a direct child of the book; it does not need a special name for the benefit of a human reader. However, because the format was defined by a DTD, it did have to be named that way.
In DocBook 4.x, the root element does not have or need a version attribute, as the version is built into the document type declaration at the top of a pre-DocBook 5 document.
DocBook 4.x documents are not compatible with DocBook 5, but they can be converted into DocBook 5 documents through the use of an XSLT stylesheet. One is provided as part of the distribution of the DocBook 5 schema and specification package.
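The kind of change such a conversion involves can be sketched in Python. This is a toy illustration, not the stylesheet distributed with DocBook 5: the names INFO_RENAMES, upgrade, and convert are hypothetical, and only three differences are handled (moving elements into the DocBook namespace, renaming a few context-specific info elements to info, and turning id into xml:id). The document type declaration of the 4.x source is omitted.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # the xml:id attribute

# Partial, hand-picked list of 4.x info elements; a real conversion needs more.
INFO_RENAMES = {"bookinfo": "info", "chapterinfo": "info", "articleinfo": "info"}

# Hypothetical 4.x-style fragment: no namespace, bookinfo, plain id attribute.
OLD = """<book>
  <bookinfo><title>Very simple book</title></bookinfo>
  <chapter id="chapter_1">
    <title>Chapter 1</title>
    <para>Hello world!</para>
  </chapter>
</book>"""


def upgrade(element):
    """Return a namespaced copy of element with 4.x names adjusted."""
    name = INFO_RENAMES.get(element.tag, element.tag)
    new = ET.Element(f"{{{DB_NS}}}{name}")
    for key, value in element.attrib.items():
        new.set(XML_ID if key == "id" else key, value)
    new.text, new.tail = element.text, element.tail
    for child in element:
        new.append(upgrade(child))
    return new


def convert(xml_text):
    """Convert a 4.x-style document and stamp the root with version 5.0."""
    root = upgrade(ET.fromstring(xml_text))
    root.set("version", "5.0")
    return root


if __name__ == "__main__":
    print(ET.tostring(convert(OLD), encoding="unicode"))
```

An explicit rename table is used rather than a pattern on the element name, since not every element whose name ends in "info" is one of these context-specific containers.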
Simplified DocBook
DocBook offers a large number of features that may be overwhelming to a new user. Simplified DocBook was designed for those who want the convenience of DocBook without a steep learning curve. It is a small subset of DocBook intended for single documents such as articles or white papers (i.e., "books" are not supported). The Simplified DocBook DTD is currently at version 1.1.[1]
References
Further reading
- Norman Walsh, Leonard Muellner (October 1999). DocBook: The Definitive Guide (1st ed.). O'Reilly & Associates. ISBN 1-56592-580-7. http://www.docbook.org/.
- Bob Stayton (2005). DocBook XSL: The Complete Guide (3rd ed.). Sagehill Enterprises. ISBN 0-9741521-2-9. http://www.sagehill.net/docbookxsl/.
- Joe Brockmeier (2001). DocBook Publishing - A Better Way to Create Professional Documents. Prima Tech's Linux Series. ISBN 0-7615-3331-1.
See also
- List of document markup languages
- Comparison of document markup languages
- DocBook XSL - A group of XSLT stylesheets for transforming DocBook into various viewable formats.
External links
- DocBook.org - Collection of DocBook information, including a 4.x and 5.0 version of DocBook: The Definitive Guide and all versions of the DocBook schemas/DTDs.
- DocBook Repository at OASIS - Normative home of DocBook schema/DTD.
- DocBook XSL Project page at SourceForge.net
- DocBook to OpenDocument XSLT (docbook2odf) - XSLT stylesheets and utilities for transforming DocBook into OpenDocument.
- DocBook Demystification HOWTO