Document Type Definition

From Wikipedia, the free encyclopedia

Jump to: navigation, search

Document Type Definition (DTD) is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language. A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of document, in terms of constraints on the structure of that document. A DTD may also declare constructs that are not always required to establish document structure, but that may affect the interpretation of some documents. XML documents are described using a subset of DTD which imposes a number of restrictions on the document's structure, as required per the XML standard (XML is in itself an application of SGML optimized for automated parsing). DTDs are written in a formal syntax that explains precisely which elements and entities may appear where in the document and what the elements’ contents and attributes are. DTD is native to the SGML and XML specifications, and since its introduction other specification languages such as XML Schema and RELAX NG have been released with additional functionality.

As an expression of a schema, a DTD specifies, in effect, the syntax of an "application" of SGML or XML, such as the derivative language HTML or XHTML. This syntax is usually a less general form of the syntax of SGML or XML.

In a DTD, the structure of a class of documents is described via element and attribute-list declarations. Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element. Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value, if not an explicit set of valid value(s).

Contents

[edit] Associating DTDs with documents

A DTD is associated with an XML document via a Document Type Declaration, which is a tag that appears near the start of the XML document. The declaration establishes that the document is an instance of the type defined by the referenced DTD.

The declarations in a DTD are divided into an internal subset and an external subset. The declarations in the internal subset are embedded in the Document Type Declaration in the document itself. The declarations in the external subset are located in a separate text file. The external subset may be referenced via a public identifier and/or a system identifier. Programs for reading documents may not be required to read the external subset.

[edit] Examples

Here is an example of a Document Type Declaration containing both public and system identifiers:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE foo [ <!ENTITY greeting "helloworld"> ]>
 
<!DOCTYPE bar [ <!ENTITY greeting "helloworld"> ]>

All HTML 4.01 documents are expected to conform to one of three SGML DTDs. The public identifiers of these DTDs are constant and are as follows:

The system identifiers of these DTDs, if present in the Document Type Declaration, will be URI references. System identifiers can vary, but are expected to point to a specific set of declarations in a resolvable location. SGML allows for public identifiers to be mapped to system identifiers in catalogs that are optionally made available to the URI resolvers used by document parsing software.

[edit] Markup Declarations

In a DTD markup declarations are used to declare which elements types, attribute lists, entities and notations are allowed in the structure of the corresponding class of XML documents.[1]

[edit] Element Type Declarations

An Element Type Declaration defines an element and its possible content. A valid XML document only contains elements that are defined in the DTD.

An element’s content is specified by some key words and characters:

  • EMPTY for no content
  • ANY for any content
  • , for orders
  • | for alternatives ("either...or")
  • ( ) for groups
  • star for any number (zero or more)
  • + for at least once (one or more)
  • ? mark for optional (zero or one)
  • If there is no *, + or ?, the element must occur exactly one time

Examples:

<!ELEMENT html (head, body)>
<!ELEMENT p (#PCDATA | p | ul | dl | table | h1|h2|h3)*>

[edit] Attribute List Declarations

An Attribute List specifies the name, data type and default value of each attribute associated with a given element type,[2] for example:

<!ATTLIST img 
   id     ID       #IMPLIED
   src    CDATA    #REQUIRED
>

There are the following attribute types:

  • CDATA (Character set of data)
  • ID
  • IDREF and IDREFS
  • NMTOKEN and NMTOKENS
  • ENTITY and ENTITIES
  • NOTATION and NOTATIONS
  • Listings and NOTATION-listings

A default value can be used to define whether an attribute must occur (#REQUIRED) or not (#IMPLIED), whether it has a fixed value (#FIXED), and which value should be used as a default value ("…") in case the given attribute is left out in an XML tag.

[edit] Entity Declarations

Entities are variables used to define abbreviations; a typical use is user-readable names for special characters.[3] Thus, entities help to avoid repetition and make editing easier. In general, there are basically two different types:

  • Internal (Parsed) Entities define entity references in order to replace certain strings by a replacement text. The content of the entity is given in the declaration.
  • External (Parsed) Entities refer to external storage objects.

[edit] Notation Declarations

Notations read the file format of unparsed external documents in order to include non-XML data in a XML document. For example a GIF image:

<!NOTATION GIF system "image/gif">

[edit] XML DTDs and schema validation

The XML DTD syntax is one of several XML schema languages.

A common misconception is that non-validating XML parsers are not required to read DTDs, when in fact, the DTD must still be scanned for correct syntax as well as for declarations of entities and default attributes. A non-validating parser may, however, elect not to read external entities, including the external subset of the DTD. If the XML document depends on declarations found only in external entities, it should assert standalone="no" in its XML declaration.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
  <person>
    <name>Gofur Halmurat</name>
    <birthdate>04/02/1977</birthdate>
    <gender>Male</gender>
  </person>
</people_list>

[edit] XML DTD Example

An example of a very simple XML DTD to describe a list of persons is given below:

<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>

Taking this line by line, it says:

  1. people_list is a valid element name, and an instance of such an element contains any number of person elements. The * denotes there can be 0 or more person elements within the people_list element.
  2. person is a valid element name, and an instance of such an element contains one element named name, followed by one named birthdate (optional), then gender (also optional) and socialsecuritynumber (also optional). The ? indicates that an element is optional. The reference to the name element name has no ?, so a person element must contain a name element.
  3. name is a valid element name, and an instance of such an element contains "parsed character data" (#PCDATA).
  4. birthdate is a valid element name, and an instance of such an element contains parsed character data.
  5. gender is a valid element name, and an instance of such an element contains parsed character data.
  6. socialsecuritynumber is a valid element name, and an instance of such an element contains parsed character data.

An example of an XML file which makes use of and conforms to this DTD follows. It assumes the DTD is identifiable by the relative URI reference "example.dtd", and the "people_list" after "!DOCTYPE" tells us that the root tags, or the first element defined in the DTD, is called "people_list":

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
  <person>
    <name>Fred Bloggs</name>
    <birthdate>27/11/2008</birthdate>
    <gender>Male</gender>
  </person>
</people_list>

It is possible to render this in an XML-enabled browser (such as IE5 or Mozilla) by pasting and saving the DTD component above to a text file named example.dtd and the XML file to a differently-named text file, and opening the XML file with the browser. The files should both be saved in the same directory. However, many browsers do not check that an XML document conforms to the rules in the DTD; they are only required to check that the DTD is syntactically correct. For security reasons, they may also choose not to read the external DTD.

Other alternatives to DTDs have become available in the last few years:

  • XML Schema, also referred to as XML Schema Definition (XSD), has achieved Recommendation status within the W3C, and is popular for "data oriented" (that is, transactional non-publishing) XML use because of its stronger typing and easier round-tripping to Java declarations. Most of the publishing world has found that the added complexity of XSD would not bring them any particular benefits[citation needed], so DTDs are still far more popular there. An XML Schema Definition is itself an XML document while a DTD is not.
  • RELAX NG, which is also a part of DSDL, is an ISO international standard. It is more expressive than XSD, while providing a simpler syntax, but commercial software support has been slow in coming.
  • Document Structure Description (DSD) was another proposed alternative that, as of 2008, has not seen much progress or adoption in several years.

[edit] See also

[edit] References

  1. ^ http://books.google.de/books?id=_NqW2BjQtFIC&pg=PA43&lpg=PA43&dq=DTD+Markup-Declarations&source=bl&ots=_3P2DUCXGL&sig=K_Slr8Xj3Mpb_V3zBMPR92tSYOM&hl=de&sa=X&oi=book_result&resnum=4&ct=result#PPA44,M1
  2. ^ http://www.stylusstudio.com/w3c/xml11/attdecls.htm#attdecls
  3. ^ http://www.w3schools.com/dtd/dtd_entities.asp

[edit] External links

Personal tools