Open Packaging Conventions

From Wikipedia, the free encyclopedia

Developed by: Microsoft, Ecma, ISO/IEC
Initial release: 2006-12-07
Latest release: ISO/IEC 29500:2008 (2008-11-17)
Type of format: File archive, data compression
Container for: Electronic documents
Contained by: ZIP
Extended from: XML, ZIP
Standard(s): ECMA-376, ISO/IEC 29500
Website: ECMA-376, ISO/IEC 29500:2008

The Open Packaging Conventions (OPC) is a file packaging format created by Microsoft for storing a combination of XML and non-XML files that together form a single entity, such as an XML Paper Specification (XPS) document, in a single compressed container. The format keeps the independent files embedded in the document intact while producing much smaller files than plain, uncompressed XML.

The OPC is specified in Part 2 of the Office Open XML standards ISO/IEC 29500:2008 and ECMA-376.[1][2]

Usage

Both the XML Paper Specification (XPS)[3] and Office Open XML (OOXML) use the Open Packaging Conventions (OPC), which define a profile of the common ZIP format. In addition to XML data and documents, the ZIP package can include other text and binary files in formats such as PNG, BMP, AVI, PDF, RTF, or even an already packaged ODF file. OPC also defines naming conventions and an indirection method that makes binary and XML files position-independent within the ZIP archive.

OPC files can be opened using common ZIP utilities. Open source libraries for .NET and Java are available for working with OPC packages. The Open Packaging Conventions is specified in Part 2 of ECMA-376 (131 pages) and does not depend on the other parts of Office Open XML. The specification also includes details of the ZIP format, because ZIP had not previously been specified by any international standard, although it has widespread community and developer acceptance.
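
Because an OPC package is an ordinary ZIP archive, its entries can be listed with any ZIP library. The following is a minimal sketch in Python using the standard zipfile module. The function names and the in-memory demo package are illustrative assumptions, not part of any OPC library:

```python
import io
import zipfile


def list_parts(package):
    """Return the sorted names of all file entries in a ZIP-based package.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        return sorted(info.filename for info in zf.infolist() if not info.is_dir())


def build_demo_package():
    """Build a tiny stand-in package in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", b"<Types/>")
        zf.writestr("_rels/.rels", b"<Relationships/>")
        zf.writestr("doc/main.xml", b"<main/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name in list_parts(build_demo_package()):
        print(name)
```

Passing the path of a real OPC-based file to list_parts instead of the demo buffer should work the same way, since only generic ZIP features are used.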

Microsoft has submitted a draft to the Internet Engineering Task Force for a "pack" URI scheme (pack://) to be used for URI references to OPC-based packages.[4] It was not published as an RFC, because its syntax does not follow the generic URI syntax in RFC 3986, as required by RFC 4395 for the permanent registration of new URI schemes.[citation needed]

Parts and Relationships

[Figure: Container structure of Part 2 of the Ecma Office Open XML standard, ECMA-376]

An OPC-aware application uses relationship metadata rather than directory names and file names to locate individual files. In OPC terminology, a file is a part. Each part also has accompanying metadata, in particular its MIME content type.

An OPC file can contain arbitrary data, but reserves the ".rels" extension for relationship parts. The locations /_rels/.rels and /[Content_Types].xml are the only two reserved locations for parts in files that adhere to the Open Packaging Conventions.

_rels
The root-level _rels folder holds the relationships for the OPC file as a whole. It always contains a part called .rels, which is where the "package relationships" are located. Reading a file that uses these conventions always starts at the _rels/.rels part. All relationship parts are XML: opened in a text editor, each shows a list of elements, one for every relationship of the part it describes.
[Content_Types].xml file
This file describes the content of the ZIP package. It contains a mapping from file extensions to content types, plus overrides for specific URIs.
[part].rels
Each part may have its own relationships. The _rels folders are where the relationships for any given part within the package are found. To find the relationships for a specific part, look for the _rels folder that is a sibling of that part. If the part has relationships, the _rels folder contains a file named after the original part with .rels appended. For example, if the content types part had any relationships, there would be a file called [Content_Types].xml.rels inside the _rels folder.
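
The lookup sequence described above (start at /_rels/.rels, consult [Content_Types].xml, then find a part's own relationships file) can be sketched as follows. This is a simplified illustration in Python, not a complete OPC reader: the sample XML, the relationship type URI, and the function names are made up for the example, and external or "../" targets are not handled.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Illustrative contents of the two reserved parts (not taken from a real file).
ROOT_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://example.com/rel/main" Target="doc/main.xml"/>
</Relationships>"""

CONTENT_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="png" ContentType="image/png"/>
  <Override PartName="/doc/main.xml" ContentType="application/vnd.example.main+xml"/>
</Types>"""


def package_targets(rels_xml):
    """Map relationship type -> part name for the package-level relationships."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Type"): "/" + rel.get("Target").lstrip("/")
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
    }


def content_type_of(part_name, types_xml):
    """Return a part's content type: a per-part Override wins over an extension Default."""
    root = ET.fromstring(types_xml)
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName").lower() == part_name.lower():
            return override.get("ContentType")
    extension = posixpath.splitext(part_name)[1].lstrip(".").lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension").lower() == extension:
            return default.get("ContentType")
    return None


def rels_part_name(part_name):
    """Name of the relationships file for a part: sibling _rels folder, '.rels' appended."""
    folder, base = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", base + ".rels")


if __name__ == "__main__":
    main_part = package_targets(ROOT_RELS)["http://example.com/rel/main"]
    print(main_part)
    print(content_type_of(main_part, CONTENT_TYPES))
    print(rels_part_name(main_part))
```

Note that the main part is found through its relationship type, not through a hard-coded path, which is the point of starting at /_rels/.rels.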

Advantages of using the Open Packaging Conventions

OPC offers several advantages, including indirection, chunking, and relative indirection.[5]

Indirection

Take the example of a catalog in which a logo is repeated 1,000 times. With an indirection mechanism, changing the logo means changing one entry in one file, with no searching involved, because the location of that entry is known. This increases maintainability substantially. Changing the layout of, say, the ZIP directories where the files are stored also becomes a trivial matter: there is no need to know every element that can point to a file, because the pointers are all in one place.
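
The catalog example can be sketched in a few lines of Python. The page markup, the relationship Id "rLogo", the type URI, and the function names are hypothetical. Only the Relationships/Relationship element structure and its namespace follow the OPC relationship format:

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# One relationship entry holds the only copy of the logo's location.
CATALOG_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rLogo" Type="http://example.com/rel/image" Target="media/logo-old.png"/>
</Relationships>"""

# Hypothetical catalog markup: every page refers to the logo by relationship Id only.
CATALOG_PAGES = ['<page n="%d"><image rel="rLogo"/></page>' % n for n in range(1000)]


def resolve(rels_xml, rel_id):
    """Return the Target of the relationship with the given Id."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            return rel.get("Target")
    raise KeyError(rel_id)


def retarget(rels_xml, rel_id, new_target):
    """Return new relationships XML in which one relationship points elsewhere."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            rel.set("Target", new_target)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(rel_id)


if __name__ == "__main__":
    updated = retarget(CATALOG_RELS, "rLogo", "media/logo-new.png")
    # The 1,000 pages are untouched; they still say rel="rLogo".
    print(resolve(updated, "rLogo"))
```

The same single edit would cover a move of the image to a different folder, since the pages never mention a path.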

Chunking

OPC encourages documents to be split into small chunks. This limits the effect of file corruption and improves data access: for example, all the style information can sit in one XML part, and each worksheet or table in its own part. This allows faster access and less object creation for clients, and makes it easier for multiple processes to work on the same document.

Chunking also benefits programmers. Replacing one stylesheet with another becomes a ZIP file operation, not an XML operation. It also reduces how much a programmer needs to understand, because each chunk can be approached on the assumption that all the information on a topic is in that chunk, without searching through one big file full of extraneous elements.
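
To illustrate "a ZIP file operation, not an XML operation", here is a minimal Python sketch that swaps one part for another without parsing any XML. The part names and contents are hypothetical. Because the standard zipfile module cannot overwrite an entry in place, the sketch copies the archive and substitutes one entry along the way:

```python
import io
import zipfile


def replace_part(src, dst, part_name, new_bytes):
    """Copy the ZIP at `src` to `dst`, substituting the bytes of one entry.

    No part is parsed: every other entry is copied through unchanged.
    """
    found = False
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.filename == part_name:
                data = new_bytes
                found = True
            else:
                data = zin.read(info.filename)
            zout.writestr(info.filename, data)
    if not found:
        raise KeyError(part_name)


def build_demo_package():
    """Build a small in-memory package with separate style and content parts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("styles.xml", b"<styles theme='light'/>")
        zf.writestr("document.xml", b"<document/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    out = io.BytesIO()
    replace_part(build_demo_package(), out, "styles.xml", b"<styles theme='dark'/>")
    with zipfile.ZipFile(out) as zf:
        print(zf.read("styles.xml"))
        print(zf.read("document.xml"))
```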

Relative indirection

In the Open Packaging Conventions, each part that has references has its own relationships file in a sibling _rels folder, listing its indirections. In some cases this makes it easier to cut and paste a piece of information together with all its associated resources, and it scopes names per part, which removes the chance of name clashes between files.
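
A short Python sketch of the scoping effect follows. It assumes that a relationship target is a relative reference resolved against the part that owns the relationship. The part names, the Id "rId1", and the function name are illustrative:

```python
import posixpath


def resolve_target(source_part, target):
    """Resolve a relationship target against the part that owns the relationship.

    A target starting with "/" is taken as a package-absolute part name;
    anything else is resolved relative to the folder of `source_part`.
    """
    if target.startswith("/"):
        return posixpath.normpath(target)
    base_dir = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base_dir, target))


# Two parts each own a relationship with the same Id and the same relative
# target, yet they do not clash: each is resolved in its own scope.
RELS_BY_PART = {
    "/chapter1/text.xml": {"rId1": "images/figure.png"},
    "/chapter2/text.xml": {"rId1": "images/figure.png"},
}

if __name__ == "__main__":
    for part, rels in RELS_BY_PART.items():
        print(part, "->", resolve_target(part, rels["rId1"]))
```

Moving the whole /chapter1 folder elsewhere in the package would leave its relative targets valid, which is what makes cutting and pasting a part with its resources practical.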

References
