
What is EXI?


EXI is a very compact representation of XML specified in the W3C Recommendation Efficient XML Interchange (EXI) Format 1.0 (Second Edition). EXI improves serialization and parsing speed and allows more efficient use of memory and battery life, compared to standard (textual) XML. An EXI stream is typically many times smaller than an equivalent XML document and requires less CPU time to be read or written.

EXI can encode an XML document in two main ways: the schemaless mode and the schema-informed mode. In the schemaless mode, EXI can encode any XML document whether or not a schema is available to the encoder. In the schema-informed mode, EXI has the unique ability to use information extracted from an XML schema to make the encoding more efficient without requiring, in general, that the data adhere strictly to the schema. However, the encoding can be even more efficient if the user is sure that the data will be valid according to the schema.

The use of schema information makes the EXI encodings more efficient because it allows the EXI processor, at any point within the EXI stream, to make certain predictions about the next item in the stream. For example, if the schema specifies that an element "A" (in a certain context) must always be followed by an element "B", then an occurrence of element "B" when the previous element was "A" gets encoded in zero bits (in the strict mode).
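
The "zero bits" case can be illustrated with a small calculation. The sketch below assumes the bit-packed rule that an event code distinguishing m possible next events is written in ceil(log2(m)) bits; the function name event_code_bits is hypothetical and is not part of any EXI library.

```python
def event_code_bits(num_alternatives: int) -> int:
    """Bits needed to pick one of `num_alternatives` possible next events.

    Assumes the width rule ceil(log2(m)), computed with integer arithmetic.
    """
    if num_alternatives < 1:
        raise ValueError("at least one alternative is required")
    # (m - 1).bit_length() equals ceil(log2(m)) for every m >= 1.
    return (num_alternatives - 1).bit_length()


# After "A" the schema allows only "B": one alternative, nothing to write.
only_b = event_code_bits(1)
# If the schema allowed "B", "C", or the end of the parent: three alternatives.
three_way = event_code_bits(3)
```

With a single permitted next event the width is zero, which is why a fully predictable element costs nothing in the strict mode.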

In the schemaless mode, during an encoding or decoding operation the EXI processor continually modifies the way to encode each item based on the actual content of the document encountered so far. For example, when the EXI encoder encounters an element "C" in the content of an element "P", it assumes that an element named "C" has a higher probability of occurrence than elements with other names when the current parent is an element named "P", and creates an abbreviated way to encode the occurrence of an element named "C" under an element named "P". The next time an element named "C" is encountered under an element named "P" (either the same or a subsequent element with the same name), the EXI encoder will be able to use the abbreviated encoding for "C" and thus save space.
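
The learning behavior described above can be sketched with a toy model. This is a simplified illustration, not the actual EXI grammar machinery: the class ToyLearningGrammar and its return values are hypothetical, and real EXI event codes have more structure than a single index.

```python
from collections import defaultdict


class ToyLearningGrammar:
    """Toy model: remembers which child names have been seen under each parent."""

    def __init__(self):
        # parent element name -> child names learned so far, most recent first
        self._learned = defaultdict(list)

    def encode_child(self, parent: str, child: str):
        known = self._learned[parent]
        if child in known:
            # Already learned in this context: emit only a short code.
            return ("code", known.index(child))
        # First occurrence in this context: emit the full name and learn it.
        known.insert(0, child)
        return ("literal", child)


grammar = ToyLearningGrammar()
first = grammar.encode_child("P", "C")        # full name, learned for parent "P"
second = grammar.encode_child("P", "C")       # abbreviated code
other_parent = grammar.encode_child("Q", "C") # "Q" has not seen "C" yet
```

The abbreviation is tied to the parent name, so a later "P" element benefits from what an earlier "P" element taught the encoder, while "Q" starts from scratch.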

In summary, a user of EXI can choose between three main options: (a) not using a schema at all (schemaless), (b) using a schema in a manner that only supports valid XML documents (schema-informed, strict), and (c) using a schema in a manner that supports deviations from the schema (schema-informed, non-strict). The schema-informed, strict mode is the most efficient of the three. The schemaless mode is the easiest to use because it doesn't involve a schema.

EXI, like many other XML compression technologies, uses string tables to temporarily store certain kinds of strings that occur in the XML document being encoded, such as namespace URIs, local names, attribute values, and so on, to allow subsequent occurrences of the same string to be encoded using a short string identifier. In the schemaless mode, all the string tables are reset at the beginning of an encoding or decoding operation. In the schema-informed mode, the string tables containing namespace URIs and local names are prepopulated with strings taken from the schema or defined in the XML Schema Recommendation, so that those strings will be already known at the beginning of each encoding or decoding operation.
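
The difference between an empty and a prepopulated string table can be sketched as follows. This is a minimal illustration under simplifying assumptions: ToyStringTable is a hypothetical name, and it uses one flat table, whereas EXI keeps separate tables for URIs, prefixes, local names, and values.

```python
class ToyStringTable:
    """Toy string table: the first occurrence is a miss, later ones are hits."""

    def __init__(self, prepopulated=()):
        self._ids = {}
        for s in prepopulated:
            # Schema-informed mode: strings known before encoding starts.
            self._ids.setdefault(s, len(self._ids))

    def encode(self, s: str):
        if s in self._ids:
            return ("hit", self._ids[s])   # short identifier only
        self._ids[s] = len(self._ids)
        return ("miss", s)                 # full string, then remembered


# Schemaless: the table starts empty, so the first use of a name is a miss.
schemaless = ToyStringTable()
first_use = schemaless.encode("item")
second_use = schemaless.encode("item")

# Schema-informed: names taken from the schema are hits from the start.
informed = ToyStringTable(["order", "item"])
informed_first_use = informed.encode("item")
```

Because encoder and decoder build the table in the same order, the identifier alone is enough for the decoder to recover the string.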

There are other options in EXI that affect the content of an EXI stream. Some of those options, called fidelity options, control the EXI processor's ability to include certain types of items in the EXI stream, such as XML comments, processing instructions, and namespace declarations. If the user is not interested in preserving one of these items in the EXI encoding, they can select an option that makes the encoding more efficient by not having to include that type of item. For example, if the user states that namespace declarations and prefixes don't need to be preserved, the EXI encoder gives up its ability to encode them and the resulting EXI stream may be more compact. Another fidelity option controls the preservation of the original string values of attributes and elements with simple types. When this option is not selected, those values are encoded more efficiently (for example, an attribute value of type xsd:integer will be encoded as a binary integer rather than as a string), but a reader will not be able to reconstruct the exact original strings when reading back the EXI stream. In many applications this loss of information is acceptable, and the option should therefore not be selected.
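
The trade-off for typed values can be made concrete with a short sketch. It assumes the variable-length unsigned integer layout used by EXI (7 value bits per octet, least significant group first, high bit set on every octet except the last) and ignores the sign bit that a signed xsd:integer would add. The function names are hypothetical.

```python
def encode_unsigned(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more octets follow
        else:
            out.append(low7)         # last octet: high bit clear
            return bytes(out)


def decode_unsigned(data: bytes) -> int:
    """Inverse of encode_unsigned; stops at the first octet with a clear high bit."""
    value = 0
    shift = 0
    for octet in data:
        value |= (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:
            return value
    raise ValueError("truncated integer")


original_text = "007"
encoded = encode_unsigned(int(original_text))
restored_text = str(decode_unsigned(encoded))
```

Here the three-character string "007" becomes a single octet, but it reads back as "7": the value survives and the original lexical form does not, which is exactly the loss the fidelity option exists to prevent.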

The last major feature of EXI is the support for byte alignment and compression. The user can choose one of four alignment options: (a) bit-packed, (b) byte-aligned, (c) precompression, and (d) compression. Bit-packed and compression are the more compact options (compression is usually, but not always, more compact than bit-packed). Bit-packed and byte-aligned are the faster options (byte-aligned may be slightly faster than bit-packed). Both precompression and compression arrange the encoded data within the EXI stream into a particular layout, in which encoded data items that are likely to be similar are placed close together. This arrangement increases the effectiveness of a compression algorithm applied to the data. Precompression does not perform any compression itself; its only purpose is to prepare the EXI stream for an external compression step (outside the EXI processor). Compression goes further and applies the standard DEFLATE algorithm to each chunk of similar encoded data items to produce the final EXI stream.
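
The idea of grouping similar items before applying DEFLATE can be sketched as follows. This is a rough illustration, not the EXI channel layout: the helper names are hypothetical, values are grouped only by element name, a NUL separator stands in for EXI's own value encoding, and every group is compressed separately, whereas a real EXI processor follows more detailed rules for combining small groups.

```python
import zlib


def group_by_name(items):
    """Rearrange (name, value) pairs so values with the same name are adjacent."""
    channels = {}
    for name, value in items:
        channels.setdefault(name, []).append(value)
    return channels


def raw_deflate(data: bytes) -> bytes:
    """Raw DEFLATE stream (negative wbits means no zlib header or checksum)."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(data: bytes) -> bytes:
    return zlib.decompress(data, -15)


# Values in document order: dates and prices alternate.
items = [("date", "2024-01-01"), ("price", "9.99"),
         ("date", "2024-01-02"), ("price", "10.49")]

channels = group_by_name(items)
compressed = {name: raw_deflate("\x00".join(values).encode("utf-8"))
              for name, values in channels.items()}
```

Precompression corresponds roughly to stopping after group_by_name and leaving the DEFLATE step to an outside tool; compression corresponds to running both steps inside the EXI processor.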

For more information about EXI, see the EXI Primer at http://www.w3.org/TR/exi-primer/ or the EXI Recommendation at http://www.w3.org/XML/EXI/.