By Paul Sandoz, October 25, 2005

Contents:
- What Is Fast Infoset?
- Fast Infoset Adoption
- Fast Infoset and the SOA Approach
- Fast Infoset and Performance
- For More Information
Fast Infoset (FI) is an open, standards-based binary format for the efficient interchange of XML that is based on the XML Information Set (Infoset). In general, Fast Infoset can be used when it is necessary to retain the XML property of self-description (or the structure), and yet boost parsing speed and reduce document size.
The Fast Infoset specification, ITU-T Rec. X.891 | ISO/IEC 24824-1, is jointly specified by the International Telecommunication Union (ITU-T) and the International Organization for Standardization (ISO). Fast Infoset is an approved ITU-T Recommendation and, as of this writing, is at the stage of Final Committee Draft in ISO.
The following sections summarize the Fast Infoset technical details. For a more detailed description, see Fast Infoset @ Java.net, presented at XTech 2005.
The XML Information Set specifies the result of parsing an XML document, referred to as an XML infoset, and a glossary of terms to identify infoset components, referred to as information items and properties. An XML infoset is an abstract model of the information stored in an XML document; it establishes a separation between data and information in a way that suits most common uses of XML. In fact, several of the concrete XML data models are defined by referring to XML infoset items and their properties. For example, SOAP Version 1.2 makes use of this abstraction to define the information in a SOAP message without ever referring to XML 1.X, and the SOAP HTTP binding specifically allows for alternative media types that "provide for at least the transfer of the SOAP XML Infoset."
An XML infoset serialized according to the Fast Infoset specification is referred to as a fast infoset document. Fast infoset documents always retain the hierarchical structure described by the corresponding XML infoset, and depending on which features are selected, can be self-contained or not. Fast infoset documents that are self-contained can be converted to and from XML without information loss (a round trip, so to speak), at least with respect to the information items and properties defined in the XML Information Set.
The Fast Infoset format has been designed to optimize the axes of compression, serialization, and parsing, while retaining the properties of self-description and simplicity. By default, the format targets a sweet spot where moderate compression is achieved, but not at the expense of creation and processing performance, simplicity, or extensibility.
Fast Infoset is extensible such that the sweet spot can be tuned according to more specific optimization requirements. Different domains have different optimization requirements and domain-specific knowledge can often be used to improve optimization. In addition, some domains favor compression over processing performance, and others require more efficient compression but not at the expense of processing. For more details, see the Fast Infoset paper mentioned earlier.
The Web3D consortium has adopted Fast Infoset as the base encoding for binary X3D documents. Specific algorithms to efficiently encode and process 3D geometric data are used with Fast Infoset to produce X3D fast infoset documents that are smaller than equivalent XML documents (compressed using redundancy-based LZH algorithms) but are still processed quickly.
A member of the Web3D consortium is contributing to the development of the Fast Infoset open source project. In addition, the Fast Infoset implementation has been integrated into the Java-based X3D toolkit and X3D browser.
Pragmatic SOA mandates the reuse—not replacement—of existing implementations, standards, and APIs for minimal disruption to a company's existing IT infrastructure. Fittingly, Fast Infoset causes minimal disruption to existing standards and implementations. No changes or extensions are required to XML 1.0/1.1, SOAP 1.1/1.2, Web Services Description Language 1.1/2.0, and W3C XML Schema. The integration of the Fast Infoset open source implementation into JAX-RPC 1.1.3 as part of the JWSDP 1.6 required no changes to the existing Java-based web service APIs.
Web service developers can develop part of their SOA application using WSDL and JAX-RPC and take advantage of Fast Infoset without requiring any changes to the service definition and application code. Existing applications developed on previous versions of JWSDP can also use Fast Infoset if the SOA infrastructure is upgraded to use JAX-RPC 1.1.3.
Of particular note is the minimal disruption to web services security. Secured (signed and encrypted) SOAP messages encoded as fast infoset documents can be verified using the existing XML-based security algorithms. More specifically, the use of Fast Infoset will not affect the verification of a signature produced by digesting the result of a standard canonical XML algorithm.
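The reason a signature survives a change of encoding can be illustrated with a minimal sketch. It assumes Python 3.8 or later, whose standard library implements Canonical XML 2.0 (web services security stacks typically use other canonicalization variants), and it uses SHA-256 purely as an example digest. The two sample documents are hypothetical; they differ in their bytes but describe the same infoset, so their canonical forms, and therefore their digests, are identical.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # Python 3.8+, Canonical XML 2.0


def canonical_digest(xml_text):
    """Return the SHA-256 hex digest of the canonical form of an XML document."""
    canonical = canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two serializations of the same infoset: attribute order, quote style and
# the empty-element form differ, but the information items do not.
doc_a = '<order id="42" currency="EUR"><item sku="A1"/></order>'
doc_b = "<order currency='EUR' id='42'><item sku='A1'></item></order>"

same_digest = canonical_digest(doc_a) == canonical_digest(doc_b)
```

Because the digest is computed over the canonical XML form rather than over the bytes on the wire, a message that travels as a fast infoset document can be converted back to its infoset, canonicalized, and verified in the same way.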
Pragmatic SOA identifies loose coupling between clients and services as an important property for greater interoperability, allowing for more agility in response to changing requirements. Using Fast Infoset for web services does not mandate a tight coupling between clients and services: Fast Infoset-enabled clients are not forced to communicate only with Fast Infoset-enabled services, or vice versa.
Web clients and services using JWSDP 1.6 can always interoperate using XML; furthermore, they can interoperate using Fast Infoset when Fast Infoset-enabled clients detect that services are Fast Infoset-enabled. HTTP's agent-driven content negotiation feature enables such dual interoperation without tight coupling. (This style of agent-driven content negotiation is referred to as pessimistic.) A Fast Infoset-enabled client assumes that a service is not Fast Infoset-enabled on the first request. If the service is Fast Infoset-enabled, however, it will respond using Fast Infoset. Subsequent requests by the client can then be sent using Fast Infoset.
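The pessimistic exchange described above can be sketched as a small simulation. This is not the JWSDP implementation: the Service and Client classes are hypothetical, the header handling is reduced to two values, and the media types text/xml and application/fastinfoset are assumed for illustration.

```python
XML = "text/xml"
FAST_INFOSET = "application/fastinfoset"


class Service:
    """Toy service: answers in Fast Infoset only if enabled and the client accepts it."""

    def __init__(self, fi_enabled):
        self.fi_enabled = fi_enabled

    def handle(self, content_type, accept):
        # A service that is not FI-enabled cannot read an FI request at all.
        if content_type == FAST_INFOSET and not self.fi_enabled:
            raise ValueError("unsupported media type")
        if self.fi_enabled and FAST_INFOSET in accept:
            return FAST_INFOSET
        return XML


class Client:
    """Toy pessimistic client: starts with XML, upgrades after an FI response."""

    def __init__(self):
        self.request_type = XML  # pessimistic assumption about the service

    def call(self, service):
        sent = self.request_type
        received = service.handle(sent, accept=[FAST_INFOSET, XML])
        if received == FAST_INFOSET:
            # The service has shown that it understands Fast Infoset.
            self.request_type = FAST_INFOSET
        return sent, received
```

Against an enabled service, the first call sends XML and receives Fast Infoset, and every later call uses Fast Infoset in both directions. Against a service that is not enabled, the client never leaves XML, so no request is ever sent in a format the service cannot read.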
All services deployed using JWSDP 1.6 are Fast Infoset-enabled. By default, clients deployed using JWSDP 1.6 are not Fast Infoset-enabled; setting a single Java system property enables Fast Infoset.
When developing a pragmatic SOA, services are specified according to business semantics. Services are referred to as nouns—for example, purchase order service—rather than verbs—for example, add customer to purchase order service. Such a design tends to result in coarse-grained messages rather than fine-grained messages. The result is that fewer messages are communicated but the messages are larger.
Such coarse-grained messages contain redundant information. Fast Infoset is ideally suited to reducing the sizes of such messages without adding to the cost of production, while improving the cost of processing. Note that the use of redundancy-based compression, for example GZIP, will often produce smaller messages than equivalent Fast Infoset messages. However, using GZIP adds further processing costs to produce and process compressed messages.
Document-based messaging is an important enabler for loosely-coupled services. Such a messaging style usually requires that messages are described using a schema definition language, with the definition possibly originating from a standards body, for example, UBL specified at OASIS. Document-based messaging can also be performed using the REST architectural style or using SOAP-based web services.
Fast Infoset does not specify any constraints on such document-based messaging. Since Fast Infoset specifies an encoding, it may be used by either a REST-based or a SOAP-based web service.
Document-based messages for pragmatic SOA services may often be sent and received asynchronously. Such an approach allows clients and services to be decoupled from the request/response message pattern.
The current loose coupling for the use of Fast Infoset as implemented in JWSDP 1.6 requires the use of the HTTP transport and the request/response message pattern. This message pattern is not suitable for asynchronous communication using HTTP or for other transports like the Java Message Service. While this does not imply that Fast Infoset is not suitable for asynchronous messaging, it does imply that if Fast Infoset and XML are to be used in a loosely coupled manner, then additional mechanisms are required for better asynchronous support. Such mechanisms may be possible using a web services policy or metadata exchange solution where the client queries the capabilities of the service before sending an asynchronous request.
The following results compare Fast Infoset's size, parsing, and serializing performance to XML for UBL documents. The UBL documents are publicly available and were not modified (for example, the documents contain white space tabs for indentation).
The results cover three forms of encoding of UBL documents: Fast Infoset (labeled FI), Fast Infoset using an external vocabulary (labeled FI ExtVoc), and XML. For the size results, GZIP (GZIP is appended to the label of an encoding) is also included for each encoding. (An encoding is compressed according to the GZIP compression algorithm using default compression parameters.) All results are presented as percentages of the XML results.
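As a minimal sketch of how a GZIP size result of this kind can be measured, the following uses Python's gzip module at its own default compression level. The sample document is a hypothetical stand-in built in code, not one of the UBL documents, so the percentage it yields says nothing about the reported results.

```python
import gzip


def gzip_percentage(data):
    """Size of the GZIP-compressed data as a percentage of the original size."""
    compressed = gzip.compress(data)  # Python's default compression level
    return 100.0 * len(compressed) / len(data)


# Hypothetical stand-in for a document: repetitive markup indented with tabs.
line = "\t<Item><ID>1001</ID><Quantity>5</Quantity></Item>\n"
sample = ("<Order>\n" + line * 200 + "</Order>\n").encode("utf-8")

percentage = gzip_percentage(sample)
```

The same function applied to the bytes of a fast infoset document would give the corresponding FI GZIP figure, once both are expressed relative to the size of the uncompressed XML.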
Results were obtained using the Java 2 Platform, Standard Edition (J2SE platform) 5.0 software with Solaris 11 x86 installed on an Acer Ferrari 3400. The Java Virtual Machine (JVM) was run with the -server option. The Java-based Apache Xerces 2.7.1 XML library was employed for the parsing and serializing of XML UBL documents.
The external vocabulary of a UBL document was calculated from the UBL document itself. This is not the normal practice for the generation of such a vocabulary; for testing purposes, it represents the best-case form of encoding. In reality, an external vocabulary would be generated from other information, such as a W3C XML Schema and/or a large set of instances, with the most common repeating strings assigned the smallest indexes. In this respect, the optimized non-practical external vocabulary is not expected to be much smaller than the one used in practice.
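Generating a vocabulary from a set of instances can be sketched as follows. This is a simplified illustration, not the procedure used for these results: the build_vocabulary function is hypothetical, it considers element names only, it finds them with a regular expression rather than a parser, and numbering from 1 is an arbitrary choice.

```python
import re
from collections import Counter


def build_vocabulary(documents):
    """Map each element name to an index; the most frequent names get the smallest indexes."""
    counts = Counter()
    for doc in documents:
        # Match start tags only; end tags begin with "</" and are skipped.
        counts.update(re.findall(r"<([A-Za-z_][\w.-]*)", doc))
    # Sort by descending frequency, then by name so the result is deterministic.
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: index for index, name in enumerate(ordered, start=1)}


# Hypothetical instance documents.
instances = [
    "<Order><Item/><Item/><Note/></Order>",
    "<Order><Item/></Order>",
]
vocabulary = build_vocabulary(instances)
# vocabulary == {"Item": 1, "Order": 2, "Note": 3}
```

Since smaller indexes generally need fewer bytes to encode, giving them to the strings that occur most often keeps the encoded documents small.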
Figure 1 presents results for the arithmetic average size of the three types of encoding and GZIP compression of each encoding for the UBL documents (smaller is better).
Without employing advanced features, fast infoset UBL documents are on average 40% of the size of XML UBL documents. When an external vocabulary is used, fast infoset UBL documents are on average 25% of the size of XML UBL documents. Fast infoset GZIP UBL documents and XML GZIP UBL documents are comparable in size; both are about 20% of XML. This makes sense, since GZIP removes additional redundant information from the fast infoset UBL documents. Interestingly, GZIP further compresses fast infoset documents with external vocabularies such that on average they are 14% of XML.
Figure 2 presents results for the arithmetic average time (relative to XML) for parsing the three types of encoding (smaller is better).
On average, the fast infoset UBL documents are parsed nearly 4.5 times faster than XML UBL documents. When an external vocabulary is used, parsing gets even faster: 7.6 times faster than XML.
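The speedup factors quoted here and the figure's percentages of the XML result describe the same measurements. A short sketch of the conversion, assuming that "N times faster" means the XML time divided by N:

```python
def percent_of_xml_time(speedup):
    """Convert 'N times faster than XML' into a time expressed as a percentage of the XML time."""
    return 100.0 / speedup


fi_percent = percent_of_xml_time(4.5)         # about 22.2% of the XML parsing time
fi_extvoc_percent = percent_of_xml_time(7.6)  # about 13.2% of the XML parsing time
```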
Figure 3 presents results for the arithmetic average time (relative to XML) for serializing the three types of encoding (smaller is better). Each encoding was generated from a DOM Document representation of the XML infoset (the Apache Xerces XMLSerializer class was used for XML).
On average, the serializing of Fast Infoset UBL documents is 25% faster than the serializing of XML UBL documents. When an external vocabulary is used, serializing gets faster—47% faster than XML.
The results show that the largest gains for Fast Infoset are in the compression and parsing of UBL documents. For these results, XML serialization is faster than XML parsing. The opposite is the case for fast infoset documents. Fewer bytes are written for Fast Infoset than for XML, but more work is performed per byte written, because indexing requires a search operation using a hash table.
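The asymmetry between writing and reading indexed strings can be illustrated with a toy sketch. It is not the Fast Infoset encoding: the encode_names and decode_names functions and the ("literal", ...) and ("index", ...) tuples are hypothetical, and only a list of names is handled. The writer looks each name up in a hash table, whereas the reader, in this sketch, only appends to a list and reads from it by position.

```python
def encode_names(names):
    """Replace repeated names by table indexes; the first occurrence is written literally."""
    table = {}  # hash table: name -> index
    output = []
    for name in names:
        index = table.get(name)  # the search operation paid on every write
        if index is None:
            table[name] = len(table) + 1
            output.append(("literal", name))
        else:
            output.append(("index", index))
    return output


def decode_names(encoded):
    """Rebuild the names; reading an index is a direct list access, not a search."""
    table = []
    names = []
    for kind, value in encoded:
        if kind == "literal":
            table.append(value)
            names.append(value)
        else:
            names.append(table[value - 1])
    return names


names = ["Order", "Item", "Item", "Order"]
encoded = encode_names(names)
# encoded == [("literal", "Order"), ("literal", "Item"), ("index", 2), ("index", 1)]
```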
As a general rule of thumb, XML serialization is often faster than XML parsing. When binding technologies such as JAXB are used, it is possible to reduce the amount of work per byte for Fast Infoset by exploiting knowledge of the schema: indexing can be performed in O(1) time, and the UTF-8 encoding of strings can be pre-generated. Thus, there is potential to improve on Fast Infoset serialization when binding. Such techniques are being investigated at this time.
The following results present the average round trip latency of SOAP messages encoded in Fast Infoset and XML for a set of simple Web services. All results are presented as percentages of the XML result.
Results were obtained using JWSDP 1.6 and J2SE 5.0 software with Solaris 9 on a Sun Blade 2000. The Tomcat 5 web server was employed and the client and server were run on the same machine.
Figure 4 presents results for the average round trip latency (relative to XML) for Fast Infoset and XML (smaller is better).
On average, Fast Infoset is 61% faster than XML.
This article has presented a brief overview of the Fast Infoset specification and how Fast Infoset applies to the Service Oriented Architecture (SOA) model of distributed computing. The properties of Fast Infoset complement the best practices of SOA, specifically, loosely-coupled, document-based messaging. Fast Infoset is a viable alternative to XML when the size of messages and parsing performance are an issue.