Native XML vs. XML-enabled: The difference makes a difference

by Michael Champion
Senior R&D Advisor, Software AG

About the Author: Michael Champion is a member of the W3C's Document Object Model Working Group and co-editor of the core XML portion of the DOM Level 1 recommendation. Champion is currently a senior R&D advisor for new technologies at Software AG.

The market is currently laden with so-called "XML-enabled" products that support XML as an input/output format. While these products clearly have many advantages over others without XML support, another class of products, referred to as "native XML", offers significant additional advantages. These products, which support XML down to their internal architectures, are more scalable, more reliable, and more truly interoperable than those that merely use XML as a data exchange format.

XML-Enabled Products

Many products currently support XML as an input/output format; that is, they can translate back and forth between their internal data formats and APIs and those of XML. Such "XML-enabled" products clearly have many advantages over competitors that do not support XML: they can more easily exchange data with products running on other platforms, and they can be programmed, to some extent, by means of code written to XML-related specifications. This has led to widespread use of XML as "glue" to connect existing enterprise systems with others inside a single company and with those of suppliers and customers, and to present live data to consumers over the Web. A clear example of this use case is SOAP, an XML-based protocol that serializes objects and messages so that non-XML applications can perform asynchronous messaging and remote procedure calls over the Internet infrastructure.
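As a rough illustration of what "XML-enabled" means in practice, the following Python sketch shows a translation layer between an internal record and an XML message. The record layout, the element names, and the functions to_xml and from_xml are hypothetical and chosen only for this example; they are not taken from any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal record, standing in for a product's native data format.
order = {"id": "1001", "customer": "ACME", "total": "149.90"}


def to_xml(record):
    """Translate the internal record into an XML string for exchange."""
    root = ET.Element("order", id=record["id"])
    ET.SubElement(root, "customer").text = record["customer"]
    ET.SubElement(root, "total").text = record["total"]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Translate an incoming XML message back into the internal record."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "total": root.findtext("total"),
    }


# Every message crossing the system boundary pays for both conversions.
message = to_xml(order)
round_tripped = from_xml(message)
```

Both conversion functions have to be written, tested, and kept in step with the internal format, which is the overhead that the rest of this article contrasts with native XML storage.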

Native XML Products

Another class of products, however, supports XML deeply in its internal architecture. Such "native XML" products offer significant additional advantages over those that are merely XML-enabled. Many of these advantages boil down to scalability: as the volume and complexity of e-business transactions increase, the overhead needed to convert back and forth between XML and some other data representation will seriously affect the speed, reliability, and functionality of "XML-enabled" systems. Native XML systems, which deliver not only the appearance but the reality of an XML architecture, will run faster, more reliably, and with less administration. Let’s consider some examples.

Database Management Systems

Perhaps the clearest way to illustrate this is to compare architectures that provide an XML view of an underlying relational database with those that store and index data in a native XML internal format.

First, let’s define a "native XML database" as one whose internal data structures map directly onto the hierarchical format of XML. Users of a native XML database are not encouraged to distinguish between an external "interchange" format and an internal "efficient" format, nor to design applications that distinguish "business data" from "document content". In a native XML database, such distinctions are meaningless.

Most RDBMS vendors now provide, or will soon provide, interfaces and utilities that allow XML data to be stored in their systems with relatively little obvious pain to the developer. But consider the mismatches between XML data and normalized RDBMS storage that these interfaces must paper over:

XML

  • Nested hierarchies of elements
  • Elements are ordered
  • A formal schema is not necessary
  • Ordinary business documents can be represented, stored, and retrieved as a single object
  • The XPath standard provides a common (if limited) query language for locating data.

RDBMS

  • Data is arranged in rows and columns with atomic cell values; hierarchical relationships must be represented by defining multiple tables that are JOINed together
  • Row ordering is not defined
  • A predefined schema is usually necessary to describe the structure of the data
  • JOINs of several tables are usually necessary to retrieve even simple business documents
  • Queries are done with SQL retrofitted with proprietary XML enhancements.

There is no doubt that the major database vendors have worked hard and cleverly to mask the immediate pain once required to store XML in an RDBMS. These interfaces, however, cause pains of their own as more complex XML documents and messages are stored and as the transaction volume increases. The complexity of the underlying tables, separate full-text databases, and number of JOINs may be hidden from the developer, but will be a constant burden on the DBAs and system administrators responsible for a large-scale system. Similarly, as the XML view of data and the standards that support it become more widely understood, end-users will employ more sophisticated queries that will be easy to express in XPath and future XML query languages, but difficult to decompose into some combination of SQL and full-text queries.
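The mismatch described above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a benchmark or a description of any vendor's product: the order document, the table layout, and the pos column are all hypothetical. The standard library's xml.etree.ElementTree (which supports only a subset of XPath) stands in for an XML query engine, and an in-memory SQLite database stands in for the RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical business document used only for this illustration.
DOC = """
<order id="1001">
  <customer>ACME</customer>
  <item sku="A-1"><qty>2</qty></item>
  <item sku="B-7"><qty>5</qty></item>
</order>
"""

# XML view: the whole order is one object, and document order is preserved.
order = ET.fromstring(DOC)
xml_skus = [item.get("sku") for item in order.findall("./item")]

# Relational view: the same order is split across two tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (order_id TEXT, pos INTEGER, sku TEXT, qty INTEGER);
""")
db.execute("INSERT INTO orders VALUES (?, ?)",
           (order.get("id"), order.findtext("customer")))
for pos, item in enumerate(order.findall("item")):
    # Row order is undefined in SQL, so element order needs an explicit column.
    db.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
               (order.get("id"), pos, item.get("sku"), int(item.findtext("qty"))))

# Reassembling the document requires a JOIN plus an explicit ORDER BY.
rows = db.execute("""
    SELECT o.customer, i.sku, i.qty
    FROM orders o JOIN items i ON i.order_id = o.id
    WHERE o.id = ?
    ORDER BY i.pos
""", ("1001",)).fetchall()
sql_skus = [sku for _customer, sku, _qty in rows]
```

Even for this two-level document, the relational side needs a schema, an extra column to preserve element order, and a JOIN to read the order back. Each additional level of nesting in the XML adds another table and another JOIN.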

Document Authoring Systems

A native XML text-handling system that is truly built on implementations of standard formats, APIs, and protocols will tend to be easier to use and integrate than one that implements XML interfaces via translation. For example, contrast using a native XML authoring tool such as SoftQuad XMetaL with using an ordinary word processor to author content and then converting it to XML for storage and interchange. Again, many vendors have devised clever techniques for minimizing developers’ immediate pain by translating MS Word and/or RTF data produced by conventional word processing systems into XML, and this does indeed XML-enable these products in a way that can be quite useful. But here again, native XML tools have capabilities that make them much more useful as the volume and complexity of data increase. XML authoring tools make the author aware of underlying distinctions in the XML markup that have no obvious equivalent in an ordinary word processor. These distinctions may be crucial, for example, in identifying the "essence" of a document that is to be preserved when it is translated to a format suitable for viewing on PDAs or mobile phones. Techniques that allow such content to be identified in MS Word are often fragile: they "break" when new authors are hired, when documents are "round-tripped" between the authoring and storage environments, and so on.
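To illustrate how explicit markup lets software pick out the "essence" of a document, here is a minimal Python sketch. The article vocabulary (title, summary, body, para) and the function small_screen_view are invented for this example; a real authoring environment would use whatever schema the organization has defined.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are assumptions for illustration.
ARTICLE = """
<article>
  <title>Quarterly results</title>
  <summary>Revenue rose 12 percent.</summary>
  <body>
    <para>Full discussion of the quarter.</para>
    <para>Regional breakdown of sales.</para>
  </body>
</article>
"""


def small_screen_view(xml_text):
    """Keep only the parts the markup identifies as essential."""
    doc = ET.fromstring(xml_text)
    title = doc.findtext("title", default="")
    summary = doc.findtext("summary", default="")
    return f"{title}: {summary}"


view = small_screen_view(ARTICLE)
```

Because the summary is marked up as such by the author, the extraction does not depend on font sizes, style names, or other presentation conventions that vary from one author to the next.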

Other Application Areas

As industry standards are developed and vendors refine their native XML products, similar patterns will be seen in other areas, especially:

  • Display and entry of ordinary business data presented in forms
  • Routing and transformation of e-business messages (B2B, EAI, etc.)
  • Development of workflows, scripts, and software objects that automate the actual handling of data
  • Extraction and cataloging of "metadata" describing the semantics of the information embedded in documents and messages.

Conclusion – XML as "Steel" vs XML as "Glue"

In the near future, XML will offer developers very significant advantages when it is deeply embedded in the infrastructure of enterprise systems and not just used as a kind of industrial-strength glue to connect them. Native XML systems will tend to be easier to develop while being more scalable, more reliable, and more truly interoperable than those that merely use XML as a data exchange format.