Native XML vs. XML-enabled: The difference makes a difference
by Michael Champion
Senior R&D Advisor, Software AG
About the Author: Michael Champion is a member of the
W3C's Document Object Model Working Group and co-editor of the core XML portion of the DOM
Level 1 recommendation. Champion is currently a senior R&D advisor for new
technologies at Software AG.
The market is currently laden with so-called
"XML-enabled" products that support XML as an input/output
format. While these products clearly have many advantages over others
without XML support, another class of products, referred to as
"native XML", offers significant additional advantages. These
products, which support XML down to their internal architectures, are
more scalable, reliable, and even more truly interoperable than those
that merely use XML as a data exchange format.
XML-Enabled Products
Many products currently support XML as an input/output format, that
is, they can translate back and forth between their internal data
formats and APIs and those of XML. Such "XML-enabled"
products clearly have many advantages over their competitors that do
not support XML: they can more easily exchange data with other
products running on other platforms, and they can be programmed to
some extent by means of code written to XML-related specifications.
This has spawned a widespread use of XML as "glue" to
connect existing enterprise systems with others within a single
company, with those of suppliers and customers, and to present live
data to consumers over the Web. A clear example of this sort of use
case for XML is SOAP, an XML-based object serialization format that
can be used to perform asynchronous messaging and remote procedure
calls between non-XML applications using the Internet infrastructure.
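To make the "translate back and forth" idea concrete, here is a minimal sketch in Python of what an XML-enabled product does at its boundary. The record layout, the element names, and the helper functions to_xml and from_xml are all invented for illustration; they are not taken from any particular product or from the SOAP specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal record format of an "XML-enabled" product.
order = {"id": "1001", "customer": "Acme", "total": "99.50"}

def to_xml(record):
    """Translate the internal record into an XML string for exchange."""
    root = ET.Element("order", id=record["id"])
    for field in ("customer", "total"):
        ET.SubElement(root, field).text = record[field]
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Translate an incoming XML message back into the internal format."""
    root = ET.fromstring(text)
    record = {"id": root.get("id")}
    for child in root:
        record[child.tag] = child.text
    return record

message = to_xml(order)            # XML used only as the exchange format
round_tripped = from_xml(message)  # back to the internal representation
```

Every message crossing the boundary pays for both conversions, which is the overhead that the rest of this article is concerned with.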
Native XML Products
Other products, however, support XML deeply in their
internal architectures. Such "native XML" products offer
significant additional advantages over those that are merely XML-enabled. Many of these advantages boil down to scalability:
as the volume and complexity of e-business transactions increase, the
overhead needed to convert back and forth between XML and some other
data representation will seriously affect the speed, reliability, and
functionality of "XML-enabled" systems. Native XML systems,
which deliver not only the appearance but the reality of an XML
architecture, will run faster, more reliably, and with less
administration. Let's consider some examples.
Database Management Systems
Perhaps the clearest way to illustrate this is to compare
architectures that provide an XML view of an underlying relational
database with those that store and index data in a native XML internal
format.
First, let's define a "Native XML Database" as one whose
internal data structures map directly onto the hierarchical format of
XML; users of a native XML database would not be encouraged to
distinguish between some external "interchange" format and
an internal "efficient" format, nor to design applications
that distinguish "business data" from "document
content". In a native XML database, such distinctions are
meaningless.
Most RDBMS vendors now provide, or will soon provide, interfaces and
utilities to allow XML data to be stored in their systems with
relatively little obvious pain to the developer. But consider the
mismatches between XML data and normalized RDBMS storage that these
interfaces must paper over:
XML
- Nested hierarchies of elements
- Elements are ordered
- A formal schema is not necessary
- Ordinary business documents can be represented, stored, and
retrieved as a single object
- The XPath standard provides a common (if limited) query language
for locating data.
RDBMS
- Data is arranged in rows and columns with atomic cell values;
multiple tables must be defined and JOINed together to represent
hierarchical relationships.
- Row ordering is not defined
- A predefined schema is usually necessary to describe the
structure of the data
- JOINs of several tables are usually necessary to retrieve even
simple business documents
- Queries are done with SQL retrofitted with proprietary XML
enhancements.
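The mismatches listed above can be sketched in a few lines of Python. The sample document, the table layout, and the column names (including the explicit pos column used to preserve element order) are assumptions made for illustration; real XML-to-relational mappings differ from vendor to vendor and are usually far more elaborate.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical business document; element names are invented.
DOC = """<order id="1001">
  <customer>Acme</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B7"><qty>1</qty></item>
</order>"""

root = ET.fromstring(DOC)

# Hierarchical access: one path expression against the document itself.
# Document order is preserved without any extra bookkeeping.
skus_from_path = [item.get("sku") for item in root.findall("./item")]

# Relational access: first shred the document into normalized tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT)")
db.execute(
    "CREATE TABLE items (order_id TEXT, pos INTEGER, sku TEXT, qty INTEGER)"
)
db.execute(
    "INSERT INTO orders VALUES (?, ?)",
    (root.get("id"), root.findtext("customer")),
)
for pos, item in enumerate(root.findall("item")):
    # Row order is undefined in SQL, so element order is stored explicitly.
    db.execute(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        (root.get("id"), pos, item.get("sku"), int(item.findtext("qty"))),
    )

# Reassembling even this small document needs a JOIN and an ORDER BY.
rows = db.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN items i ON i.order_id = o.id "
    "WHERE o.id = ? ORDER BY i.pos",
    ("1001",),
).fetchall()
```

Two tables and one JOIN are enough here, but each additional level of nesting in the document typically adds another table and another JOIN to the relational side.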
There is no doubt that the major database vendors have worked hard
and cleverly to mask the immediate pain once required to store XML in
an RDBMS. These interfaces, however, cause pains of their own as more
complex XML documents and messages are stored and as the transaction
volume increases. The complexity of the underlying tables, separate
full-text databases, and number of JOINs may be hidden from the
developer, but will be a constant burden on the DBAs and system
administrators responsible for a large-scale system. Similarly, as the
XML view of data and the standards that support it become more widely
understood, end-users will employ more sophisticated queries that will
be easy to express in XPath and future XML query languages, but
difficult to decompose into some combination of SQL and full-text
queries.
Document Authoring Systems
A native XML text handling system that is truly built on
implementations of standard formats, APIs, and protocols will tend to
be easier to use and integrate than one that implements XML interfaces
via translation. For example, contrast using a native XML authoring
tool such as SoftQuad XMetaL with using an ordinary word processor to
author content and then converting it to XML format for storage and
interchange. Again, many vendors have devised clever techniques for
minimizing developers' immediate pain by translating MS Word and/or
RTF data produced by conventional word processing systems into XML, and
this does indeed XML-enable these products in a way that can be quite
useful. But here again native XML tools will have capabilities that
will make them much more useful as the volume and complexity of data
increases. XML authoring tools make the author aware of
underlying distinctions in the XML markup that have no obvious
equivalent in an ordinary word processor. These distinctions may be
crucial, for example, in identifying the "essence" of a
document that is to be preserved when it is translated to a format
suitable for viewing on PDAs or mobile phones. Techniques that allow
such content to be identified in MS Word are often fragile and
"break" when new authors are hired or when documents are
"round-tripped" between the authoring and storage
environments.
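As a rough illustration of why explicit markup matters here, the following Python sketch derives a small-screen view from a document whose "essence" is marked up by the author. The element names (article, title, summary, body, para) and the mobile_view function are hypothetical and do not come from any real DTD or authoring tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; the author tags the summary explicitly.
ARTICLE = """<article>
  <title>Quarterly Results</title>
  <summary>Revenue rose 12 percent.</summary>
  <body>
    <para>Detailed discussion of each business unit.</para>
    <para>Further background.</para>
  </body>
</article>"""

def mobile_view(xml_text):
    """Keep only the title and summary for a small-screen rendering."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    summary = root.findtext("summary", default="")
    return f"{title}\n{summary}"

compact = mobile_view(ARTICLE)
```

Because the selection depends on element names rather than on fonts or paragraph styles, it does not change when a different author writes the document, provided the markup is used consistently.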
Other Application Areas
As industry standards are developed and vendors refine their native
XML products, similar patterns will be seen in other areas,
especially:
- Display and entry of ordinary business data presented in forms
- Routing and transformation of e-business messages (B2B, EAI,
etc.)
- Development of workflows, scripts, and software objects that
automate the actual handling of data
- Extraction and cataloging of "metadata" describing the
semantics of the information embedded in documents and messages.
Conclusion: XML as "Steel" vs. XML as "Glue"
XML will offer developers in the near future some very significant
advantages when it is deeply embedded in the infrastructure of
enterprise systems and not just used as a kind of industrial-strength
glue to connect them together. Native XML systems will tend to be
easier to develop while being more scalable, reliable, and even more
truly interoperable than those that merely use XML as a data exchange
format.