Where does the XML data come from?
by Michael Champion
Senior R&D Advisor, Software AG
About the Author: Michael Champion is a member of the
W3C's Document Object Model Working Group and co-editor of the core XML portion of the DOM
Level 1 recommendation. Champion is currently a senior R&D advisor for new
technologies at Software AG.
XML is currently generating a great deal of interest as the
universal language of electronic business. Much effort and expense have been spent
explaining the benefits of XML technology, but little attention has been given to
answering practical questions such as "How much data is currently available in XML
and where does it come from?" Which XML data is interesting to you obviously
depends on your particular requirements, but it is possible to identify some general
answers and point you to some tools that support the storage of XML.
In brief, there's no shortage of XML data available on the Internet, and there are
lots of ways to convert legacy data to XML relatively easily. The amount of data and
the number of support tools have increased very noticeably in the past year, and will
surely grow exponentially in the years to come.
In fact, most enterprises will probably soon find themselves overwhelmed by XML data
that comes from all sorts of non-XML sources or is generated by "middleware"
components and applications, but that has lasting value and needs to be stored
persistently. As this scenario unfolds, many organizations will find it necessary to have a
scalable, reliable database such as Software AG's Tamino,
which uses XML and Internet standards to store, retrieve, and query all this data.
Note that the companies and products mentioned here are intended to be representative of
what is possible today, and are by no means an exhaustive list of what is available.
XML on the Web or in messages
Over the next year or two, more and more data that you will come across in the
normal course of your business will be in XML format.
- XHTML. This dialect of HTML in well-formed XML syntax is becoming fairly common on the
Internet. For example, http://www.infoworld.com
presents much of its content in XHTML.
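Because XHTML is well-formed XML, a generic XML parser can read it without any HTML-specific handling. The following is a minimal sketch in Python using the standard library's ElementTree module; the sample markup and the function names are invented for illustration and are not taken from any of the sites or products mentioned here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment, invented for illustration.
XHTML_SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph<br /></p>
    <p>Second paragraph</p>
  </body>
</html>"""

# XHTML elements live in this namespace, so tag lookups must include it.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_paragraphs(xhtml_text):
    """Parse XHTML with a generic XML parser and return the leading text of each <p>."""
    root = ET.fromstring(xhtml_text)
    return [p.text for p in root.iter(XHTML_NS + "p")]


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Ordinary HTML that leaves tags such as <br> unclosed fails the same well-formedness check, which is why conversion tools are needed for most existing Web pages.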
Creating XML
The sorts of tools that currently produce proprietary binary-formatted data -- such
as word processors, spreadsheets, data entry forms, etc. -- have already begun to be
supplemented by equivalent products that produce XML. The biggest vendors, especially
Microsoft, have shown a clear commitment to accelerating this trend by saving data in XML
format. In the meantime, you can employ products such as:
- XMetaL or other word-processor-like applications
that can be used by ordinary office workers without XML expertise to produce documents in
XML format.
- Form tools that produce XML data from online forms that ordinary users can
easily fill out, such as the offerings from icomXpress and JetForm.
- eNumerate, which is developing a spreadsheet-like
application that will produce XML data in a format that can be displayed in browsers via
XSL and graphed, plotted, etc. by a free browser plug-in.
Exporting XML
As the companies that have jumped on the XML bandwagon actually implement XML
support in their products, simply exporting data from existing tools in XML format
will become increasingly common.
- MS Office 2000 exports specialized markup data in XML "islands" inside an HTML
data format that is almost well-formed XML.
- ERP and other enterprise-level systems are increasingly supporting XML as an output
format. See http://www.mySAP.com for one prominent
example.
- Software design tools such as Rational Rose are supporting the XMI XML format for exchange of
UML diagrams, rules, etc.
Converting to XML
Finally, a number of specialized tools are being designed to easily convert data in
conventional databases and flat files into XML syntax.
- Dave Raggett's famous tidy program
easily converts messy, non-standard HTML such as that found on the Web to well-formed
XHTML.
- upCast, from infinity loop, has both
client-side and server-side tools which convert the RTF format supported by Microsoft and
other word processor vendors into XML, using heuristics to recreate the logical structure
from the layout.
- XMLJunction, from DataJunction, is a visual
design tool for rapidly integrating and transforming data between hundreds of applications
and structured data formats and into XML.
- TextPipe is a Windows GUI
stream editor that works in a similar manner to Unix sed, perl, grep, etc., converting the
data to XML format and optionally generating a DTD to describe the result.
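To give a feel for what converting a flat file into XML syntax involves, here is a minimal sketch in Python using only the standard library. It is not how any of the products above work internally. The function name, the element names ("records", "record"), and the sample data are all assumptions made for illustration, and the sketch assumes the column headers are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Convert CSV text with a header row into an XML string.

    Each row becomes one row_tag element, and each column becomes a child
    element named after its header. Header names are assumed to be valid
    XML element names (no spaces, no leading digits).
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # ElementTree escapes characters such as & and < on output.
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical flat-file content, invented for illustration.
SAMPLE = "name,city\nAda,London\nGrace & Co,New York\n"

if __name__ == "__main__":
    print(csv_to_xml(SAMPLE))
```

A real conversion tool also has to deal with headers that are not legal element names, character encodings, and the choice between elements and attributes, which is where the heuristics and visual design features of the products listed above come in.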