Preparing for the XML Wave

Michael Champion, Senior R&D Advisor,
Software AG

About the Author: Michael Champion is a member of the W3C's Document Object Model Working Group and co-editor of the core XML portion of the DOM Level 1 recommendation. Champion is currently a senior R&D advisor for new technologies at Software AG.

What is XML and Why is the Wave Coming?

The current wave of worldwide interest in the Extensible Markup Language (XML) has come because it is widely touted as a solution for a range of problems that plague those developing enterprise information and e-business applications. For example:

  • Microsoft Word is almost universally used in organizations (at least in the United States!) to exchange documents, but the Word application is only available on the Windows and Macintosh platforms, thus limiting the ability for information in Word documents to be exchanged and re-used.
  • HTML is very useful as a universally understood format for displaying text, but its format is rigidly defined by the World Wide Web Consortium (W3C) and the set of tags it defines cannot be extended without destroying the interoperability of the data across applications and platforms.
  • For "data" (as opposed to "documents", although there is a vast fuzzy area between the two) stored in existing enterprise-level applications, there are numerous problems exchanging the (often binary) information across platforms and between applications. EDI solutions are cumbersome and expensive.
  • Much data is stored in RDBMS systems that mostly use SQL as a common access mechanism, but exchanging data across systems is much less standardized and far more problematic.

XML is a standard, simple, self-describing way of encoding both text and data so that content can be processed with relatively little human intervention and exchanged across diverse hardware, operating systems, and applications.

  • The biggest advantage is simply that it is a standard. The standard syntax means that you don't need to write your own parser, formatter, or transformer in order to load, display, or manipulate data. Tools, techniques, and knowledge acquired for one XML application or vendor are generally applicable to others. You're not locked in to a single vendor's data formats ... or, to a lesser extent, its APIs and supporting tools.
  • Since XML is a relatively simple standard to understand, use, and implement, it has become widespread quite quickly. It is pervasive -- virtually every language, computing platform, and even large-scale application has XML tools available for it. It is a classic "80:20" solution, i.e. it supplies about 80% of the functionality of competing technologies (such as MS Word for documents, EDI for data exchange) with perhaps 20% of the effort required to build enterprise-level solutions.
  • The tags make XML data self-describing in a well-designed application. The tags label the meaning of a piece of data, greatly reducing the difficulty of extracting useful information out of a data stream or of using data from one application in another.
  • While it is true that any data schema can be "normalized" to fit into a relational data model (and most can be mangled somehow to work with EDI formats and protocols), this is a job best tackled by experts once the data become at all complex. XML allows a much more direct and natural way of representing the many common types of data that have an intrinsically hierarchical structure but no rigidly fixed format.

So in brief, XML offers:

  1. A widely adopted standard way of representing text and data ...
  2. In a format that can be processed without much human or machine intelligence ...
  3. Exchanged across platforms, languages, and applications ...
  4. and used with a wide range of development tools and utilities.

Example - Comparing HTML and XML

XML is similar enough to HTML in its actual format (both are closely related to the SGML markup definition language that has been an ISO standard since 1986) so that those familiar with HTML can fairly easily pick up basic XML knowledge. There are two fundamental differences:

  • Separation of form and content -- HTML mostly consists of tags defining the appearance of text; in XML the tags generally define the structure and content of the data, with actual appearance specified by a specific application or an associated stylesheet.
  • XML is extensible -- tags can be defined by individuals or organizations for some specific application, whereas the HTML standard tagset is defined by the World Wide Web Consortium (W3C).

Let's consider a simple example, the title and author information at the top of this document:

<H1 ALIGN="CENTER">Preparing for the XML Wave </H1>
<H3 ALIGN="RIGHT"> Whitepaper to Accompany Presentations in Asia<BR>
October 1999 <BR>
Michael Champion<BR> Software AG<BR> <A HREF="mailto:mike.champion@sagus.com">mike.champion@sagus.com</A><BR><A HREF="http://www.softwareag.com"> http://www.softwareag.com </A>
</H3>

Note how the tags refer to the visual appearance of the information. When the markup is rendered by a browser, a human familiar with the conventions for papers such as this will recognize it as the header of a formal paper, including the title, subtitle, author name, author's affiliation and contact information. A computer program -- at least one that is neither specifically designed to recognize this format nor equipped with enough "artificial intelligence" to figure it out -- will not be able to process this information further, e.g. to put it in an address book or bibliography database.

Let's see a possible XML encoding of the same data, with tags indicating what each piece of data is rather than how it looks:

<headerinfo><title>Preparing for the XML Wave </title>
<subtitle> Whitepaper to Accompany Presentations in Asia</subtitle>
<date>October 1999 </date>
<author><firstname>Michael</firstname><surname>Champion</surname> <affiliation>Software AG</affiliation> <emailaddr> mike.champion@sagus.com</emailaddr>
<homeurl>http://www.softwareag.com </homeurl> </author></headerinfo>

The basic approach of using tags to mark up the content remains the same, and the actual text inside the tags is almost identical. But because the tags themselves carry meaningful semantic and structural information, ordinary computer programs can "understand" the data far more easily than is possible with unformatted ASCII text, proprietary wordprocessor formats, or even HTML.
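To make the point concrete, here is a minimal sketch in Python of the kind of "ordinary computer program" meant here. It uses the standard library's ElementTree parser on the XML header shown above; the function name and the dictionary keys are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# The XML header from the example above, with the same element names.
HEADER_XML = """
<headerinfo><title>Preparing for the XML Wave </title>
<subtitle> Whitepaper to Accompany Presentations in Asia</subtitle>
<date>October 1999 </date>
<author><firstname>Michael</firstname><surname>Champion</surname> <affiliation>Software AG</affiliation> <emailaddr> mike.champion@sagus.com</emailaddr>
<homeurl>http://www.softwareag.com </homeurl> </author></headerinfo>
"""

def address_book_entry(xml_text):
    """Build an address-book style dictionary (hypothetical layout) from the header."""
    root = ET.fromstring(xml_text)
    author = root.find("author")

    def field(tag):
        # findtext returns the element's text; strip() drops stray whitespace.
        return author.findtext(tag).strip()

    return {
        "name": field("firstname") + " " + field("surname"),
        "organization": field("affiliation"),
        "email": field("emailaddr"),
        "url": field("homeurl"),
    }

entry = address_book_entry(HEADER_XML)
print(entry)
```

No comparable short program could reliably pull the same fields out of the HTML version, because nothing in the H1, H3 and BR tags says which line is the author and which is the affiliation.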

How is XML Being Used Today?

Document Content Management

Content management systems typically store individual document components, provide access control and revision control tools to ensure that the information in each component is correct and up-to-date, then provide tools to assemble and publish the components into complete documents. Likewise, content management systems often allow live data from ordinary databases and graphics repositories to be incorporated into the text documents, and often simplify the complexities of storing XML text in an underlying relational database. SGML, and more recently XML, provides ways of structuring textual and graphic document components, ensuring that a particular document maintains a pre-defined structure, and of re-using bits of "boilerplate". Not surprisingly, many early applications of XML have been in content management systems such as Chrystal Software's Astoria, Inso's Dynabase, Texcel Information Manager, and Progressive's SIM.

SGML/XML has proved to be of great value in allowing authors to write a single source document (or set of components) that can be variously published on paper, on CD-ROM, and on the Internet. This is generally done by defining a different "stylesheet", or set of rules defining how various XML elements are to be displayed, for each output format. More recently, XML transformation systems such as XSLT have allowed this to be taken another step, so that the same content may be displayed on ordinary HTML browsers, XML browsers such as Netscape Mozilla and Microsoft IE 5.x, and even lightweight devices such as cellphones and PDAs that support the "wireless markup language" (WML).
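The single-source idea can be sketched in a few lines of Python. The rule tables below are a deliberately simplified, hypothetical stand-in for real stylesheets (an actual project would more likely use XSLT), and the element names in the sample story are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = "<story><headline>XML Wave Arrives</headline><para>Interest in XML is growing.</para></story>"

# Each "stylesheet" maps a source element name to an output template.
# These tables are a hypothetical stand-in for real XSLT stylesheets.
HTML_RULES = {"headline": "<h1>{}</h1>", "para": "<p>{}</p>"}
WML_RULES = {"headline": "<p><b>{}</b></p>", "para": "<p>{}</p>"}

def render(xml_text, rules, wrapper):
    """Apply one rule table to the source and wrap the result for one output format."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        template = rules.get(child.tag)
        if template is None:
            continue  # elements without a rule are omitted from this output
        parts.append(template.format(escape(child.text or "")))
    return wrapper.format("".join(parts))

html_page = render(SOURCE, HTML_RULES, "<html><body>{}</body></html>")
wml_deck = render(SOURCE, WML_RULES, "<wml><card>{}</card></wml>")
print(html_page)
print(wml_deck)
```

The source document is written once; supporting a new output medium means adding another rule table, not touching the content.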

Glue, Wire and Duct Tape

As a recent article in InfoWorld put it quite succinctly, "(XML) is emerging as a kind of industrial-strength duct tape to fix cracks and fissures throughout an enterprise's application foundation." Much of the current excitement surrounding XML involves its potential in linking existing enterprise systems -- often ERP systems from SAP, Baan, Peoplesoft, etc. -- with:

  • Each other -- often organizational changes, mergers, and the general evolution of technology and business force enterprise-level systems that were designed to stand alone to exchange data with other "monolithic" systems. The ERP vendors themselves, and such enterprise integration vendors as WebMethods have used XML as the cornerstone for these integration efforts.
  • Suppliers and Customers -- This has traditionally been the domain of "Electronic Data Interchange", but EDI usage has been relatively limited outside of very large businesses because of its cost and complexity. XML-based EDI has come into its own in the last year or so because the universal transport mechanism provided by the Internet has become linked with the universal data format provided by XML. See the xmledi.org Web site for much more information.
  • Consumers -- The explosion of interest in e-commerce has made it imperative for many enterprises to offer their goods or services for sale from Web sites accessible to anyone with a browser. There are many products allowing one to build an online storefront, but few offer links to underlying ERP systems, and the ERP vendors have not been quick to offer sophisticated storefront software of their own. This has led to a paradoxical situation where orders entered electronically by consumers over the internet often must be re-entered by hand into the ERP systems that track inventory, schedule production, etc.! Vendors such as Intershop have used XML as the basis for data presented to and received from the user and to link with back-end EDI and ERP systems.

One additional way in which XML can serve as "glue" deserves mention. Various initiatives, especially XML-RPC and SOAP, have defined means of performing remote procedure calls between a client application and a server application using HTTP as the transport mechanism and XML to encode the details of the function names, arguments, returned values, etc.
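As an illustration of what such an encoding looks like, the following sketch uses the XML-RPC support in Python's standard library to build the XML for a call and decode it again. The function name getPrice and its arguments are hypothetical; no real server is contacted.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote function getPrice("XML-101", 3).
request_xml = xmlrpc.client.dumps(("XML-101", 3), methodname="getPrice")
print(request_xml)

# The receiving side decodes the same XML back into arguments and a method name.
params, method = xmlrpc.client.loads(request_xml)

# The returned value travels back as XML in the same way.
response_xml = xmlrpc.client.dumps((1797,), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)
```

In a live system the request text would be sent as the body of an HTTP POST; here it is only printed so that the function name and arguments can be seen as ordinary XML elements.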

Portals

"Enterprise Information Portals" provide coherent, personalized views of disparate data in an enterprise, delivered over an intranet or the internet to anyone with a browser. Portals can help classify and focus information to support specific internal business objectives (such as workgroup productivity) or to provide information services to customers. XML is central to many portal products (for example DataChannel RIO) because it is rich enough to represent data from a number of sources yet flexible enough to be formatted for display in browsers. This allows the portal itself to be a relatively thin application that relies on underlying databases, ERP systems, search engines, and the Web itself, for the underlying content and functionality.

Some Case Studies

XML is a fairly new technology, and to be perfectly frank most large projects with which I'm familiar are in the conceptualization or development phase. There are a number of vendors who offer pointers to case studies involving their own customers.

Some of the more interesting "state of the art" projects that are too early in their evolution to fairly evaluate, but illustrate the kinds of things that large organizations are trying to do with XML, include:

  • mySAP.com - An ambitious collaboration between SAP and WebMethods to expose ERP data merged with news, etc. in a portal-like way.
  • The HL7 initiative - Healthcare data management may be the "killer app" for XML; health records are partly structured but highly variable, made up of both text and data, and issues such as privacy, security, accessibility, and integrity make this an extremely challenging arena.
  • The WAP/WML consortium - a collaboration among the major cellphone vendors and enterprise-level system providers to define lightweight counterparts to HTTP and HTML for extremely "thin" clients such as PDAs and cellphones.

There is one pioneering XML project that has been very successful, but has received relatively little attention from the computer trade press -- the Wall Street Journal Interactive Edition. (The description here is adapted from an Inter@ctive Week article describing the project, and an online transcript of a conference presentation by the manager of the project.)

The WSJ developers began by making a critical distinction, central to the XML "dogma", but usually ignored in the rush to produce attractive Web pages under great time pressure: There is a great difference between the content that one authors and the content that one delivers.

"On the document that you author, the important thing to remember is that authoring side documents should be optimized for the editor. The editor should be free of the details of navigation, advertising, and other resources. Their view of the document is often different from the final view. .... One is for the author's benefit, and one for the customer's benefit. "

Microsoft Word is used for the basic text entry and editing; macros and small utility programs written in Microsoft's Basic scripting language for Word help ensure that the text matches the "DJML" DTD. Documents created in Word are saved in the Rich Text Format (RTF). The Konstructor Suite from OmniMark Technologies Corp. converts the RTF files into XML text for storage.

XML also allows intelligent searching within the Interactive Edition archive. Searching based on XML tags provides better information by treating the tagged data separately from other data. For example, searching for the company name tag of "IBM" will return stories about IBM, but not stories in which IBM is mentioned in passing.
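The difference between the two kinds of search can be sketched as follows. The story markup here is hypothetical (the actual DJML element names are not given in the sources I have seen), but the principle is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical story markup; the real DJML element names are not known here.
ARCHIVE = [
    "<story id='1'><company>IBM</company><body>IBM announced new servers today.</body></story>",
    "<story id='2'><company>Software AG</company><body>Analysts compared the product with offerings from IBM.</body></story>",
]

def full_text_search(stories, term):
    """Return the ids of stories that mention the term anywhere."""
    return [ET.fromstring(s).get("id") for s in stories if term in s]

def tag_search(stories, tag, value):
    """Return the ids of stories in which the given tag has exactly the given value."""
    hits = []
    for s in stories:
        root = ET.fromstring(s)
        if any(elem.text == value for elem in root.iter(tag)):
            hits.append(root.get("id"))
    return hits

print(full_text_search(ARCHIVE, "IBM"))
print(tag_search(ARCHIVE, "company", "IBM"))
```

The full-text search returns both stories, while the tag search returns only the one that is actually about IBM.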

The power of XML was demonstrated when The Wall Street Journal subsequently arranged to have its news presented to PalmPilot users via the AvantGo format. Subscribers download information formatted for the limited Web browser abilities in these handheld devices to their PCs, then sync them to their palm computers for later reading.

In short, the WSJ Interactive Edition successfully uses XML technologies to separate form from content so that the content can be more efficiently processed by content management tools, uses XML as both text and data, and "repurposes" the XML data for the printed paper, the online HTML edition, and stripped-down HTML delivered to PDAs.

Criteria for Using XML in a Project Today

To summarize, with the standards and tools available now, XML can be a very appropriate technology to consider for a project when some of the following conditions apply:

  1. Data must be exchanged across diverse systems ...
  2. The imported data must be processed with minimal human involvement ...
  3. Data must be presented in multiple formats or on a variety of media ...
  4. One can leverage existing products or specialized tools to ease the development.

Challenges Facing XML Developers

Hype Overkill

Ironically, one of the XML community's biggest challenges is in appropriately addressing the inflated expectations caused by the rapid growth in its popularity. One sees (in forums such as the XML Developers Mailing List) a certain amount of disillusionment when people realize that adopting XML is no substitute for careful analysis and design in ensuring the success of a project. For example, it's easy to mistake the excitement caused by XML making it feasible to translate data from various existing systems running on different platforms into a form that is accessible to all via browsers ... for an assertion that it is easy. If the various underlying systems have radically different semantics for the data, or the data schemas were poorly designed in the first place, or the applications were so poorly written that it is nearly impossible to translate the outputs into XML, then it's not likely that XML will be the "silver bullet" that turns a monstrosity into a valuable system. XML can be valuable in presenting a common interface to a number of disparate but rationally designed systems, but it will not in and of itself impose order on chaos.

Maintaining Interoperability

Another way in which XML is a victim of its own success is the growing number of XML-related standards and standards organizations addressing very similar problems with rather different solutions. For example, BizTalk, xml.org, cXML, and RosettaNet (along with others) are involved with defining XML schemas for common business documents such as catalogs, invoices and purchase orders. Their membership overlaps, their missions are somewhat distinct, the types of products vary ... but it is not clear at this time whether this diversity is good or bad for the average e-business developer. Furthermore, it is easy to be cynical and interpret much of this as attempts by different vendors and organizations to "own the architecture" by being the first to define formats and protocols that become de-facto standards. (The more established standards organizations such as ISO, the W3C, and the IETF tend to not get involved in defining such industry-specific standards that are built on top of XML and/or HTTP).

As with everything, the challenge is for each developer to determine -- for the application in question -- which "standard" provides the best base to build from and the best support for interoperating with the others. Building on top of a commonly used format such as the BizTalk schemas does open up the possibility of using a range of emerging tools that have bought into the "standard". But the whole purpose of XML is defeated if people adopt the mindset (all too common among ERP system customers) that if the tool does not fit the business need, change the business need to fit the tool! In other words, XML is designed to be extensible and there are a large number of transformation tools becoming available to facilitate conversion from one XML data format to another, so don't get locked in to something that does not fit your needs.
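A minimal sketch of such a conversion is shown below. Both vocabularies and the mapping between them are invented for illustration; real conversions usually also have to restructure the data, which is where dedicated transformation tools earn their keep.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one purchase-order vocabulary to another.
TAG_MAP = {
    "PurchaseOrder": "order",
    "Buyer": "customer",
    "LineItem": "item",
    "PartNumber": "sku",
}

def convert(xml_text, tag_map):
    """Rename every element that appears in the mapping; leave the rest untouched."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

incoming = "<PurchaseOrder><Buyer>Acme</Buyer><LineItem><PartNumber>A-1</PartNumber></LineItem></PurchaseOrder>"
converted = convert(incoming, TAG_MAP)
print(converted)
```

Because the conversion lives in a small mapping rather than in the application itself, adopting a partner's format does not have to mean reshaping one's own data model around it.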

Persistent Storage of XML

E-business applications today typically use XML that is stored in either filesystems or relational database systems. Filesystems have obvious limitations for serious applications -- the same ones that led to the widespread acceptance of RDBMS products 10-15 years ago, including the lack of good search tools, data integrity maintenance, transaction processing, etc. Relational databases have these capabilities, but have disadvantages when it comes to storing XML.

While relational databases create the context for the data through tables, columns, joins etc., they work best with data that fits this structure. As soon as the data has left the database, its meaning relies totally on the applications that process it further. In complex environments this often leads to problems that are hard to fix, such as unexpected application behavior and a lack of scalability and maintainability.

It takes a considerable amount of effort for a developer to work around the problems of storing XML in a relational database in a way that is suitable for a real e-business. The basic schema describing the hierarchical XML structure must be normalized into a form storable in tables, but not "excessively" normalized to the point where performance becomes unacceptable. Searchable data must be defined and (probably) stored separately as metadata in the RDBMS. Queries on the XML structure must be translated into SQL statements that access the underlying tables or full-text search engines. This is generally a task requiring at least the skills of professional programmers, and often those with specialized expertise in the underlying DBMS.
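To give a feel for the work involved, here is a minimal sketch that uses Python's built-in SQLite module. It stores each XML element as a row pointing at its parent and translates a simple path query into SQL self-joins. The table layout, function names and sample order are all assumptions made for illustration; a production mapping would also have to deal with attributes, mixed content, element order and indexing.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER_XML = (
    "<order><customer>Acme</customer>"
    "<item><sku>A-1</sku><qty>2</qty></item>"
    "<item><sku>B-7</sku><qty>1</qty></item></order>"
)

def shred(conn, xml_text):
    """Store every element as a row that points at its parent row."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )

    def insert(elem, parent_id):
        cur = conn.execute(
            "INSERT INTO node (parent, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in elem:
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)

def path_query(conn, path):
    """Translate a simple path such as 'order/item/sku' into one self-join per step."""
    tags = path.split("/")
    last = len(tags) - 1
    sql = "SELECT n{}.text FROM node n0".format(last)
    for i in range(1, len(tags)):
        sql += " JOIN node n{0} ON n{0}.parent = n{1}.id AND n{0}.tag = ?".format(i, i - 1)
    sql += " WHERE n0.parent IS NULL AND n0.tag = ? ORDER BY n{}.id".format(last)
    # Placeholders in the JOIN clauses come first, the root tag comes last.
    params = tags[1:] + tags[:1]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
shred(conn, ORDER_XML)
skus = path_query(conn, "order/item/sku")
print(skus)
```

Even this toy version needs one join per level of the hierarchy, which hints at why performance tuning of such mappings tends to call for DBMS expertise.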

How Does Software AG Help XML Developers Overcome These Challenges?

Software AG has responded to the limitations of RDBMS storage of XML data with the concept of an information server – an efficient and scalable architecture for information staging, integration and exchange. An information server is not intended to replace existing data storage concepts within the enterprise. It acts as an information hub which administers all kinds of data in the enterprise. Information can either be stored directly in the information server or just processed through it. In the latter case the information server stores the remote location of the information or how to access it. The second important task of an information server is to control the information flow to and from the company. Due to the wide acceptance of XML as the proposed information standard on the Internet, on intranets and between applications within the enterprise, information servers have to be able to act as gateways that centralize control over the access to and flow of information between those architectures.

Software AG has recently released its XML Information Server, known as Tamino.

Based on a small and extremely fast kernel technology that is able to process XML natively, the so-called X-Machine technology, Tamino is the first database that allows direct storage, integration and exchange of XML data. This guarantees high performance and scalability since no extra layer for data conversion to and from XML is needed. In other words, there is no mapping layer between the XML you see and the underlying database structures. This eliminates having to do an analysis of which XML elements are to be stored in an efficiently searchable manner and which are to be stored as something like BLOBs.

Furthermore, unexpected changes in the format of a data stream, which is a key feature of XML, can be processed based on the embedded metadata. Tamino accepts XML objects as input and offers XML objects as output. Central interactive administration of multiple local and remote databases is provided and can be carried out from multiple locations via a GUI that runs in standard Web browsers.

To protect investments made in the past, Tamino also provides integrated access to existing legacy data residing in external data sources (e.g. Relational DBMS or data created by Office applications). Tamino is an integration platform for describing, managing and integrating information within and from outside the enterprise without loss in security, availability or performance. Tamino is totally based on standards for seamless integration into existing IT environments.

Finally, Software AG has been developing industrial strength database systems for something like 30 years now. Tamino has been built upon the knowledge acquired from developing the Adabas system, which is widely used in environments where absolute reliability, near-infinite scalability, TRUE 24:7 availability, etc. are requirements.

Based on Software AG's proven technology, Tamino will be the fastest and most reliable XML database for Electronic Business.

XML and the Future of E-Business - From Duct-tape to Foundation Stone

Perhaps a majority of interesting business applications will be distributed across platforms and servers and accessible via the internet. These will be reached through different clients, from “heavy” specialized apps, through Web browsers, to “light” PDAs and cellphones.

XML has the potential to become the leading software development platform for such applications.

For example, Forrester Research has written that “The introduction of Visual Basic 1.0 catapulted Windows into the mainstream... A tool that allows simple business connections to be crafted through scripting and document authoring ...will win big.” PC Week notes that “Using an application server, some scripts and XML, a company with capable in-house developers could build a system that approximated the functionality of even the highest-end [Web content management] systems.”

Using XML in the System Architecture

XML will offer developers in the near future some very significant advantages when it is deeply embedded in the core of an application, not just used as "a kind of industrial-strength duct tape to fix cracks and fissures". These include:

Allowing naturally hierarchical data to be represented directly

Many "naturally occurring" data fit much more easily into a hierarchical XML schema than a set of RDBMS tables.

Providing flexibility for handling free-format data

Data that don't match a predefined schema do not store easily, at least if their values are to be queryable, in RDBMS and EDI systems; this is much easier with XML.

Supporting XML-aware tools directly

Tools that support the exchange of native XML data (or API standards such as DOM and SAX) need relatively little customization to be made to work in a new application, or with other native tools.  
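For readers who have not met the two API styles, the sketch below reads the same small document with each, using the DOM and SAX implementations that ship with Python. The catalog markup is invented for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = "<catalog><book><title>XML Basics</title></book><book><title>DOM in Depth</title></book></catalog>"

# DOM: the whole document is loaded into a tree that can be navigated freely.
dom = minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# SAX: the parser reports events as it reads; the handler keeps only what it needs.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles)
print(sax_titles)
```

Because both interfaces are standardized, code written in either style carries over to other parsers and, with the obvious syntactic changes, to other languages.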

 

Native XML Tools Will Make XML Development More Accessible to non-Experts

 

Text authoring

Even though XML data interchange gets more attention these days, it is still important to easily and accurately enter XML text. Some XML developers have had success using Microsoft Word as an XML text entry tool, and some vendors such as Interleaf and Arbortext support exchange of XML content with Word reasonably well. There are also a large number of free or cheap XML editing tools available, but only three native XML authoring tools that can credibly claim to approach the ease of use of a wordprocessing program:

  • SoftQuad XMetaL - The "mindshare" leader and least expensive of the three here.
  • Arbortext Adept - This was the dominant high-end SGML editor.
  • Excosoft Documentor - another high-end tool from a Swedish company that has recently gotten more name recognition outside of its European base.

Business forms

What authoring tools are to text, forms definition and processing tools are to XML data. The majority of the effort of building a traditional business application -- be it in COBOL or Visual Basic -- tends to involve displaying forms, getting data entered into the forms, validating the data, and storing it in a form suitable for later processing. XML tools that make this easy have great potential to play the same role in the development of XML e-business applications that the Dialog Editor has in Visual Basic.

  • UWI.com - Has a suite of XML software that allows organizations to conduct secure, verifiable business-to-business e-commerce transactions on the Internet.
  • JetForm XML Forms Architecture - A form definition language that provides a definition of those elements critical to forms and the processing of forms.
  • Keyfile Keyflow Commerce - Has an XML engine and forms designer that automates work processes over the Web.

Schema and stylesheet wizards

Just as forms technologies assist in the entry of XML data, schemas have the potential to assist in the validation and transformation of XML data. There are currently a number of schema "standards" in common use, ranging from the limited "DTD" construct in the XML 1.0 specification, through a number of proposed formats that have more functionality in various respects, to the draft W3C schema specification. Similarly, the evolving XSL stylesheet and data transformation specification offers much useful functionality, but authoring stylesheets is not easy and converting stylesheets based on interim proposals to the ultimate standard will be a challenge. Tools that make it feasible to use XML now but to migrate to whatever the eventual standards become can obviously be of great practical value.

  • Various IBM alphaWorks XML tools - Programmer tools to parse, transform, analyze XML
  • Extensibility XML Authority - Schema editor and translator
  • Infoteria XSL Style Wizard - Assist production of XSL stylesheets
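The role a schema plays in validation can be illustrated with a much-simplified sketch. The "schema" below is just a hypothetical table of required child elements, far weaker than a DTD or any of the schema proposals, and the check is written by hand rather than taken from a real validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much-simplified "schema": required child elements for each parent.
REQUIRED = {
    "headerinfo": ["title", "author"],
    "author": ["firstname", "surname"],
}

def validate(xml_text, required=REQUIRED):
    """Return a list of messages, one per missing required child element."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        for child_tag in required.get(elem.tag, []):
            if elem.find(child_tag) is None:
                errors.append("<{}> is missing <{}>".format(elem.tag, child_tag))
    return errors

good = "<headerinfo><title>T</title><author><firstname>M</firstname><surname>C</surname></author></headerinfo>"
bad = "<headerinfo><title>T</title><author><firstname>M</firstname></author></headerinfo>"
print(validate(good))
print(validate(bad))
```

The value of a real schema language is that rules like these are declared once, in a standard notation, and enforced by off-the-shelf tools instead of by application code.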

Scripting / Information flow applications

Finally, there needs to be some way of building actual information processing dataflows that use data in XML messages and documents to make decisions about how it should be routed or processed. Of course this can be done with procedural programming languages such as C++ or Java, but widespread adoption will require user-level scripting and visual programming tools.

  • Many tools support ECMAScript/JavaScript or VBScript
  • WebMethods B2B has a Visual "flow" development tool to build business integration processes.
  • Bluestone Visual XML is a visual tool to build DTDs, documents, and Java classes that store XML in any database.
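What such a dataflow does underneath can be sketched in a few lines of script. The message types, the threshold and the destination names below are hypothetical; the point is only that routing decisions are made by looking at named elements in the message.

```python
import xml.etree.ElementTree as ET

def route(message_xml):
    """Pick a destination (hypothetical queue names) from the message content."""
    doc = ET.fromstring(message_xml)
    if doc.tag == "invoice":
        return "accounts-payable"
    if doc.tag == "order":
        # Large orders go to a person; the 10000 threshold is an assumed business rule.
        total = float(doc.findtext("total", default="0"))
        return "manual-approval" if total > 10000 else "order-entry"
    return "unrouted"

print(route("<order><total>25000</total></order>"))
print(route("<order><total>150</total></order>"))
print(route("<invoice><total>99</total></invoice>"))
```

A visual flow tool lets a business analyst express the same rules as boxes and arrows rather than code.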

Getting There From Here

Here are a few concluding suggestions:

Stick with the standards

It will often be necessary to work with interim standards (such as one of the schema proposals or the draft XSL stylesheet language today) in order to build applications in a timely manner. Likewise certain vendors encourage customers to use proprietary extensions to their standard tools that offer more functionality, but tend to lock customers in to the current supplier. While compromises are unavoidable, in the long run one is almost certainly better off minimizing dependency on non-standard XML formats and tools, in order to maximize interoperability with other applications and maintain the flexibility to migrate to better tools as they become available.

Use XML for outward-facing data

One of the most compelling reasons for moving quickly toward using XML to describe an application's content and then transforming it into some display format (rather than either storing HTML or converting directly from enterprise database data to HTML) is the flexibility it offers in a world where display formats are in such flux. For example, it's not at all clear when the browser on a typical user's desk will support native XML, or whether it will support CSS and/or XSL stylesheets; cellphone browsers will probably support WML, but it's not clear whether PDA browsers (and WinCE-based cellphones, if they become a significant factor) will support WML or some stripped-down HTML dialect. In any event, keeping data on the server side in XML allows any or all of these options to be supported simply by adding transformation stylesheets.

Stage XML data in a persistent cache

Not all enterprise data will be converted to XML in the foreseeable future, no matter how successful the XML standards groups and vendors are at implementing and promoting the technology. Thus there will always be non-XML data in the e-business "soup" for virtually all developers. It seems clear from this overview that much can be gained by transforming such data to XML for interchange and display purposes. It also makes much sense to store such transformed XML -- as well as XML received from outside or loaded from native XML applications -- in a native XML data cache so that it is instantly available when a user or business partner request comes in.