XML – SQL for the Web

by Jürgen Harbarth, Software AG

XML (eXtensible Markup Language) is increasingly finding acceptance as a non-proprietary standard and is suitable not only for describing documents, but also as a data description language for information stored in databases.

Among insiders, XML is well known as a language for describing content-related structures. However, XML is much more than just another standard. What is new about XML is its flexibility and universality: it is a kind of grammar that can express any content. On the Web in particular, the standardized representation of content structures, rather than the merely formal ones that HTML offers, opens up unexpected opportunities.

What is XML?

XML is a universal standard for representing data. It is based on SGML and goes much further than HTML, the standard that has dominated the Web to date. In contrast to HTML, XML can structure data not only by formal criteria (such as title, body, text, etc.) but also by content-related aspects. To support the very diverse content available on the Web, XML is flexible enough to let user groups define their own content-based structural features. To exchange weather data, for example, meteorologists could define tags such as <temperature>, <air-pressure> and <wind-force> and record them in corresponding templates. XML-enabled applications could then process such Web sites directly, i.e. they could evaluate weather data automatically via the Web.
XML allows you to access XML-capable servers directly from your browser and retrieve information that the XML server has collected from different types of sources, including existing relational data. An application uses the same service. Both use a URL to connect to the desired XML server. Conventional applications can continue to run in parallel and access their SQL data.
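
The weather example above can be sketched in a few lines of Python. This is only an illustration: the report layout, the station name, the units and the measured values are assumptions, and the element names are hyphenated because XML names may not contain spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical weather report; element names, units and values are
# illustrative assumptions, not part of any agreed meteorological format.
WEATHER_XML = """
<weather-report station="Darmstadt">
  <temperature unit="C">18.5</temperature>
  <air-pressure unit="hPa">1013</air-pressure>
  <wind-force unit="Bft">4</wind-force>
</weather-report>
"""

def read_weather(xml_text):
    """Return the measurements of one report as floats keyed by tag name."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: float(child.text) for child in root}

readings = read_weather(WEATHER_XML)
print(readings["temperature"])
```

Because each value is labeled by its meaning rather than by its layout, the program can pick out the temperature without guessing where it appears on a page.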

In the future, mission-critical applications will also have to run on the World Wide Web. The HTTP protocol is used for addressing purposes on the Web. Add the logical structuring capability of XML to this simple, universally accepted protocol, and you have a new infrastructure that is ideal for running electronic business applications on the Internet. Databases can now be accessed directly via XML without having to use CGI and HTML or Java in addition.

Efficient Storage

The simplest way to provide a database view of XML objects is to store them as “character large objects” in a database. At first sight this would not appear to add significant benefit over storage in a conventional file system, apart from the typical advantages of databases such as consistency, restart capability and recovery. But database storage offers additional ways of indexing these information objects, and thus provides more flexible access paths. This enables these objects to be accessed in a variety of ways via their structure (structure-based retrieval) as well as their content (content-based retrieval).
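
A minimal sketch of this idea, assuming SQLite as the database and a simple path index as the additional access path, could look as follows. The table names, the path notation and the sample documents are hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def open_store():
    """Create an in-memory database with a document table and a node index."""
    conn = sqlite3.connect(":memory:")
    # The complete XML text is kept as one large character object.
    conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")
    # One row per element: its path in the tree and its text content.
    conn.execute("CREATE TABLE node (doc_id INTEGER, path TEXT, value TEXT)")
    conn.execute("CREATE INDEX node_path ON node (path, value)")
    return conn

def store(conn, xml_text):
    """Store the document and index every element; return the document id."""
    doc_id = conn.execute("INSERT INTO doc (body) VALUES (?)", (xml_text,)).lastrowid

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        text = (elem.text or "").strip()
        conn.execute("INSERT INTO node VALUES (?, ?, ?)", (doc_id, path, text))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return doc_id

def find_by_structure(conn, path):
    """Structure-based retrieval: documents that contain the given element path."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM node WHERE path = ? ORDER BY doc_id", (path,))
    return [row[0] for row in rows]

def find_by_content(conn, word):
    """Content-based retrieval: documents with an element whose text contains word."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM node WHERE value LIKE ? ORDER BY doc_id",
        ("%" + word + "%",))
    return [row[0] for row in rows]

conn = open_store()
report_id = store(conn, "<report><title>Storm warning</title><region>North</region></report>")
memo_id = store(conn, "<memo><title>Budget</title></memo>")
print(find_by_structure(conn, "/report/region"))
print(find_by_content(conn, "Budget"))
```

The stored text stays untouched, while the index table supplies the two access paths described above.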

All Internet objects could thus be managed in a database, realizing the goal of running complete application systems on a homogeneous technical infrastructure from the storage of the information objects to their presentation on the users’ screen. Objects whose specific attributes permit a differentiated approach can be processed transparently in such a system.

This approach also makes it possible to develop database search engines for sifting through the large volumes of information that "live" on the Internet. The combination of database technology with content-based search methods based on XML opens an interesting perspective for the performance of such applications.

Since in this case a search engine has direct access to all relevant XML tags, the search will also be more effective. The content of a document can be analyzed directly, instead of the search logic having to sift through all documents looking for more or less arbitrary keywords.

XML and SQL

The essential logical aspect of integrating XML with conventional database technologies is the mapping of XML structures onto widely used data models, such as relational schemas in 3NF (Third Normal Form). This aspect can be viewed from two angles.

One angle is the use of SQL data structures via an XML interface; the other is the representation of XML objects in a 3NF data structure, in other words in a relational database. Basically, SQL data can always be used as XML objects, but XML does not yet provide a suitable infrastructure for modeling relations between relational tables. Moreover, the absence of data type definition capabilities (the definition of field attributes such as numerical) is a major stumbling block when describing SQL data structures using XML.
To overcome this, there have been a number of attempts to integrate these structural elements in XML at a higher level of abstraction. The most promising of these is XML-Data, a proposal for providing full descriptions of SQL structures on the basis of XML, which has been submitted to the World Wide Web Consortium (W3C) in the form of a technical note. This proposal is heavily backed by Microsoft, who are already using it as the basis for describing information channels in Internet Explorer using the Channel Definition Format (CDF).
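
The first angle, SQL data used as XML objects, can be illustrated with a short Python sketch. It assumes SQLite as the relational database and a hypothetical patient table; the row and column element names are likewise assumptions. It also shows the data type problem mentioned above: once serialized, the numeric column is plain text.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table):
    """Serialize every row of a table as a <row> element under a root named after the table."""
    # The table name is interpolated directly; it is trusted in this sketch.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        row_elem = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            # Every value becomes element text, so the SQL type is lost.
            ET.SubElement(row_elem, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_no INTEGER, name TEXT)")
conn.execute("INSERT INTO patient VALUES (63495, 'Smith')")

xml_text = rows_to_xml(conn, "patient")
print(xml_text)

# Reading the value back yields a string, not an integer.
patient_no = ET.fromstring(xml_text).findtext("row/patient_no")
print(type(patient_no).__name__)
```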

Mapping the kind of tree structures supported by XML onto a relational data model is apparently a technically solvable problem, but in practice it cannot be pinned down to one generally applicable solution. Moreover, the typical document structure of many information objects modeled with XML, such as long text elements, pictures and complex cross-references, cannot be represented directly by relational means. Providing a better mapping for this kind of object is precisely XML’s big advantage over “pure” SQL. The implementation of XML objects as hybrid objects, on the other hand, may be a promising approach: First, a class of objects is described by the corresponding DTD (Document Type Definition). A mapping rule then describes the mapping of the XML structure to a hybrid infrastructure consisting of an XML data component and a conventional table structure (schema mapping). This mapping rule takes the form of a DTD and, together with the DTD that describes the structure of the XML objects, is part of a repository on the information server.
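
The hybrid approach can be sketched as follows. Note the assumptions: the article expresses the mapping rule as a DTD, whereas this sketch substitutes a plain Python dictionary for it, and the table layout, column names and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping rule: XML child element -> table column.
# A dict stands in here for the DTD-based rule described in the text.
MAPPING = {"Name": "name", "PatientNo": "patient_no"}

def store_hybrid(conn, xml_text):
    """Copy the mapped elements into columns and keep the full XML alongside."""
    root = ET.fromstring(xml_text)
    values = {column: root.findtext(tag) for tag, column in MAPPING.items()}
    conn.execute(
        "INSERT INTO patient (name, patient_no, xml_body) VALUES (?, ?, ?)",
        (values["name"], values["patient_no"], xml_text),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (name TEXT, patient_no TEXT, xml_body TEXT)")
store_hybrid(
    conn,
    "<Patient><Name>Smith</Name><PatientNo>63495</PatientNo>"
    "<Symptoms>Severe abdominal pain</Symptoms></Patient>",
)

# Tabular part: ordinary SQL access.
row = conn.execute(
    "SELECT patient_no, xml_body FROM patient WHERE name = ?", ("Smith",)
).fetchone()
print(row[0])

# Document part: the same record read as an XML object.
symptoms = ET.fromstring(row[1]).findtext("Symptoms")
print(symptoms)
```

The mapped columns serve conventional SQL applications, while the stored XML component keeps the parts that do not fit a table.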

The information made available in such a system can be retrieved in the form of XML objects as well as by using SQL. It can thus be used both by typical business applications and for transparent data interchange using for example EDI (Electronic Data Interchange) or e-mail.

XML instead of SQL

Obviously, it is much easier to meet these requirements in databases that go beyond the relational approach. Software AG’s database system Adabas already provides data structures that go far beyond the typical 3NF data model. On the basis of Adabas, hybrid XML objects could be processed considerably more transparently and effectively than data in purely relational databases.

A number of approaches in the world of object-oriented databases show that XML provides an almost generic way of implementing the structures of persistent objects on a server. This means that persistent objects could be made available via a transparent interface using XML and used by a large number of users both locally and on the Web. An XML server provides an ideal basis for storing and exchanging diverse objects and thus for object serialization.

The fact that data can be structured implicitly allows for the enormous flexibility that electronic business applications in particular need. Requests are not formulated as a series of SQL queries; instead, they are sent to the server as a URL. Thus not only relational databases but also other sources such as image files can be searched in a single query. This makes it possible to use complex objects, such as the data on a hospital patient including x-rays, very effectively. The deployment of XML avoids the problems that proprietary approaches have.
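
How a request sent as a URL might be answered can be sketched as below. The URL layout (root element as path, element values as query parameters), the host name and the second sample document are purely hypothetical; the text does not specify an actual query syntax, and a real XML server would define its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Illustrative in-memory document collection standing in for the server's data.
DOCUMENTS = [
    "<Patient><Name>Smith</Name><Diagnosis><Illness>Appendicitis</Illness>"
    "<x-ray picture='http://picture23.gif'/></Diagnosis></Patient>",
    "<Patient><Name>Jones</Name><Diagnosis><Illness>Fracture</Illness>"
    "</Diagnosis></Patient>",
]

def answer(url):
    """Return parsed documents whose root matches the URL path and whose
    elements match every query parameter (hypothetical URL convention)."""
    parts = urlsplit(url)
    wanted_root = parts.path.strip("/")
    conditions = parse_qs(parts.query)  # e.g. {"Illness": ["Appendicitis"]}
    hits = []
    for text in DOCUMENTS:
        root = ET.fromstring(text)
        if root.tag != wanted_root:
            continue
        # The first matching descendant must carry one of the requested values.
        if all(root.findtext(".//" + tag) in values
               for tag, values in conditions.items()):
            hits.append(root)
    return hits

hits = answer("http://xmlserver.example/Patient?Illness=Appendicitis")
print(hits[0].findtext("Name"))
print(hits[0].find(".//x-ray").get("picture"))
```

A single request returns the administrative data together with the reference to the picture, without a separate query per data source.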

XML represents a significant step forward in the use of IT application systems on the Internet, an important prerequisite for electronic business. To date, the information used in IT systems has had to be processed in different ways, depending on the data type. The relational schema could only be used for the kind of information that could be represented in tables (i.e. rows and columns). That sufficed for conventional business information systems. But in the future the challenge will be to transfer all types of information into an all-encompassing structure, as illustrated in the example of a hospital patient’s data. The strength of XML lies in its ability to integrate text, data and graphics into a uniform structure that serves both presentation and database storage purposes.

 

Example of a complete information record containing all data on a hospital patient in XML. The information consists of administrative data, text and the reference to the x-ray picture that is not stored in a relational database.
   <!-- Element names are written without spaces or periods, which XML names may not contain -->
   <Patient>
       <Name>Smith</Name>
       <FirstName>Kevin</FirstName>
       <DateOfBirth>15.10.1967</DateOfBirth>
       <InsuranceNo>125834660</InsuranceNo>
       <PatientNo>63495</PatientNo>
       <Diagnosis>
         <Illness>Appendicitis</Illness>
         <x-ray picture="http://picture23.gif"/>
       </Diagnosis>
       <Symptoms>Severe abdominal pain</Symptoms>
   </Patient>
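
As a rough sketch of how an application might read such a record, the following Python code separates the administrative data, the free text and the picture reference. The element names are written without spaces or periods because XML names may not contain them; these spellings, and the choice of which elements count as administrative, are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

RECORD = """
<Patient>
  <Name>Smith</Name>
  <FirstName>Kevin</FirstName>
  <DateOfBirth>15.10.1967</DateOfBirth>
  <InsuranceNo>125834660</InsuranceNo>
  <PatientNo>63495</PatientNo>
  <Diagnosis>
    <Illness>Appendicitis</Illness>
    <x-ray picture="http://picture23.gif"/>
  </Diagnosis>
  <Symptoms>Severe abdominal pain</Symptoms>
</Patient>
"""

patient = ET.fromstring(RECORD.strip())

# Administrative data: the simple top-level fields (assumed to be every
# childless element except the free-text Symptoms field).
administrative = {
    elem.tag: elem.text
    for elem in patient
    if len(elem) == 0 and elem.tag != "Symptoms"
}

# Text component of the record.
free_text = patient.findtext("Symptoms")

# Reference to the picture, which lives outside the record itself.
picture_url = patient.find("Diagnosis/x-ray").get("picture")

print(administrative["PatientNo"], free_text, picture_url)
```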