XML SQL for the Web
by Jürgen Harbarth, Software AG
XML (eXtensible Markup Language) is increasingly
finding acceptance as a non-proprietary standard and is suitable not
only for describing documents, but also as a data description
language for information stored in databases.
Among insiders, XML is well known as a language for defining
content-related structures. However, XML is much more than just
another standard. What is new about XML is its flexibility and
universality: XML is a kind of grammar in which almost any content
can be expressed. On the Web in particular, representing the
structure of content in a standardized way, rather than merely its
formal layout as HTML does, opens up unexpected opportunities.
What is XML?
XML is a universal standard for representing data. It is derived
from SGML and goes much further than HTML, the standard that has
dominated the Web to date. In contrast to HTML, XML can structure
data not only by formal criteria (such as title, body or text), but
also by content. To support the very diverse content available on
the Web, XML is flexible enough to let user groups define their own
content-based structural elements. For example, to exchange weather
data, meteorologists could define tags such as <temperature>,
<air-pressure> and <wind-force> and store them in corresponding
templates. XML-enabled applications could then process such Web
sites directly, i.e. they could evaluate weather data automatically
via the Web.
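To make "process such Web sites directly" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The document layout, the `<weather>` and `<station>` elements, the station names and the units are all hypothetical. Because XML element names cannot contain spaces, the sketch writes `<air-pressure>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical weather document; element names, stations and units are illustrative only.
WEATHER_XML = """
<weather>
  <station name="Berlin">
    <temperature unit="C">21.5</temperature>
    <air-pressure unit="hPa">1013</air-pressure>
    <wind-force unit="Beaufort">3</wind-force>
  </station>
  <station name="Hamburg">
    <temperature unit="C">17.0</temperature>
    <air-pressure unit="hPa">1008</air-pressure>
    <wind-force unit="Beaufort">5</wind-force>
  </station>
</weather>
"""


def read_stations(xml_text):
    """Return {station name: {measurement tag: value}} from a weather document."""
    root = ET.fromstring(xml_text)
    readings = {}
    for station in root.iter("station"):
        # The tags describe content, so values can be looked up by meaning, not by position.
        readings[station.get("name")] = {child.tag: float(child.text) for child in station}
    return readings


if __name__ == "__main__":
    for name, values in read_stations(WEATHER_XML).items():
        print(name, values["temperature"], values["wind-force"])
```

The program never has to guess which number on a page is the temperature; it asks for the `<temperature>` element.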
XML allows you to access XML-capable servers directly from your
browser and retrieve information that the XML server gathers from
different kinds of sources, including existing relational data.
Applications can use the same service: both the browser and the
application connect to the desired XML server via its URL.
Conventional applications can continue to run in parallel and access
their SQL data.
In the future, mission-critical applications will
also have to run on the World Wide Web. On the Web, resources are
addressed by URLs and transferred via the HTTP protocol. Add the logical structuring
capability of XML to this simple, universally accepted protocol, and
you have a new infrastructure that is ideal for running electronic
business applications on the Internet. Databases can now be accessed
directly via XML without having to use CGI and HTML or Java in
addition.
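From the client's side, the request is nothing more than a URL, and the reply is an XML document that a program can parse directly. The sketch below is a minimal illustration under stated assumptions: the server address `http://xmlserver.example.com/patients` and the `name` query parameter are hypothetical, and the HTTP call is replaced by a canned reply so the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical XML server address; not a real service.
XML_SERVER = "http://xmlserver.example.com/patients"


def build_request_url(**criteria):
    """Encode the query criteria as URL parameters, as a browser or application would."""
    return XML_SERVER + "?" + urlencode(criteria)


def fetch(url):
    """Stand-in for an HTTP GET such as urllib.request.urlopen(url).read().

    The URL is ignored and a canned XML reply is returned so the sketch runs offline.
    """
    return b"<Patients><Patient><Name>Smith</Name><Patient-no>63495</Patient-no></Patient></Patients>"


def patient_numbers(name):
    """Ask the XML server for all patients with this name and return their numbers."""
    root = ET.fromstring(fetch(build_request_url(name=name)))
    return [p.findtext("Patient-no") for p in root.findall("Patient")]


if __name__ == "__main__":
    print(build_request_url(name="Smith"))  # http://xmlserver.example.com/patients?name=Smith
    print(patient_numbers("Smith"))         # ['63495']
```

No CGI script or HTML page sits between the client and the data; the same URL could be typed into a browser.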
Efficient Storage
The simplest way to provide a database view of XML objects is to
store them in a database as character large objects (CLOBs). At
first sight, apart from the typical advantages of databases such as
consistency, restart capability and recovery, this would not appear
to offer significant benefits over storage in a conventional file
system. But database storage offers additional ways of indexing
these information objects, and thus provides more flexible access
paths. This enables the objects to be accessed in a variety of ways,
via their structure (structure-based retrieval) as well as via their
content (content-based retrieval).
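The following sketch shows one way this could look, assuming SQLite (via Python's `sqlite3`) as the database. The whole XML object is stored as text, and a hypothetical side table `xml_index` records every element path with its text, giving the two extra access paths described above. The table names and the path notation are illustrative choices, not a particular product's design.

```python
import sqlite3
import xml.etree.ElementTree as ET


def element_paths(elem, prefix=""):
    """Yield (path, text) for every element, e.g. ('Patient/Diagnosis/Illness', 'Appendicitis')."""
    path = prefix + elem.tag
    yield path, (elem.text or "").strip()
    for child in elem:
        yield from element_paths(child, path + "/")


def open_store():
    db = sqlite3.connect(":memory:")
    # The XML object itself is kept whole, as a character large object (TEXT in SQLite).
    db.execute("CREATE TABLE xml_objects (id INTEGER PRIMARY KEY, body TEXT)")
    # Hypothetical side table: every element path and its text, indexed for fast lookup.
    db.execute("CREATE TABLE xml_index (doc_id INTEGER, path TEXT, value TEXT)")
    db.execute("CREATE INDEX idx_path ON xml_index (path, value)")
    return db


def store(db, xml_text):
    doc_id = db.execute("INSERT INTO xml_objects (body) VALUES (?)", (xml_text,)).lastrowid
    db.executemany(
        "INSERT INTO xml_index VALUES (?, ?, ?)",
        [(doc_id, p, v) for p, v in element_paths(ET.fromstring(xml_text))],
    )
    return doc_id


def by_structure(db, path, value):
    """Structure-based retrieval: documents whose element at `path` equals `value`."""
    rows = db.execute(
        "SELECT DISTINCT doc_id FROM xml_index WHERE path = ? AND value = ? ORDER BY doc_id",
        (path, value),
    )
    return [r[0] for r in rows]


def by_content(db, word):
    """Content-based retrieval: documents with `word` anywhere in an element's text."""
    rows = db.execute(
        "SELECT DISTINCT doc_id FROM xml_index WHERE value LIKE ? ORDER BY doc_id",
        ("%" + word + "%",),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = open_store()
    store(db, "<Patient><Name>Smith</Name><Diagnosis><Illness>Appendicitis</Illness></Diagnosis></Patient>")
    store(db, "<Patient><Name>Jones</Name><Symptoms>Pain in the leg</Symptoms></Patient>")
    print(by_structure(db, "Patient/Diagnosis/Illness", "Appendicitis"))
    print(by_content(db, "pain"))
```

A file system could hold the same text, but it could not answer either query without opening and parsing every file.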
All Internet objects could thus be managed in a database, realizing
the goal of running complete application systems on a homogeneous
technical infrastructure, from the storage of the information
objects to their presentation on the user's screen. Objects with
specific attributes that allow them to be handled in a
differentiated way can be processed transparently in such a system.
This approach also makes it possible to develop
database search engines for sifting through the large volumes of
information that "live" on the Internet. The combination of database
technology with XML-based, content-oriented search methods opens up
interesting prospects for the performance of such applications.
Since in this case a search engine has direct access to all relevant
XML tags, the search will also be more effective. The content-related
characteristics of the documents can be analyzed directly, instead of
the software having to sift through all documents looking for more
or less arbitrary keywords.
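The difference between keyword matching and tag-aware matching can be shown in a few lines of Python. The two records and the `<Record>`, `<Patient>` and `<Doctor>` element names are hypothetical; the point is that a keyword search cannot tell in which role a word appears, while a search that can see the tags can.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: "Smith" is a patient in one and a doctor in the other.
DOCUMENTS = [
    "<Record><Patient>Smith</Patient><Doctor>Jones</Doctor></Record>",
    "<Record><Patient>Brown</Patient><Doctor>Smith</Doctor></Record>",
]


def keyword_search(docs, word):
    """Plain keyword matching: the tags are ignored, so any occurrence counts."""
    return [i for i, d in enumerate(docs) if word in " ".join(ET.fromstring(d).itertext())]


def tag_search(docs, tag, word):
    """Tag-aware matching: the word only counts inside elements with the given tag."""
    hits = []
    for i, d in enumerate(docs):
        root = ET.fromstring(d)
        if any((e.text or "").strip() == word for e in root.iter(tag)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(keyword_search(DOCUMENTS, "Smith"))          # [0, 1]
    print(tag_search(DOCUMENTS, "Patient", "Smith"))   # [0]
```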
XML and SQL
The essential logical aspect of integrating XML with conventional
database technologies is the implementation of XML structures in
widespread data models, such as relational tables in 3NF (Third
Normal Form). This aspect can be viewed from two angles.
One angle is the use of SQL data structures via an XML interface;
the other is the implementation of XML objects in a 3NF data
structure, in other words in a relational database. In principle,
SQL data can always be represented as XML objects, but XML does not
yet provide a suitable infrastructure for modeling the relationships
between relational tables. Moreover, the absence of data type
definitions (for example, declaring that a field is numeric) is a
major stumbling block when describing SQL data structures using XML.
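A short Python sketch makes the type problem visible. It publishes the rows of an SQLite table as XML, using column names as tags; the `patient` table, its columns and the `<row>` wrapper element are illustrative assumptions. Once serialized, the INTEGER and REAL columns are indistinguishable from text, and SQL NULL can only be approximated by an empty element.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(db, table):
    """Publish every row of `table` as an XML element; column names become tags."""
    # `table` is interpolated into the SQL, so it must come from trusted code, not user input.
    cur = db.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        item = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            # Every value becomes plain text: the SQL type (INTEGER, REAL, ...) is lost,
            # and NULL can only be approximated by an empty element.
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE patient (patient_no INTEGER, name TEXT, weight_kg REAL)")
    db.execute("INSERT INTO patient VALUES (63495, 'Smith', 72.5)")
    print(rows_to_xml(db, "patient"))
```

A consumer reading `<patient_no>63495</patient_no>` cannot tell from the XML alone whether this was a number or a string, and nothing in the output says that `patient_no` might be referenced by another table.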
To overcome this, there have been a number of attempts to integrate
these structural elements in XML at a higher level of abstraction.
The most promising of these is XML-Data, a proposal for providing
full descriptions of SQL structures on the basis of XML, which has
been submitted to the World Wide Web Consortium (W3C) in the form of
a technical note. This proposal is heavily backed by Microsoft, which
is already using it as the basis for describing information channels
in Internet Explorer with the Channel Definition Format (CDF).
Although mapping the kind of tree structures supported by XML onto a
relational data model is apparently a technically solvable problem,
in practice it cannot be pinned down to a single general solution.
Moreover, the typical document structure of many information objects
modeled with XML, with long text elements, pictures and complex
cross-references, cannot be represented directly by relational
means. Representing this kind of object better is precisely XML's
big advantage over pure SQL. Implementing XML objects as hybrid
objects, on the other hand, may be a promising approach: first, a
class of objects is described by the corresponding DTD (Document
Type Definition). A mapping rule then describes how the XML
structure is mapped to a hybrid infrastructure consisting of an XML
data component and a conventional table structure (schema mapping).
This mapping rule takes the form of a DTD and, together with the DTD
that describes the structure of the XML objects, is part of a
repository on the information server.
The information made available in such a system can
be retrieved in the form of XML objects as well as by using SQL. It
can thus be used both by typical business applications and for
transparent data interchange, for example via EDI (Electronic Data
Interchange) or e-mail.
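The hybrid idea can be sketched in Python with SQLite. One simplification is labeled here and in the code: the article's mapping rule is itself a DTD, but this sketch uses a plain dictionary, `MAPPING`, from element paths to column names. The mapped elements become ordinary columns that SQL can query. The complete document is kept in an `xml` column, so parts that no column captures, such as the x-ray reference, still come back intact when the object is retrieved as XML. The table and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping rule: element path (relative to <Patient>) -> table column.
# In the approach described above the rule would itself be a DTD; a dict is used for brevity.
MAPPING = {"Patient-no": "patient_no", "Name": "name", "Diagnosis/Illness": "illness"}

SAMPLE = (
    "<Patient><Name>Smith</Name><Patient-no>63495</Patient-no>"
    "<Diagnosis><Illness>Appendicitis</Illness>"
    "<x-ray picture='http://picture23.gif'/></Diagnosis></Patient>"
)


def create_store():
    db = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{c}" TEXT' for c in MAPPING.values())
    # Conventional table part (mapped columns) plus the XML data component (whole document).
    db.execute(f"CREATE TABLE patients ({cols}, xml TEXT)")
    return db


def store(db, xml_text):
    """Schema mapping: copy mapped elements into columns and keep the full XML alongside."""
    root = ET.fromstring(xml_text)
    values = [root.findtext(path) for path in MAPPING]
    placeholders = ", ".join("?" for _ in range(len(MAPPING) + 1))
    db.execute(f"INSERT INTO patients VALUES ({placeholders})", values + [xml_text])


def names_with_illness(db, illness):
    """Relational access: an ordinary SQL query against the mapped columns."""
    return [r[0] for r in db.execute("SELECT name FROM patients WHERE illness = ?", (illness,))]


def as_xml(db, patient_no):
    """XML access: return the complete object, including parts no column captures."""
    row = db.execute("SELECT xml FROM patients WHERE patient_no = ?", (patient_no,)).fetchone()
    return ET.fromstring(row[0]) if row else None


if __name__ == "__main__":
    db = create_store()
    store(db, SAMPLE)
    print(names_with_illness(db, "Appendicitis"))
    print(as_xml(db, "63495").find("Diagnosis/x-ray").get("picture"))
```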
XML instead of SQL
Obviously, it is much easier to meet these
requirements in databases that go beyond the relational approach.
Software AG's database system Adabas already provides data
structures that go far beyond the typical 3NF data model. On the
basis of Adabas, hybrid XML objects could be processed considerably
more transparently and effectively than data in purely relational
databases.
A number of approaches in the world of
object-oriented databases show that XML provides an almost generic
way of implementing the structures of persistent objects on a
server. This means that persistent objects could be made available
via a transparent interface using XML and used by a large number of
users both locally and on the Web. An XML server provides an ideal
basis for storing and exchanging diverse objects and thus for object
serialization.
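As a small illustration of XML-based object serialization, the Python sketch below writes a hypothetical `Patient` class to XML and reads it back. The mapping used here (one child element per field, with the element name taken from the class name) is just one possible convention. It also shows where the type information has to come from: the XML carries only text, so the field types are taken from the class definition when the object is rebuilt.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Patient:
    # Hypothetical persistent class; this sketch only handles scalar fields (str, int, float).
    name: str
    first_name: str
    patient_no: int


def to_xml(obj):
    """Serialize a dataclass instance: one child element per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def from_xml(cls, xml_text):
    """Rebuild an instance, using the class's field types to convert the text back."""
    root = ET.fromstring(xml_text)
    if root.tag != cls.__name__:
        raise ValueError(f"expected <{cls.__name__}>, got <{root.tag}>")
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})


if __name__ == "__main__":
    p = Patient("Smith", "Kevin", 63495)
    text = to_xml(p)
    print(text)
    print(from_xml(Patient, text) == p)  # True
```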
In particular, the fact that data can be structured implicitly
allows enormous flexibility, which electronic business applications
especially need. Requests are not formulated as a series of SQL
queries; instead, they are sent to the server as a URL. Thus not
only relational databases but also other sources, such as image
files, can be searched in a single query. This makes it possible to
work very effectively with complex objects, such as the data on a
hospital patient including x-rays. And because XML is a
non-proprietary standard, its use avoids the problems that
proprietary approaches have.
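On the server side, a single URL request can be answered from several sources at once. The Python sketch below is a hypothetical illustration, not a description of any particular XML server. A relational table (SQLite) supplies the administrative data, and a dictionary stands in for an image archive that holds the x-ray references. The `patient_no` query parameter, the table layout and the archive are all assumptions; the results are merged into one XML reply.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# Two hypothetical sources behind one XML server: a relational table and an image archive.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patient (patient_no TEXT, name TEXT, illness TEXT)")
db.execute("INSERT INTO patient VALUES ('63495', 'Smith', 'Appendicitis')")
IMAGES = {"63495": ["http://picture23.gif"]}  # patient_no -> x-ray references (illustrative)


def handle(url):
    """Answer one URL request by querying both sources and merging the results into XML."""
    params = parse_qs(urlparse(url).query)
    patient_no = params.get("patient_no", [""])[0]
    root = ET.Element("Patient")
    row = db.execute(
        "SELECT name, illness FROM patient WHERE patient_no = ?", (patient_no,)
    ).fetchone()
    if row:  # relational source
        ET.SubElement(root, "Name").text = row[0]
        ET.SubElement(root, "Illness").text = row[1]
    for ref in IMAGES.get(patient_no, []):  # image source
        ET.SubElement(root, "x-ray", picture=ref)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(handle("http://xmlserver.example.com/patient?patient_no=63495"))
```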
XML represents a significant step forward in the
use of IT application systems on the Internet, an important
prerequisite for electronic business. To date, the information used
in IT systems has had to be processed in different ways, depending
on the data type. The relational model could only be used for
information that can be represented in tables (i.e. rows and
columns). That was sufficient for conventional business information
systems. But in the future the challenge will be to bring all types
of information into one all-encompassing structure, as illustrated
in the example of a hospital patient's data. The strength of XML
lies in its ability to integrate text, data and graphics into a
uniform structure that serves both presentation and database storage
purposes.
Example of a complete information record containing all data on a
hospital patient in XML. The information consists of administrative
data, text and a reference to the x-ray picture, which is not stored
in a relational database.
```xml
<Patient>
  <Name>Smith</Name>
  <First-name>Kevin</First-name>
  <Date-of-birth>15.10.1967</Date-of-birth>
  <Insurance-no>125834660</Insurance-no>
  <Patient-no>63495</Patient-no>
  <Diagnosis>
    <Illness>Appendicitis</Illness>
    <x-ray picture="http://picture23.gif"/>
  </Diagnosis>
  <Symptoms>Severe abdominal pain</Symptoms>
</Patient>
```
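A program that receives such a record can separate its three kinds of content again. The Python sketch below parses a well-formed version of the record. Because XML element names may not contain spaces, it uses hyphenated names such as `First-name`. It gathers the administrative data into a flat dictionary of the kind that could fill a table row, keeps the free text separate, and returns the x-ray reference as a URL to be fetched separately. The grouping of fields and the `DD.MM.YYYY` date format are assumptions read off the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RECORD = """<Patient>
  <Name>Smith</Name>
  <First-name>Kevin</First-name>
  <Date-of-birth>15.10.1967</Date-of-birth>
  <Insurance-no>125834660</Insurance-no>
  <Patient-no>63495</Patient-no>
  <Diagnosis>
    <Illness>Appendicitis</Illness>
    <x-ray picture="http://picture23.gif"/>
  </Diagnosis>
  <Symptoms>Severe abdominal pain</Symptoms>
</Patient>"""


def split_record(xml_text):
    """Split a patient record into administrative data, free text and picture references."""
    root = ET.fromstring(xml_text)
    admin = {
        "name": root.findtext("Name"),
        "first_name": root.findtext("First-name"),
        # XML carries the date as text; the DD.MM.YYYY format is assumed from the example.
        "date_of_birth": datetime.strptime(root.findtext("Date-of-birth"), "%d.%m.%Y").date(),
        # Kept as a string so that leading zeros in an insurance number would survive.
        "insurance_no": root.findtext("Insurance-no"),
        "patient_no": int(root.findtext("Patient-no")),
    }
    text = {
        "illness": root.findtext("Diagnosis/Illness"),
        "symptoms": root.findtext("Symptoms"),
    }
    pictures = [x.get("picture") for x in root.iter("x-ray")]
    return admin, text, pictures


if __name__ == "__main__":
    admin, text, pictures = split_record(RECORD)
    print(admin["date_of_birth"], text["illness"], pictures)
```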