X-Query: A universal query interface for XML

By Juliane Harbarth, Software AG

About the Author: Juliane Harbarth is a technical consultant for database management systems R&D at Software AG. She is also a member of the XSL Working Group of the World Wide Web Consortium.

In October 1999, Software AG released Tamino, the first database management system that stores and delivers XML documents in native format and offers a set of tools for managing XML data. Because Tamino was designed from the ground up to store and query XML documents, it processes such data more efficiently than standard relational database management systems. Such systems may claim to be "XML-enabled," but retrofitting usually comes at the cost of access performance or of extra work for developers, who have to reconcile the data-structure differences in their code.

This article focuses on Tamino’s XML query interface, "X-Query," which was implemented in Tamino Version 2, released on Solaris in June 2000. X-Query is the only query language that combines the features of the latest World Wide Web Consortium (W3C) standards with numerous features that are still on the drawing board. This is possible because several Tamino developers, among them Jonathan Robie and Michael Champion, are also driving the W3C standards process.

Why SQL is not an option

Just as relational databases are not suited to XML data storage, a relational query language such as SQL is not ideal for XML queries. SQL is designed to query the table structures of relational databases, not the tree structures of XML databases like Tamino.

Tamino’s query interface X-Query combines the best features of several standard and proprietary query technologies: XPath (the XML query language recommended by the W3C), XQL and Adabas. The table below briefly describes the technologies that have shaped X-Query.

XPath: This was (and remains) the principal standard. Whether because it is the natural way to query XML documents or simply because the XPath approach is so convincing, it is difficult to imagine a way to query XML data that is not based on XPath.
XQL: A convincing concept that is more at home in the database world and is therefore essential for a database query language. XQL contains some database-oriented syntax that is not found in XPath, e.g. sorting.
Adabas: Adabas (Adaptive Database System) is Software AG’s RDBMS. Although Adabas’s inner semantics are relational (i.e. table-like) and not tree-like as required for XML, its index mechanism and sophisticated text-retrieval capabilities are well suited to the XML world.

The W3C XPath Recommendation

The XPath query language recommendation originated from two separate W3C initiatives, XSL and XLink. In early 1999, the W3C’s XSL working group (WG) was developing XSL Transformations (XSLT), a language for describing tree transformations of XML documents. XSLT needed a way to specify which parts of an XML document are to be transformed. In parallel, the XLink WG was developing a means of pointing from one XML document to another. To synchronize their activities, the two groups worked jointly on a single standard for addressing parts of an XML document.

Since it became a W3C Recommendation in November 1999, XPath has gained enormous popularity and has been implemented in many XSLT processors, such as James Clark’s XT and Michael Kay’s SAXON. Because XPath is almost universally accepted, Tamino X-Query covers nearly the complete XPath scope, with the exception of XPath’s "axes" concept.

XPath is not a true query language, however, because it lacks several features that are required for querying, e.g. add, delete, update, insert, join, sort. Tamino’s X-Query has therefore been enriched with some appropriate XQL mechanisms, e.g. sort, and some proprietary add-ons in the field of data-indexing and text retrieval.
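To illustrate why sorting has to come from outside a pure path language, the following Python sketch uses the standard library's ElementTree module, which supports only a limited XPath-style subset. It is not X-Query or XQL syntax, and the `patients` document and its element names are invented for this illustration. The path expression selects nodes in document order; the ordering step is ordinary host-language code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
doc = ET.fromstring(
    "<patients>"
    "<patient><surname>Moore</surname></patient>"
    "<patient><surname>Atkins</surname></patient>"
    "<patient><surname>Bloggs</surname></patient>"
    "</patients>"
)

# The path expression only selects nodes, in document order.
selected = doc.findall("patient")

# Ordering by a child value has to happen outside the path expression.
by_surname = sorted(selected, key=lambda p: p.findtext("surname", default=""))

surnames = [p.findtext("surname") for p in by_surname]
print(surnames)  # ['Atkins', 'Bloggs', 'Moore']
```

A query language with a built-in sort mechanism lets the database do this ordering itself, instead of leaving it to every client.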

For retrieval, for example, X-Query provides an additional match operator (~=) that is similar to XPath’s equality operator (=) for strings, but normalizes the text, understands wildcards, and looks up words in sentences.
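As a rough illustration of those three behaviors, the Python sketch below contrasts plain string equality with a word-level match. It is neither X-Query syntax nor Tamino's implementation: the function names `normalize` and `contains_word`, the normalization rules (lowercasing, dropping punctuation) and the sample sentence are all assumptions made for this example.

```python
import fnmatch
import re


def normalize(text: str) -> str:
    """Lowercase the text and replace each run of non-word characters by a space."""
    return re.sub(r"\W+", " ", text.lower()).strip()


def contains_word(text: str, pattern: str) -> bool:
    """Return True if any single word of `text` matches `pattern`.

    `pattern` follows fnmatch rules, so '*' stands for any run of
    characters ('?' and '[...]' are also treated as wildcards).
    """
    wanted = pattern.lower()
    return any(fnmatch.fnmatchcase(word, wanted) for word in normalize(text).split())


sentence = "The patient, Mr. Atkins, was admitted."  # hypothetical sample text

print(sentence == "Atkins")               # False: equality compares the whole string
print(contains_word(sentence, "atkins"))  # True: case and punctuation are normalized
print(contains_word(sentence, "At*"))     # True: '*' acts as a wildcard
print(contains_word(sentence, "atkin"))   # False: no complete word equals 'atkin'
```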

XQL influences W3C standards

Another W3C effort that influenced the development of X-Query is the XML Query Language (XQL). Originally called XQL98, XQL was first proposed to the W3C in September 1998 by Jonathan Robie (Software AG), Joe Lapp (webMethods, Inc.) and David Schach (Microsoft Corporation). XQL strongly overlaps with XPath in ideas and methods, and the syntax is similar. But the two differ in the way they view XML. XSL and XLink are mostly document-centered, implying that an XML document is mostly text. This stems from XML’s SGML origins and is reflected, for example, in the DocBook DTD created by Norman Walsh.

The XQL movement, however, views XML from the database perspective. In this light, an XML document is referred to as "XML data" and terms foreign to the "document" world, such as "join" or "grouping," are mentioned.

The "position" issue reveals the different views taken by XPath and XQL. A position specifies the location of an element in an XML structure, which can be pictured as a tree with branches. From an XQL or "database" point of view, the first branch (or, in technical terms, the first child of a parent) has the position "zero," because most programming languages start indexing at zero. XPath, however, numbers branches beginning with "one": no self-respecting author would refer to the first chapter of a book as the "zeroth."
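The two conventions can be seen side by side in a short Python sketch. It uses the standard library's ElementTree module, whose path expressions follow XPath's one-based positions, while Python lists are indexed from zero in the "database" style. The `book` document is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
book = ET.fromstring(
    "<book><chapter>Intro</chapter><chapter>Queries</chapter></book>"
)

# XPath convention: the first chapter is at position 1.
first_by_path = book.find("chapter[1]")

# Programming-language convention: the first list item is at index 0.
first_by_index = book.findall("chapter")[0]

print(first_by_path.text)               # Intro
print(first_by_index.text)              # Intro
print(first_by_path is first_by_index)  # True: both address the same element
print(book.find("chapter[2]").text)     # Queries
```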

XML Query Language and Quilt

XQL, which still exists as a separate standard in its latest version, XQL99, has influenced the development of other standards. It has already left its footprint in the two documents the XML Query WG has released so far: the Query Requirements Document and the Query Data Model.

XQL has also influenced the development of Quilt, a new XML query language proposed to the Query WG in March 2000 by three leading figures in the database query field: Jonathan Robie (Software AG), the author of XQL; Don Chamberlin (IBM), one of the authors of SQL 2; and Daniela Florescu (INRIA), an author of XML-QL.

Quilt comes closest to what could be called a global query language, because it combines a number of XML query approaches, the most influential being XQL and XML Query Language (XML-QL). XML-QL was first submitted as a W3C Note in August 1998. It favored an SQL-like query syntax that looked rather different from XQL’s XPath-like syntax. The combination of these two ideas under the new Quilt roof appears very promising. Quilt is still in draft form and will certainly go through many changes before it becomes a recommendation.

Several concepts from XQL have also found their way into the XPath Working Draft, mainly because Jonathan Robie is also a member of the XSL WG. In November 1999, XPath became a "W3C Recommendation," the highest status a "standard" can achieve within the W3C. In September 1999, the W3C’s XML Query WG was constituted, and XQL served as one of the main design inputs for the WG’s work.

For more information about the Tamino XML database, see http://www.softwareag.com/tamino