Environmental information system at the German Federal Environmental Agency

German Federal Environmental Agency develops public XML database

The Sema Group used the new XML technology for the first time to develop a "broker" for environmental information at the German Federal Environmental Agency. The application runs on Software AG’s information server, Tamino.

Search engines are a cornerstone of effective Web use: anyone who wants to go beyond a handful of familiar Web sites and exploit the Web’s information potential to answer a variety of questions cannot do without them. But search engines often return irrelevant addresses (URLs), partly as a consequence of the way they work. They essentially carry out a full-text search across countless Web sites and are largely limited to static pages. Information that a user reaches through menus or database queries is generally invisible to them, and such dynamic Web sites are playing an increasingly important role on the Web.

The GEIN 2000 project

The German Federal Environmental Agency (UBA) was confronted with this problem when planning GEIN 2000, the German Environment Information Network. The purpose of GEIN is to provide access to information available on the Web sites of numerous public-sector organizations – environmental authorities, German federal and state statistical offices, ministries and so on – and thus serve as an information broker for environmental information in Germany. GEIN will be available to anyone who is interested and will be on display at Expo 2000 in Hanover. It is an essential component of the environmental presentation system Umwelt 2000, which provides the interested layman with access to in-depth environmental information. GEIN 2000 is thus aimed at the public rather than being an internal project intended for the benefit of the various public-sector authorities involved.

The Sema Group was entrusted with the development of the GEIN 2000 project. Because of the project’s value as a model, the company decided to implement it on the basis of the new Internet standard XML (eXtensible Markup Language) using Software AG’s new XML information server, Tamino.

"The decision to develop the project on the basis of XML may initially have been affected by the preferences of the project group," explains Thomas Bandholtz, project manager with the Sema Group. "We wanted to see what could be achieved with this new technology, although we deliberately kept our options open so that we could have reverted to traditional structures such as SQL if necessary. However, potential users turned out to be just as persuaded of the value of XML as we were. We were astonished at how positively all those questioned responded to XML. The extent to which XML has been accepted is quite fantastic."

One of the prime advantages of XML is that data interchange is no longer proprietary. Everything is based on the open standard of the W3C and can thus be smoothly integrated into the Web. This will make it very easy to integrate new sources of information into GEIN 2000 in the future. Since the essence of GEIN 2000 is that it should cover as many and as varied sources of information as possible, the coordination and harmonization of data formats from different systems is of central importance.

"Coordination between different public bodies is normally a very laborious affair that can take years. It cannot be expected that everything will be finished within ten minutes with XML, because the measurement methods still have to be coordinated, for example. But XML allows you to arrive at a common basis for the exchange of information very quickly and frees you to concentrate on the contents. Things like field lengths, separators and end-of-record characters, for example, which previously always had to be resolved, are no longer an issue with XML," says Bandholtz.

GEIN 2000 and XML

Not only does GEIN 2000 control data interchange with XML; it also uses XML for its internal processes. Thus, GEIN 2000 includes an integrated geographical thesaurus in XML format, by means of which it is possible to deal with queries relating to geographical aspects. This geographical thesaurus solves a typical problem of search engines: namely, that a search term cannot be identified as a geographical term. Anyone searching for "Lüneburg Heath," for example, will not find anything about "Wilseder Mountain," which is located on it.

The geographical thesaurus contains over 50,000 geographical terms, and XML makes it possible to enter these together with their geographical context. This means that you no longer need a word match when searching for place names; you can search by geographical context. Of course, the terms have to be entered accordingly – the W3C is probably unaware that Wilseder Mountain is on Lüneburg Heath. Time periods can be evaluated in the same way: XML allows "from – to" to be represented as a real time period in GEIN 2000. The only way to answer questions like these by non-proprietary means is to use XML.
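The idea of searching by geographical context and by real time periods can be illustrated with a minimal Python sketch. It is not the GEIN 2000 implementation: the tiny `BROADER` table, the function names and the sample documents are all hypothetical, and only the Wilseder Mountain example comes from the text above.

```python
# Hypothetical miniature thesaurus: each term maps to the broader
# geographical term that contains it (an assumption for illustration).
BROADER = {
    "Wilseder Mountain": "Lüneburg Heath",
    "Lüneburg Heath": "Lower Saxony",
}


def narrower_terms(term, broader=BROADER):
    """Return the term itself plus every term located inside it (transitively)."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search_by_area(query_term, documents):
    """Match documents by geographical context instead of by word match.

    documents is a list of (title, area_term) pairs.
    """
    wanted = narrower_terms(query_term)
    return [title for title, area in documents if area in wanted]


def periods_overlap(from_a, to_a, from_b, to_b):
    """True if two inclusive year ranges share at least one year."""
    return from_a <= to_b and from_b <= to_a
```

With such a table, a query for "Lüneburg Heath" also reaches a document indexed only under "Wilseder Mountain", and a "from – to" pair is compared as a range rather than as text.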

The basis for GEIN 2000 is the XML application Resource Description Framework (RDF), which permits a more complex query logic than would be possible with HTML. Using RDF, an XML-compliant G2K (GEIN 2000) profile was set up that can be analyzed and evaluated by a parser. The information itself does not have to be in XML format. A source URL can be specified in RDF that leads to a specific document.

The following abridged and translated example (see Figure 1) shows a "record" for GEIN 2000, which prepares a kind of subject heading for the GEIN broker. One of the advantages of XML immediately becomes clear here: anyone who knows even the rudiments of XML can understand the example quickly. This makes adaptation to destination systems very easy, which is particularly important for a project like GEIN 2000 that crosses the boundaries of subjects and organizations.

<rdf:RDF>
  <rdf:Description about="http://www.site.de/rheinwasser.html">
    <g2k:title>Water quality of the Rhine by Bonn 1994-1998</g2k:title>
    <g2k:abstract>
      This document describes the water quality
      of the Rhine measured in Bonn in the
      years 1994 to 1998
    </g2k:abstract>
    <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
      <g2k:item ID="4711">Water quality</g2k:item>
    </g2k:topic>
    <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes">
      <g2k:item ID="4712">Rhine</g2k:item>
      <g2k:item ID="4713">Bonn</g2k:item>
    </g2k:area>
    <g2k:time>
      <g2k:from>1994</g2k:from>
      <g2k:to>1998</g2k:to>
    </g2k:time>
  </rdf:Description>
</rdf:RDF>

Figure 1
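To show how a parser might analyze and evaluate such a record, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is an illustration only, not the GEIN 2000 parser. The abridged record lacks namespace declarations, so the sketch assumes the two `xmlns` declarations shown in the search query of Figure 2; the function name `parse_record` and the layout of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the search query (Figure 2); applying them
# to the abridged record is an assumption.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "g2k": "http://www.gein.de/g2k-profile/02/profile",
}

RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:g2k="http://www.gein.de/g2k-profile/02/profile">
  <rdf:Description about="http://www.site.de/rheinwasser.html">
    <g2k:title>Water quality of the Rhine by Bonn 1994-1998</g2k:title>
    <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
      <g2k:item ID="4711">Water quality</g2k:item>
    </g2k:topic>
    <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes">
      <g2k:item ID="4712">Rhine</g2k:item>
      <g2k:item ID="4713">Bonn</g2k:item>
    </g2k:area>
    <g2k:time>
      <g2k:from>1994</g2k:from>
      <g2k:to>1998</g2k:to>
    </g2k:time>
  </rdf:Description>
</rdf:RDF>"""


def parse_record(xml_text):
    """Extract URL, title, subject headings and time period from one record."""
    desc = ET.fromstring(xml_text).find("rdf:Description", NS)

    def items(tag):
        # Collect the text of every g2k:item below g2k:topic or g2k:area.
        return [i.text.strip() for i in desc.findall(f"g2k:{tag}/g2k:item", NS)]

    return {
        "url": desc.get("about"),
        "title": desc.findtext("g2k:title", namespaces=NS).strip(),
        "topics": items("topic"),
        "areas": items("area"),
        "years": (
            int(desc.findtext("g2k:time/g2k:from", namespaces=NS)),
            int(desc.findtext("g2k:time/g2k:to", namespaces=NS)),
        ),
    }
```

Calling `parse_record(RECORD)` yields the source URL together with the topic, area and time information, which is all a broker needs to list the document without holding the document itself.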

When a search query is made, GEIN 2000 also sends the search criteria in a generic RDF form (see Figure 2).

The response is packed in a <resultSet> block that refers to the ID of the query. This block contains the descriptions of the information that is to appear in the hit list (see Figure 3).

GEIN 2000 can also evaluate indices available locally, address local search functions directly and, above all, evaluate dynamic Web sites as well. The selection, which the user normally makes manually, is automated by GEIN 2000; consequently, rather than being displayed in the browser, the reply URL is sent back to GEIN 2000 together with the title and abstract.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:g2k="http://www.gein.de/g2k-profile/02/profile">
  <rdf:Description ID="4711">
    <g2k:detailedSearch language="de">
      <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
        <g2k:item ID="4711">Water quality</g2k:item>
      </g2k:topic>
      <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes"
                match="or">
        <g2k:item ID="4712">Rhine</g2k:item>
        <g2k:item ID="4713">Bonn</g2k:item>
      </g2k:area>
      <g2k:time>
        <g2k:from>1994</g2k:from>
        <g2k:to>1998</g2k:to>
      </g2k:time>
    </g2k:detailedSearch>
  </rdf:Description>
</rdf:RDF>

Figure 2
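How a data source might evaluate such a query can be sketched in Python. The matching rules below are assumptions made for illustration, since the article does not specify them: every requested topic must be present, `match="or"` means that one matching area is enough, a missing `match` attribute is treated as "and", and time periods match when the year ranges overlap. The names `parse_query` and `matches` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "g2k": "http://www.gein.de/g2k-profile/02/profile",
}

QUERY = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:g2k="http://www.gein.de/g2k-profile/02/profile">
  <rdf:Description ID="4711">
    <g2k:detailedSearch language="de">
      <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
        <g2k:item ID="4711">Water quality</g2k:item>
      </g2k:topic>
      <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes" match="or">
        <g2k:item ID="4712">Rhine</g2k:item>
        <g2k:item ID="4713">Bonn</g2k:item>
      </g2k:area>
      <g2k:time>
        <g2k:from>1994</g2k:from>
        <g2k:to>1998</g2k:to>
      </g2k:time>
    </g2k:detailedSearch>
  </rdf:Description>
</rdf:RDF>"""


def parse_query(xml_text):
    """Turn a detailedSearch block into sets of thesaurus IDs and a year range."""
    search = ET.fromstring(xml_text).find("rdf:Description/g2k:detailedSearch", NS)

    def ids(tag):
        return {i.get("ID") for i in search.findall(f"g2k:{tag}/g2k:item", NS)}

    return {
        "topics": ids("topic"),
        "areas": ids("area"),
        # Assumption: a missing match attribute means all areas are required.
        "area_match": search.find("g2k:area", NS).get("match", "and"),
        "years": (
            int(search.findtext("g2k:time/g2k:from", namespaces=NS)),
            int(search.findtext("g2k:time/g2k:to", namespaces=NS)),
        ),
    }


def matches(query, record):
    """Check one indexed record (sets of IDs plus a year range) against a query."""
    topics_ok = query["topics"] <= record["topics"]
    if query["area_match"] == "or":
        areas_ok = bool(query["areas"] & record["areas"])
    else:
        areas_ok = query["areas"] <= record["areas"]
    q_from, q_to = query["years"]
    r_from, r_to = record["years"]
    return topics_ok and areas_ok and q_from <= r_to and r_from <= q_to
```

Because the query carries thesaurus IDs rather than free text, the comparison reduces to set operations and a range check.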

...
<g2k:resultSet about="4711">
  ....
</g2k:resultSet>

Figure 3
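A data source could assemble such a response roughly as in the following Python sketch. Only the `<g2k:resultSet about="...">` wrapper is taken from Figure 3. The inner layout, one `rdf:Description` per hit with a title and an abstract, is an assumption based on the statement that the reply URL is returned together with title and abstract; the function name `build_result_set` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
G2K = "http://www.gein.de/g2k-profile/02/profile"

# Make the serializer emit the familiar prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("g2k", G2K)


def build_result_set(query_id, hits):
    """Wrap hit descriptions in a resultSet that refers back to the query ID.

    hits is a list of dicts with the keys "url", "title" and "abstract"
    (a hypothetical structure chosen for this sketch).
    """
    result = ET.Element(f"{{{G2K}}}resultSet", {"about": query_id})
    for hit in hits:
        desc = ET.SubElement(result, f"{{{RDF}}}Description", {"about": hit["url"]})
        ET.SubElement(desc, f"{{{G2K}}}title").text = hit["title"]
        ET.SubElement(desc, f"{{{G2K}}}abstract").text = hit["abstract"]
    return ET.tostring(result, encoding="unicode")
```

The `about` attribute on the wrapper lets the broker pair each asynchronous reply with the query that triggered it.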

XML and Tamino

The Sema Group evaluated a number of systems with a view to using them in the GEIN 2000 project, including relational and object-oriented database management systems (ODBMSs). Relational systems have to convert XML into their own, quite different data structures and thus do not really suit the technology of GEIN 2000. There are also clear problems in getting customers to accept ODBMSs. Software AG’s Tamino information server, on the other hand, offers a pure XML structure: it was developed from the outset for the storage of XML documents and stores them in their native format, thus dispensing with the need for any conversion. Tamino can therefore take full advantage of the widespread acceptance of XML. In an initial phase of the project, GEIN 2000 will be using Tamino to keep a structured XML index of around 60,000 objects as well as a multilingual thesaurus of terms and the geographical thesaurus already mentioned.

GEIN 2000 is much more than a new search engine based on XML. It functions as a broker, while the widely distributed Web sites, databases, etc. that it references behave as a distributed data management system under GEIN 2000. The project thus takes full advantage of the essential strength of Tamino, namely that it is a universal information server.

The German Federal Environmental Agency

The Federal Environmental Agency (UBA) is a scientific authority attached to the German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU). The high status accorded to its analyses and recommendations in political decision-making and its independence from interest groups make the agency a unique environmental organization in Germany.

The agency investigates, describes and evaluates the state of the environment in order to identify situations that are detrimental to the population or the environment as early and comprehensively as possible. Its tasks include drawing up detailed concepts and proposing effective action to the federal ministry for the environment (BMU) and other federal ministries. It also advises other state, local and private-sector organizations. The agency informs the public in layman’s terms of the causes and practical options available for the solution of environmental problems. It makes its knowledge and experience available nationally and internationally and is active in international committees and conferences with the aim of furthering international environmental protection.

Main emphases in the development of the GEIN 2000 project

  • Implementation of a search engine for the information offered by GEIN 2000
  • Recommendations to the providers of information on how to set up their Web sites for inclusion in GEIN 2000
  • Definition of a simple and universal search protocol for the environmental information and of a concise metarecord with information on space and time
  • Support in the implementation of this record and in defining subject headings
  • Integration of local search methods via network interfaces (CGI or RMI, for example) or by means of conversion filters on locally generated indices

The Sema Group

The Sema Group is one of the world’s leading information technology companies. Its activities center around outsourcing, systems integration, application development and consulting. With around 20,000 employees in over 120 offices throughout the world, the group earned the equivalent of around DM 3.75 billion in sales revenue in 1998. The company is listed on the London and Paris stock exchanges. The Sema Group has been active in Germany for over 30 years now, where it currently employs a staff of 750.