Environmental information system at the German Federal Environmental
Agency
German Federal Environmental Agency develops public XML database
The Sema Group used the new XML technology for the first time to develop a
"broker" for environmental information at the German Federal Environmental
Agency. The application runs on Software AG's information server, Tamino.
Search engines are a cornerstone of good Web use: Anyone who wants to do more than just
repeatedly access a limited number of familiar Web sites, and instead wants to exploit the
Web's information potential to answer a variety of questions, can't do it
without search engines. But search engines often return irrelevant addresses (URLs),
partly as a consequence of the way they work. Basically, they merely carry out a full-text
search across countless different Web sites, but they are essentially limited to static
Web sites. Information obtained by the user of a site by means of menus or database access
is not generally visible to the search engines. And these dynamic Web sites are playing an
increasingly important role on the Web.
The GEIN 2000 project
The German Federal Environmental Agency (UBA) was confronted with this problem when
planning its GEIN 2000 network (German Environment Information Network). The purpose of
GEIN is to provide access to information available on the Web sites of numerous
public-sector organizations (environmental authorities, German federal and state
statistical offices, ministries and so on) and thus serve as an information broker
for environmental information in Germany. GEIN will be available in the future to anyone
who is interested and will be on display at Expo 2000 in Hanover. It is an essential
component of the environmental presentation system Umwelt 2000, which provides the
interested layman with access to in-depth environmental information. GEIN 2000 is
thus aimed at the public rather than being an internal project intended for the benefit of
the various public-sector authorities involved.
The Sema Group was entrusted with the development of the GEIN 2000 project.
Because of the project's value as a model, the company decided to implement it on the
basis of the new Internet standard XML (eXtensible Markup Language) using Software
AG's new XML information server, Tamino.
"The decision to develop the project on the basis of XML may initially have been
affected by the preferences of the project group," explains Thomas Bandholtz, project
manager with the Sema Group. "We wanted to see what could be achieved with this new
technology, although we deliberately kept our options open so that we could have reverted
to traditional structures such as SQL if necessary. However, potential users turned out to
be just as persuaded of the value of XML as we were. We were astonished at how positively
all those questioned responded to XML. The extent to which XML has been accepted is quite
fantastic."
One of the prime advantages of XML is that data interchange is no longer proprietary.
Everything is based on the open standard of the W3C and can thus be smoothly integrated
into the Web. This will make it very easy to integrate new sources of information into
GEIN 2000 in the future. Since the essence of GEIN 2000 is that it should cover
as many and as varied sources of information as possible, the coordination and
harmonization of data formats from different systems is of central importance.
"Coordination between different public bodies is normally a very laborious affair
that can take years. It cannot be expected that everything will be finished within ten
minutes with XML, because the measurement methods still have to be coordinated, for
example. But XML allows you to arrive at a common basis for the exchange of information
very quickly and frees you to concentrate on the contents. Things like field lengths,
separators and end-of-record characters, for example, which previously always had to be
resolved, are no longer an issue with XML," says Bandholtz.
GEIN 2000 and XML
Not only does GEIN 2000 control data interchange with XML; it also uses XML for
its internal processes. Thus, GEIN 2000 includes an integrated geographical thesaurus
in XML format, by means of which it is possible to deal with queries relating to
geographical aspects. This geographical thesaurus solves a typical problem of search
engines: namely, that a search term cannot be identified as a geographical term. Anyone
searching for "Lüneburg Heath," for example, will not find anything about
"Wilseder Mountain," which is located on it.
The geographical thesaurus contains over 50,000 geographical terms, and XML makes it
possible to enter these together with their geographical context. This means that you no
longer need a word match when searching for place names; you can search by geographical
context. Of course, the terms have to be entered accordingly; the W3C is probably
unaware that Wilseder Mountain is on Lüneburg Heath. Time periods can be evaluated in the
same way: XML allows a "from-to" range to be represented as a real time period in
GEIN 2000. The only way to answer questions like these by non-proprietary means is to use
XML.
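The idea of searching by geographical context rather than by word match can be illustrated with a minimal Python sketch. It is not GEIN 2000's implementation: the two-entry thesaurus, the function names and the sample documents are all invented for illustration, and only the relationship between Wilseder Mountain and Lüneburg Heath comes from the text above.

```python
# Hypothetical miniature thesaurus: each term points to the broader
# geographical term that contains it (None marks a top-level term).
BROADER = {
    "Wilseder Mountain": "Lüneburg Heath",
    "Lüneburg Heath": None,
}


def contained_terms(area, broader=BROADER):
    """Return the area itself plus every term located inside it."""
    found = {area}
    changed = True
    while changed:  # repeat until no new narrower term turns up
        changed = False
        for term, parent in broader.items():
            if parent in found and term not in found:
                found.add(term)
                changed = True
    return found


def search_by_area(area, documents):
    """Return titles of documents tagged with the area or a place inside it."""
    wanted = contained_terms(area)
    return [title for title, places in documents if wanted & set(places)]


# Invented sample documents, each tagged with thesaurus terms.
DOCUMENTS = [
    ("Hiking trails on Wilseder Mountain", ["Wilseder Mountain"]),
    ("Water quality of the Rhine", ["Rhine", "Bonn"]),
]

hits = search_by_area("Lüneburg Heath", DOCUMENTS)
print(hits)  # ['Hiking trails on Wilseder Mountain']
```

A plain full-text search for "Lüneburg Heath" would miss the first document, because the term never appears in it; the lookup through the thesaurus is what makes the hit possible.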
The basis for GEIN 2000 is the XML application Resource
Description Framework (RDF), which permits a more complex query logic than would be
possible with HTML. Using RDF, an XML-compliant G2K (GEIN 2000) profile was set up
that can be analyzed and evaluated by a parser. The information itself does not have to be
in XML format. A source URL can be specified in RDF that leads to a specific document.
The following abridged and translated example (see Figure 1) shows a
"record" for GEIN 2000, which prepares a kind of subject heading for the
GEIN broker. One of the advantages of XML immediately becomes clear here: The example can
be understood in a short period of time by anyone who knows even the rudiments of XML.
This makes adaptation to destination systems very easy, which is particularly important
for a project like GEIN 2000 that crosses the boundaries of subjects and
organizations.
<rdf:RDF>
  <rdf:Description about="http://www.site.de/rheinwasser.html">
    <g2k:title>Water quality of the Rhine by Bonn 1994-1998</g2k:title>
    <g2k:abstract>
      This document describes the water quality
      of the Rhine measured in Bonn in the
      years 1994 to 1998
    </g2k:abstract>
    <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
      <g2k:item ID="4711">Water quality</g2k:item>
    </g2k:topic>
    <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes">
      <g2k:item ID="4712">Rhine</g2k:item>
      <g2k:item ID="4713">Bonn</g2k:item>
    </g2k:area>
    <g2k:time>
      <g2k:from>1994</g2k:from>
      <g2k:to>1998</g2k:to>
    </g2k:time>
  </rdf:Description>
</rdf:RDF>
Figure 1
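How a parser might evaluate such a record can be sketched in a few lines of Python using the standard library. This is an illustration only, not the project's code: the function name and the returned dictionary layout are invented, and because the abridged record omits namespace declarations, the sketch assumes the two namespace URIs declared in the search query of Figure 2.

```python
import xml.etree.ElementTree as ET

# Assumption: the same namespace URIs as in the search query of Figure 2.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "g2k": "http://www.gein.de/g2k-profile/02/profile",
}

RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:g2k="http://www.gein.de/g2k-profile/02/profile">
  <rdf:Description about="http://www.site.de/rheinwasser.html">
    <g2k:title>Water quality of the Rhine by Bonn 1994-1998</g2k:title>
    <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
      <g2k:item ID="4711">Water quality</g2k:item>
    </g2k:topic>
    <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes">
      <g2k:item ID="4712">Rhine</g2k:item>
      <g2k:item ID="4713">Bonn</g2k:item>
    </g2k:area>
    <g2k:time>
      <g2k:from>1994</g2k:from>
      <g2k:to>1998</g2k:to>
    </g2k:time>
  </rdf:Description>
</rdf:RDF>"""


def parse_record(xml_text):
    """Extract the source URL, title, subject terms and period of one record."""
    root = ET.fromstring(xml_text)
    desc = root.find("rdf:Description", NS)

    def items(tag):
        # Collect the text of every <g2k:item> below <g2k:topic> or <g2k:area>.
        return [i.text.strip() for i in desc.findall(f"g2k:{tag}/g2k:item", NS)]

    return {
        "url": desc.get("about"),
        "title": desc.findtext("g2k:title", namespaces=NS).strip(),
        "topics": items("topic"),
        "areas": items("area"),
        "from": int(desc.findtext("g2k:time/g2k:from", namespaces=NS)),
        "to": int(desc.findtext("g2k:time/g2k:to", namespaces=NS)),
    }


record = parse_record(RECORD)
print(record["url"], record["areas"], record["from"], record["to"])
```

The record only describes the document; the document itself stays at the source URL and can be in any format.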
When a search query is made, GEIN 2000 sends the search criterion generically in RDF as
well (see Figure 2).
The response is packed in a <resultSet> block that refers to the ID of
the query. This block contains the descriptions of the information that is to appear in
the hit list (see Figure 3).
GEIN 2000 can also evaluate indices available locally, address local search
functions directly and, above all, evaluate dynamic Web sites as well. The selection,
which the user normally makes manually, is automated by GEIN 2000; consequently,
rather than being displayed in the browser, the reply URL is sent back to GEIN 2000
together with the title and abstract.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:g2k="http://www.gein.de/g2k-profile/02/profile">
  <rdf:Description ID="4711">
    <g2k:detailedSearch language="de">
      <g2k:topic thesaurus="http://www.gein2000.de/profile/02/ubathes">
        <g2k:item ID="4711">Water quality</g2k:item>
      </g2k:topic>
      <g2k:area thesaurus="http://www.gein2000.de/profile/02/geothes"
                match="or">
        <g2k:item ID="4712">Rhine</g2k:item>
        <g2k:item ID="4713">Bonn</g2k:item>
      </g2k:area>
      <g2k:time>
        <g2k:from>1994</g2k:from>
        <g2k:to>1998</g2k:to>
      </g2k:time>
    </g2k:detailedSearch>
  </rdf:Description>
</rdf:RDF>
Figure 2
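A search query of the kind shown in Figure 2 could be assembled programmatically along the following lines. The sketch uses Python's standard library; the helper names `qname` and `build_query` and the parameter layout are assumptions made for illustration, while the element names, attribute names and URIs are taken from the figure.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
G2K = "http://www.gein.de/g2k-profile/02/profile"
UBATHES = "http://www.gein2000.de/profile/02/ubathes"
GEOTHES = "http://www.gein2000.de/profile/02/geothes"

# Make the serializer emit the prefixes used in the figures.
ET.register_namespace("rdf", RDF)
ET.register_namespace("g2k", G2K)


def qname(namespace, tag):
    """Build the '{namespace}tag' form that ElementTree uses for qualified names."""
    return f"{{{namespace}}}{tag}"


def build_query(query_id, topics, areas, year_from, year_to, language="de"):
    """Return a detailed-search query as an XML string.

    topics and areas are lists of (thesaurus ID, label) pairs.
    """
    root = ET.Element(qname(RDF, "RDF"))
    desc = ET.SubElement(root, qname(RDF, "Description"), {"ID": query_id})
    search = ET.SubElement(desc, qname(G2K, "detailedSearch"), {"language": language})

    topic = ET.SubElement(search, qname(G2K, "topic"), {"thesaurus": UBATHES})
    for item_id, label in topics:
        ET.SubElement(topic, qname(G2K, "item"), {"ID": item_id}).text = label

    area = ET.SubElement(
        search, qname(G2K, "area"), {"thesaurus": GEOTHES, "match": "or"}
    )
    for item_id, label in areas:
        ET.SubElement(area, qname(G2K, "item"), {"ID": item_id}).text = label

    time = ET.SubElement(search, qname(G2K, "time"))
    ET.SubElement(time, qname(G2K, "from")).text = str(year_from)
    ET.SubElement(time, qname(G2K, "to")).text = str(year_to)

    return ET.tostring(root, encoding="unicode")


query_xml = build_query(
    "4711",
    topics=[("4711", "Water quality")],
    areas=[("4712", "Rhine"), ("4713", "Bonn")],
    year_from=1994,
    year_to=1998,
)
print(query_xml)
```

Building the query through an XML library rather than by string concatenation keeps the result well-formed, including matching start and end tags and properly escaped labels.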
... <g2k:resultSet about="4711">
  ....
</g2k:resultSet>
Figure 3
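How a query might be evaluated against stored records to produce the hit list can be sketched as follows. The matching rules are assumptions, not documented GEIN 2000 behavior: the sketch treats `match="or"` as "any listed term is sufficient", requires topic, area and time to match together, and counts a time period as matching when it overlaps the queried range. Records and queries are plain Python dictionaries with invented field names, and the second record is made up.

```python
def periods_overlap(a_from, a_to, b_from, b_to):
    """True if the closed year ranges [a_from, a_to] and [b_from, b_to] intersect."""
    return a_from <= b_to and b_from <= a_to


def matches(record, query):
    """Assumed rule: topic, area and time period must all match."""
    topic_ok = bool(set(query["topics"]) & set(record["topics"]))
    # Assumed reading of match="or": one shared area term is enough.
    area_ok = bool(set(query["areas"]) & set(record["areas"]))
    time_ok = periods_overlap(
        record["from"], record["to"], query["from"], query["to"]
    )
    return topic_ok and area_ok and time_ok


def hit_list(records, query):
    """Return URL, title and abstract of every matching record."""
    return [
        {"url": r["url"], "title": r["title"], "abstract": r["abstract"]}
        for r in records
        if matches(r, query)
    ]


# Terms are referenced by their thesaurus IDs, as in Figures 1 and 2.
RECORDS = [
    {
        "url": "http://www.site.de/rheinwasser.html",
        "title": "Water quality of the Rhine by Bonn 1994-1998",
        "abstract": "Water quality of the Rhine measured in Bonn, 1994 to 1998",
        "topics": ["4711"],
        "areas": ["4712", "4713"],
        "from": 1994,
        "to": 1998,
    },
    {   # invented record that shares the topic but not the area
        "url": "http://www.example.org/other.html",
        "title": "Water quality elsewhere",
        "abstract": "Invented sample record",
        "topics": ["4711"],
        "areas": ["9999"],
        "from": 1990,
        "to": 1999,
    },
]

QUERY = {"id": "4711", "topics": ["4711"], "areas": ["4712", "4713"],
         "from": 1996, "to": 2000}

hits = hit_list(RECORDS, QUERY)
print([h["url"] for h in hits])  # ['http://www.site.de/rheinwasser.html']
```

The overlap test is what turns "from" and "to" into a real time period: the query for 1996 to 2000 finds the record covering 1994 to 1998 even though no year string in the query matches the record literally.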
XML and Tamino
The Sema Group evaluated a number of systems with a view to using them in the
GEIN 2000 project, including relational and object-oriented database management
systems (ODBMSs). The former have to convert XML to their data structure, which is quite
different, and thus do not really suit the technology of GEIN 2000. There are
also clear problems in getting customers to accept ODBMSs. Software AG's Tamino
information server, on the other hand, offers a pure XML structure: From the outset it was
developed for the storage of XML documents and it stores them in their native format, thus
dispensing with the need for any conversion. Tamino can therefore take full advantage of
the widespread acceptance of XML. In an initial phase of the project, GEIN 2000 will
be using Tamino to keep a structured XML index of around 60,000 objects as well as a
multilingual thesaurus of terms and the geographical thesaurus already mentioned.
GEIN 2000 is much more than a new search engine based on XML. It functions as a
broker, while the widely distributed Web sites, databases, etc. that it references behave
as a distributed data management system under GEIN 2000. The project thus takes full
advantage of the essential strength of Tamino, namely that it is a universal information
server.
The German Federal Environmental Agency
The Federal Environmental Agency (UBA) is a scientific authority attached to the German
Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU). The
high status accorded to its analyses and recommendations for the purpose of political
decisions and its independence of interest groups make the agency a unique environmental
organization in Germany.
The agency investigates, describes and evaluates the state of the environment in order
to identify situations that are detrimental to the population or the environment as early
and comprehensively as possible. Its tasks include drawing up detailed concepts and
proposing effective action to the federal ministry for the environment (BMU) and other
federal ministries. It also advises other state, local and private-sector organizations.
The agency informs the public in layman's terms of the causes and practical options
available for the solution of environmental problems. It makes its knowledge and
experience available nationally and internationally and is active in international
committees and conferences with the aim of furthering international environmental
protection.
Main emphases in the development of the GEIN 2000 project
- Implementation of a search engine for the information offered by GEIN 2000
- Recommendations to the providers of information on how to set up their Web sites for
inclusion in GEIN 2000
- Definition of a simple and universal search protocol for the environmental information
and of a concise metarecord with information on space and time
- Support in the implementation of this record and in defining subject headings
- Integration of local search methods via network interfaces (CGI or RMI, for example) or
by means of conversion filters on locally generated indices
The Sema Group
The Sema Group is one of the world's leading information technology companies. Its
activities center around outsourcing, systems integration, application development and
consulting. With around 20,000 employees in over 120 offices throughout the world, the
group earned the equivalent of around DM 3.75 billion in sales revenue in 1998. The
company is listed on the London and Paris stock exchanges. The Sema Group has been active
in Germany for over 30 years now, where it currently employs a staff of 750.