Cost-Effective EDI Using XML?
A Pivot-Oriented Approach

Contents

Introduction
Various Syntaxes for Exchanging Structured Data
A Brief Look at These Syntaxes
A Pivot-Oriented Approach
XML-Based EDI Applications
Conclusions
References

Author

Philippe Vijghen, ACSE sa, a Software AG company

Abstract

This paper proposes the use of a pivot format when developing EDI applications, based on the experience of three operational projects.

The role of SGML/XML, as pivot, is presented in a broader context, with regard to other relevant candidates for structuring data.

Biographical Note

Philippe Vijghen is a project manager at ACSE sa/nv, Brussels, a member of the SGML Technologies Group. He is a software engineer and systems architect specializing in object-oriented distributed applications and complex document-oriented Electronic Data Interchange systems; in addition to structuring documents, these systems make use of SGML at other levels, such as for external application programming interfaces. He obtained a degree, specializing in electromechanical engineering, at the Free University of Brussels (ULB). He may be contacted at phv@sgmltech.com.

Introduction

More and more companies, together with their partners, are moving towards EDI (Electronic Data Interchange). Traditional EDI, however, has the reputation of being inflexible and expensive.

This paper calls on the experience gained from the use of SGML/XML in various EDI projects. It illustrates how the use of a central pivot model for implementing an EDI application is cost-effective, owing to its reusability and scalability. It also demonstrates that, in terms of data modelling, SGML/XML offers the flexibility required for such a pivot role in the EDI system.

The first section gives an overview of the various syntaxes that have been defined in the past for exchanging data.

A pivot-oriented approach, and its benefits, are explained in the following section.

In the final section, three EDI projects are presented, where XML has been used as a pivot.

Various Syntaxes for Exchanging Structured Data

Several conventions have been defined for structuring electronic information in the past. Listed below are some well-known abstract and concrete data structuring mechanisms, and their respective application targets.

  • ASN.1 (Abstract Syntax Notation One) and BER (Basic Encoding Rules)

    ASN.1 is a common, platform- and language-independent way of defining abstract data structures. The associated BER specify the concrete representation of an ASN.1 structure when it is in transit between computers.

    ASN.1 is very useful for documenting structures at a conceptual design level, especially when working at network level. However, the only meta-information carried in the concrete syntax is for the identification of low-level data types. Therefore, the identification of the data semantics relies on an implicit knowledge of the structure.

  • RPC (Remote Procedure Call)

    The various implementations of RPC (eg by Sun, DCE, and Microsoft) address the need of software developers to call a function or procedure that resides on another machine. The RPC mechanisms include a platform-independent way to marshal the data (encoding/decoding before and after network transmissions), but their lack of meta-information makes them suitable only for volatile low-level inter-process communications.

  • CORBA (Common Object Request Broker Architecture) and IIOP (Internet Inter-ORB Protocol)

    CORBA, like RPC, also targets software integration and the development of network-enabled applications, but this time in an object-oriented environment. Like ASN.1, it includes a mechanism for specifying data structures in an abstract, language-independent way, called IDL (Interface Definition Language). IDL also includes an abstract specification of the functional interfaces. As for RPCs, there are associated marshalling mechanisms (such as IIOP, the one specified for the Internet) for passing data structures between objects. No meta-information about the data semantics is included in the marshalled data, as CORBA relies on the volatile object interfaces for addressing such needs.

  • CSV (Comma Separated Values)

    The simple syntax of CSV has been defined for the exchange of tabular data, exported from databases or spreadsheets. The first row of the file, often including the titles of the various columns, can be considered as a meta-specification.

  • EDIFACT (Electronic Data Interchange for Administration, Commerce, and Transport)

    EDIFACT aims at specifying standard messages for exchanging electronic data in trade applications, eg orders and invoices. The syntax is built on top of separators and allows for messages to be structured as groups of segments that are themselves made up of data elements.

  • SGML (Standard Generalized Markup Language)

    This standard was designed to structure documents. It provides a flexible mechanism that enables the user to model document structures and to encode them so as to enhance the manipulation, exchange, and publication of the documents.

  • XML (eXtensible Markup Language)

    XML, a recently defined subset of SGML, brings the concepts behind SGML into the popular arena of Internet browsing. The idea behind XML is to maintain eighty per cent of the benefits of SGML, with only twenty per cent of its complexity. This enabled the development of easy XML support in tools as common as Web browsers. XML does not really target the SGML market; its major goal is the Internet distribution of structured documents. In terms of application target, Microsoft would also like to position XML as the next-generation CSV format, for interchanging data between databases, spreadsheets, and a few EDI applications.
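As a hedged illustration of how much meta-information each syntax carries, the following Python sketch encodes the same small value three ways. The record itself (an order `A17` with a quantity of 42) and all field and element names are hypothetical, and the BER helper covers only integers from 0 to 127; it is a sketch of the tag-length-value idea, not a BER library.

```python
import csv
import io
import xml.etree.ElementTree as ET


def ber_encode_small_int(value: int) -> bytes:
    """Encode an integer in 0..127 as a BER INTEGER: tag, length, value."""
    if not 0 <= value < 128:
        raise ValueError("this sketch only handles 0..127")
    # 0x02 is the universal tag for INTEGER; the length is one octet.
    return bytes([0x02, 0x01, value])


# BER: only the low-level type (INTEGER) travels with the value.
ber_bytes = ber_encode_small_int(42)

# CSV: the header row acts as an informal meta-specification.
csv_buffer = io.StringIO()
writer = csv.writer(csv_buffer)
writer.writerow(["order_id", "quantity"])
writer.writerow(["A17", "42"])
csv_text = csv_buffer.getvalue()

# XML: every value is wrapped in a name that hints at its meaning.
order = ET.Element("order", id="A17")
ET.SubElement(order, "quantity").text = "42"
xml_text = ET.tostring(order, encoding="unicode")
```

Looking only at `ber_bytes`, a reader learns that an integer was sent, but not that it is a quantity; the CSV and XML forms carry that name with the data.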


A Brief Look at These Syntaxes

Although they are all aimed at defining a structure below the level of file granularity, it is clear that all of these different specifications and syntaxes target different needs in terms of application environments.

If the purely technical aspects of the various syntaxes are examined, however, it must be concluded that SGML/XML allows for more complex modelling.

  • ASN.1/BER, RPCs, and CORBA/IIOP mechanisms are difficult to use for defining complex recurrent structures with optional branches and various lengths of data. Moreover, as no meta-information about the data semantics is encoded, it would be impossible to figure out which information is carried by looking only at the data: the encoding/decoding relies on implicit knowledge of the model. In any case, they are not really suitable for the archiving of information, as they lack the set of tools that archiving would require.
  • CSV has the advantage of being extremely simple and is used everywhere. But it is limited to the export of data that can be represented as a list of rows.
  • EDIFACT syntax is based on four different levels: messages, groups, segments, and data elements. It is very appropriate for structuring information that has been defined in the standardized catalogues of messages and segments. However, it is difficult to use EDIFACT as a generic syntax for modelling arbitrary information, especially when dealing with structured text. Moreover, EDIFACT lacks a formal meta-language.
  • XML includes a formal way of specifying the model through the notion of a DTD. It offers a high level of flexibility and extensibility and is particularly appropriate for modelling trees and graphs (by using links). It is simple to use. It is even possible to exploit an XML instance without knowing the details of its meta-specification, the DTD. Unfortunately, it has very limited support for data types (the next revision of the standard will address this point).
  • SGML includes many features that are not supported by XML. Those features can legitimately be considered as unnecessarily complex when the EDI data structures are considered. However, as explained in [Vijghen 97], many of those additional features are invaluable in a development environment used for processing the information.
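The remark that an XML instance can be exploited without its DTD can be sketched as follows. This is a minimal example using Python's standard `xml.etree.ElementTree` module; the instance and its element names (`nomenclature`, `item`, `label`) are hypothetical, and no DTD is supplied or consulted.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; the structure is discovered from the data alone.
INSTANCE = """
<nomenclature>
  <item code="A">
    <label>Agriculture</label>
    <item code="A.1"><label>Crops</label></item>
  </item>
  <item code="B"><label>Mining</label></item>
</nomenclature>
"""


def outline(element, depth=0):
    """Return (depth, tag, attributes) for every element, in document order."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


structure = outline(ET.fromstring(INSTANCE))
```

The tree shape, including the nesting of one `item` inside another, is recovered purely from the tags in the instance.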

This very straightforward comparison only takes into account the syntax and the modelling flexibility, of course. It says little about how well each syntax serves the application field it was designed for, but it is invaluable when considering the best candidate for representing pivot models.

Indeed, this paper addresses the need for consistent use of a pivot format when developing EDI applications. With a pivot format in mind, the use of the most flexible and scalable syntax is fundamental; there, the choice is purely technical and is independent of the external representation format that users may require, depending on the application field.


A Pivot-Oriented Approach

When developing EDI applications, one of the key tasks consists of implementing filters for processing the messages to be exchanged. Such filters aim either at converting the messages from one representation to another, or at using the information contained in the message in databases or other external applications.

Our experience has shown that a pivot-oriented approach to developing such filters is extremely cost-effective. The approach consists of using a single internal representation of the information for the implementation of all the filters applied to EDI messages.

Note that the word 'pivot' here has a different meaning from that in the traditional EDI terminology, where it often designates more restrictively the representation used for loading and exporting messages to and from databases.
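The idea of a single internal representation can be sketched as a pair of registries: readers that map an external syntax onto the pivot, and writers that map the pivot back out. The sketch below is an assumption-laden illustration, not the architecture of any project described in this paper; the two toy formats (`kv` for key=value lines, `xml`) and all function names are hypothetical, and the pivot is simply an `xml.etree.ElementTree` element.

```python
import xml.etree.ElementTree as ET


# Readers turn an external representation into the pivot (an element tree).
def read_keyvalue(text):
    record = ET.Element("record")
    for line in text.strip().splitlines():
        name, value = line.split("=", 1)
        ET.SubElement(record, name.strip()).text = value.strip()
    return record


def read_xml(text):
    return ET.fromstring(text)


# Writers turn the pivot into an external representation.
def write_keyvalue(record):
    return "\n".join(f"{field.tag}={field.text}" for field in record)


def write_xml(record):
    return ET.tostring(record, encoding="unicode")


READERS = {"kv": read_keyvalue, "xml": read_xml}
WRITERS = {"kv": write_keyvalue, "xml": write_xml}


def convert(text, source, target):
    """Convert between any two registered formats by way of the pivot."""
    pivot = READERS[source](text)
    return WRITERS[target](pivot)


as_xml = convert("customer=ACME\namount=120", "kv", "xml")
back_to_kv = convert(as_xml, "xml", "kv")
```

Supporting a further format in this sketch means registering one reader and one writer; `convert` and the existing filters stay untouched.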

The cost effectiveness of such an approach is justified by the following facts:

  • the code can be independent of the actual concrete representation of the information, by relying only on the pivot model;
  • the application can be adapted more easily if the public syntax of a message is modified;
  • the set of features available in the development environment, associated with the pivot syntax, becomes independent of the end-user's choice of public representation syntax;
  • the use of a consistent set of tools and techniques for manipulating the pivot syntax, not only across filters but also across projects, improves the productivity of the developers;
  • the number of filters that must be developed for implementing all the conversions between a set of formats is proportional to the number of formats, and does not grow quadratically as it does when no pivot format is selected.
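The last point can be made concrete with a small count. The sketch assumes that each format needs one filter into the pivot and one out of it, and that, without a pivot, one direct filter is needed for every ordered pair of distinct formats, that is, n(n-1) filters for n formats. The function names are illustrative only.

```python
def filters_with_pivot(formats: int) -> int:
    """One reader and one writer per format."""
    return 2 * formats


def filters_without_pivot(formats: int) -> int:
    """One direct converter per ordered pair of distinct formats."""
    return formats * (formats - 1)


comparison = {
    n: (filters_with_pivot(n), filters_without_pivot(n)) for n in (3, 5, 10)
}
```

With three formats the two approaches need the same number of filters (six); with ten formats the pivot needs 20 filters where direct conversion needs 90.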

We, the SGML Technologies Group, have chosen to use SGML/XML and an integrated development environment based on this technology as the cornerstone for many projects based on a pivot-oriented approach, including those in the field of EDI.


XML-Based EDI Applications

Experience gained during the development of various EDI applications includes:

  • G-EDI, a generic tool for parsing and processing EDIFACT messages, first developed for implementing an EDI system for the handling of telecommunication bills of a major Belgian bank;
  • CLASET, an EDIFACT and SGML-based system for exchanging nomenclatures in the context of the European institution for statistics (EUROSTAT);
  • EDIDOC, a generic framework for the Electronic Data Interchange of Documents, developed for the European Space Agency, that has been operational for a few years.

G-EDI

The G-EDI project, aiming at processing the telecommunication bills of a major banking institution in Belgium, initiated the development of a generic EDIFACT parser. The key point is that the parser was based on the notion of a pivot format. In practice, the implementation is based on SGML technology.

Although the SGML tags and syntax did not help as such for this implementation, the generic coding mechanisms that are part of SGML helped to keep the application independent from the actual syntax of the message. Indeed, XML offers all the possibilities that are required for modelling the information contained in EDIFACT, as it allows for the encoding of the documents with regard to arbitrarily complex tree models and, if hyperlinks are considered, even graph models.

Although the actual EDIFACT syntax of the messages that are transmitted by the telecommunications company has changed four times since the system was put into production, only the mapping to the generic underlying model had to be reviewed. Owing to this approach, it has been possible to reduce the application maintenance costs by a factor of five.
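A mapping from EDIFACT onto a generic tree model can be sketched as below. This is not the G-EDI parser: it is a minimal illustration that assumes the default EDIFACT separators (`'` ends a segment, `+` separates data elements, `:` separates components), ignores the release character and service segments, and uses hypothetical pivot element names (`message`, `segment`, `element`, `component`). The sample is an illustrative fragment, not a complete interchange.

```python
import xml.etree.ElementTree as ET


def edifact_to_pivot(message):
    """Map EDIFACT segments onto a generic XML tree (default separators assumed)."""
    root = ET.Element("message")
    for raw in message.split("'"):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the last terminator
        tag, *elements = raw.split("+")
        segment = ET.SubElement(root, "segment", {"id": tag})
        for element in elements:
            data = ET.SubElement(segment, "element")
            for component in element.split(":"):
                ET.SubElement(data, "component").text = component
    return root


# Illustrative fragment: a message header segment and a date segment.
SAMPLE = "BGM+380+INV001+9'DTM+137:19980501:102'"
pivot = edifact_to_pivot(SAMPLE)
segment_ids = [segment.get("id") for segment in pivot]
date_components = [component.text for component in pivot[1][0]]
```

Code that works on `pivot` sees only segments, elements, and components; if the public message layout changes, only a mapping such as `edifact_to_pivot` needs to be revisited.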


CLASET

The goal of the CLASET (Classification Information Set Message) project, developed in the context of the European Programme for the Interchange of Data between Administrations (IDA), is rather ambitious. Take the example of the definition of 'secondary school' in the various European countries: there is no consistency at all. But the European Institutions still want to produce accurate statistics on such matters, across the internal borders. In order to achieve this, complex nomenclatures for statistics must be defined. CLASET includes the definition of messages for exchanging such nomenclatures.

The CLASET message allows for the exchange of any hierarchical structure, such as nomenclatures or classifications, and has conceptually been defined as a result of a Merise model.

Different representations are used, each based on a distinct syntax:

  • an EDIFACT representation of the messages is being standardized in official committees;
  • an SGML representation is used as an alternative format to which the EDIFACT representation can be mapped;
  • HTML representations of the messages can be produced, as generated from the SGML representation.

The EDIFACT representation is the most official representation of the information transported by CLASET messages, because of the standardization process.

The SGML representation is recommended to people who are dealing with highly structured text, because the SGML representation offers more flexibility than EDIFACT for operations such as the attachment of footnotes or presentation styles to words that are part of the free text. Such structures can easily be encoded using an SGML mixed content group model, which has no equivalent in the EDIFACT layering of messages, groups, segments, and data.
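Mixed content, where markup such as a footnote is attached to words inside free text, can be illustrated with a short XML fragment. The sketch below uses Python's `xml.etree.ElementTree`; the element names (`description`, `emph`, `footnote`) and the text are hypothetical and are not taken from the CLASET DTD.

```python
import xml.etree.ElementTree as ET

# Free text with an emphasised word and a footnote attached to it.
FRAGMENT = (
    "<description>Secondary <emph>school</emph>"
    "<footnote>Definition varies by country.</footnote> enrolment</description>"
)


def flatten(element):
    """Yield (kind, value) pairs in document order for a mixed-content element."""
    if element.text:
        yield ("text", element.text)
    for child in element:
        yield ("element", child.tag)
        yield from flatten(child)
        if child.tail:
            yield ("text", child.tail)


events = list(flatten(ET.fromstring(FRAGMENT)))
```

Text and sub-elements alternate freely inside `description`, which is the kind of structure that does not map onto a fixed layering of segments and data elements.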

The HTML representation is read-only: it enables people to view the messages using a tool such as a Web browser (used as a local viewer application in this case).

The actual implementation of the CLASET project is based on SGML, used as an internal pivot. This approach has led to application code that is independent of the actual details of the representation syntaxes and has been a major argument for the cost-effectiveness of the project. The benefit of such an approach is not due to the SGML syntax itself; rather, the tools associated with SGML offer the required flexibility and the relevant processing features.


EDIDOC

The EDIDOC (Electronic Data Interchange for Documents) project covers the design, implementation, and deployment of a flexible framework for document-oriented EDI at the European Space Agency (ESA).

The system is in charge of document exchanges occurring in several distinct applications of the agency:

  • working documents exchanged with the delegations;
  • calls for tender sent to potential bidders;
  • press releases and information notes sent to the press and public (including their WWW publication).

At the heart of the EDIDOC system, a central server acts as a clearing house, giving a potential legal value to the documents exchanged by logging them into a robust relational database.

This server integrates, in a very generic and flexible way, the key concepts needed for electronic document exchanges:

  • document standards (conformance checking and format conversions),
    eg SGML, EDIFACT, PDF, ASCII, RTF, and WordPerfect;
  • security packages (information confidentiality and authentication),
    eg PGP and MAC MD-5;
  • access protocols (including network aspects),
    eg WWW, Internet e-mail, X.400, and FTP.

At each of these levels the server makes sure that the documents are delivered in accordance with the preferences of the recipients: in the right format, with the right security package, and the right communication protocols. It really plays the role of a gateway.

The EDIDOC generic envelopes have been defined in XML. They include the details of the exchange: originator, list of recipients, unique reference, subject, time stamps, document types and formats, security mechanisms, delivery options, groupware context, remote management options, error messages, and so on.
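To give a feel for what such an envelope might look like, the following sketch builds one with Python's `xml.etree.ElementTree`. The actual EDIDOC envelope DTD is not reproduced in this paper, so every element name, attribute name, and value below is a hypothetical stand-in covering only a few of the fields listed above.

```python
import xml.etree.ElementTree as ET


def build_envelope(originator, recipients, reference, subject, doc_format):
    """Build a hypothetical exchange envelope as an XML element tree."""
    envelope = ET.Element("envelope", reference=reference)
    ET.SubElement(envelope, "originator").text = originator
    recipient_list = ET.SubElement(envelope, "recipients")
    for address in recipients:
        ET.SubElement(recipient_list, "recipient").text = address
    ET.SubElement(envelope, "subject").text = subject
    ET.SubElement(envelope, "document", format=doc_format)
    return envelope


# Illustrative values only.
envelope = build_envelope(
    originator="sender@example.org",
    recipients=["first@example.org", "second@example.org"],
    reference="REF-0001",
    subject="Call for tender",
    doc_format="SGML",
)
envelope_text = ET.tostring(envelope, encoding="unicode")
```

Because the envelope is itself structured data, the same pivot-based tooling that handles the documents can read, check, and route it.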

The filters that are plugged into the EDIDOC document standards' components are based on the notion of a pivot format for conversions. Although the use of SGML is not enforced, it is the best candidate for defining custom pivot formats for structured documents. Indeed, SGML includes a very consistent and generic way to model the information. Moreover, some SGML features, such as OMITTAG, SHORTREF, LINK, and CONCUR, can be used in the implementation of the converters themselves.

EDIDOC has demonstrated how important and cost-effective it can be to have a system that uses a pivot format at the heart of an implementation, even when the format is not being exposed to users or external systems. In the context of EDIDOC, this 'pivot' paradigm was applied not only to the messages themselves but also to all the surrounding services (communications, security, and workflow), owing to an object-oriented approach. This has provided for reusability, scalability, and customization.


Conclusions

This paper has shown that, when a common pivot format is used, SGML/XML is invaluable for the development and integration of EDI applications. This was illustrated by operational experience from the G-EDI, CLASET, and EDIDOC projects.

References

  • [Vijghen 97] Philippe Vijghen; Experience of EDI for Documents: The Role of SGML. In Conference Proceedings of SGML'97 US, December 1997, pp 213-18



Please e-mail your comments to Philippe Vijghen at phv@sgmltech.com

This paper was first published in the Conference Proceedings of SGML/XML'98 Europe, May 1998, pp 517-22.

© The SGML Technologies Group 1998