Author
Philippe Vijghen, ACSE sa, a Software AG company
Abstract
This paper proposes the use of a pivot format when developing EDI applications,
based on the experience of three operational projects.
The role of SGML/XML, as pivot, is presented in a broader context, with regard
to other relevant candidates for structuring data.
Biographical Note
Philippe Vijghen is a project manager at ACSE sa/nv, Brussels, a member of the SGML
Technologies Group. He is a software engineer and systems architect specializing in
object-oriented distributed applications and complex document-oriented Electronic Data
Interchange systems; in addition to structuring documents, these systems make use of SGML
at other levels, such as for external application programming interfaces. He obtained a
degree, specializing in electromechanical engineering, at the Free University of Brussels
(ULB). He may be contacted at phv@sgmltech.com.
More and more companies, together with their partners, are moving towards EDI
(Electronic Data Interchange). Traditional EDI, however, has the reputation of being
inflexible and expensive.
This paper calls on the experience gained from the use of SGML/XML in various EDI
projects. It illustrates how the use of a central pivot model for implementing an EDI
application is cost-effective, owing to its reusability and scalability. It also
demonstrates that, in terms of data modelling, SGML/XML offers the flexibility required
for such a pivot role in the EDI system.
The first section gives an overview of the various syntaxes that have been defined in
the past for exchanging data.
A pivot-oriented approach, and its benefits, are explained in the following section.
In the final section, three EDI projects are presented, where XML has been used as a
pivot.

Several conventions have been defined for structuring electronic information in the
past. Listed below are some well-known abstract and concrete data structuring mechanisms,
and their respective application targets.
- ASN.1 (Abstract Syntax Notation One) and BER (Basic Encoding Rules)
ASN.1 is a platform- and language-independent way of defining abstract data structures. The
associated BER specify the concrete representation of an ASN.1 structure when it transits
between computers.
ASN.1 is very useful for documenting structures at a conceptual design level,
especially when working at network level. However, the only meta-information carried in
the concrete syntax is for the identification of low-level data types. Therefore, the
identification of the data semantics relies on an implicit knowledge of the structure.
- RPC (Remote Procedure Call)
The various implementations of RPCs (eg by Sun,
DCE, and Microsoft) address the need of software developers to call a function or
procedure that resides on another machine. The RPC mechanisms include a
platform-independent way to marshal the data (encoding/decoding before and after network
transmissions) but their lack of meta-information makes them suitable only for volatile
low-level inter-process communications.
- CORBA (Common Object Request Broker Architecture) and IIOP (Internet Inter-ORB
Protocol)
CORBA, like RPC, also targets software integration and the
development of network-enabled applications, but this time in an object-oriented
environment. Like ASN.1, it includes a mechanism, called IDL (Interface Definition
Language), for specifying data structures in an abstract, language-independent way. IDL also
includes an abstract specification of the functional interfaces. As for RPCs, there are
associated marshalling mechanisms (such as IIOP, the one specified for the Internet) for
passing data structures between objects. No meta-information about the data semantics is
included in the marshalled data, as CORBA relies on the volatile object interfaces for
addressing such needs.
- CSV (Comma Separated Values)
The simple syntax of CSV has been defined for the
exchange of tabular data, exported from databases or spreadsheets. The first row of the
file, often including the titles of the various columns, can be considered as a
meta-specification.
- EDIFACT (Electronic Data Interchange for Administration, Commerce, and Transport)
EDIFACT aims at specifying standard messages for exchanging electronic data in trade
applications, eg orders and invoices. The syntax is built on top of separators and allows
for messages to be structured as groups of segments that are themselves made up of data
elements.
- SGML (Standard Generalized Markup Language)
This standard was designed to
structure documents. It provides a flexible mechanism that enables the user to model
document structures and to encode them so as to enhance the manipulation, exchange, and
publication of the documents.
- XML (eXtensible Markup Language)
XML, a recently defined subset of SGML,
brings the concepts behind SGML into the popular arena of Internet browsing. The idea
behind XML is to maintain eighty per cent of the benefits of SGML, with only twenty per
cent of its complexity. This enabled the development of easy XML support in tools as
common as Web browsers. XML does not really target the SGML market; its major goal is the
Internet distribution of structured documents. In terms of application target, Microsoft
would also like to position XML as the next-generation CSV format, for interchanging data
between databases, spreadsheets, and a few EDI applications.

Although they are all aimed at defining a structure below the level of file
granularity, it is clear that all of these different specifications and syntaxes target
different needs in terms of application environments.
If the purely technical aspects of the various syntaxes are examined, however, it must
be concluded that SGML/XML allows for more complex modelling.
- ASN.1/BER, RPCs, and CORBA/IIOP mechanisms are difficult to use for
defining complex recurrent structures with optional branches and various lengths of data.
Moreover, as no meta-information about the data semantics is encoded, it would be
impossible to figure out which information is carried by looking only at the data: the
encoding/decoding relies on implicit knowledge of the model. In any case, they are not
really suitable for the archiving of information as they lack the set of tools that could
be expected for achieving it.
- CSV has the advantage of being extremely simple and is used everywhere. But it is
limited to the export of data that can be represented as a list of rows.
- EDIFACT syntax is based on four different levels: messages, groups, segments, and
data elements. It is very appropriate for structuring information that has been defined in
the standardized catalogues of messages and segments. However, it is difficult to use
EDIFACT as a generic syntax for modelling arbitrary information, especially when dealing
with structured text. Moreover, EDIFACT lacks a formal meta-language.
- XML includes a formal way of specifying the model through the notion of a DTD. It
offers a high level of flexibility and extensibility and is particularly appropriate for
modelling trees and graphs (by using links). It is simple to use. It is even possible to
exploit an XML instance without knowing the details of the meta-specification that is the
DTD. Unfortunately, it has very limited support for data types (the next revision of the
standard will address this point).
- SGML includes many features that are not supported by XML. Those features can
legitimately be considered as unnecessarily complex when the EDI data structures are
considered. However, as explained in [Vijghen 97], many of those additional features are
invaluable in a development environment used for processing the information.
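The claim that an XML instance can be exploited without knowing its DTD can be illustrated with a minimal Python sketch. The order message and its element names are hypothetical and serve only as an example; the code discovers the structure from the tags alone, which is exactly the meta-information that BER, RPC, or IIOP encodings do not carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message; element names are illustrative only.
ORDER_XML = """
<order number="A-17">
  <buyer>ACME</buyer>
  <line><item>bolt</item><quantity>100</quantity></line>
  <line><item>nut</item><quantity>250</quantity></line>
</order>
"""


def leaf_paths(xml_text):
    """Return (path, text) pairs for every leaf element, using no DTD at all."""
    root = ET.fromstring(xml_text.strip())
    found = []

    def walk(element, path):
        here = f"{path}/{element.tag}"
        children = list(element)
        if not children:
            # A leaf: the tag path tells us what the value means.
            found.append((here, (element.text or "").strip()))
        for child in children:
            walk(child, here)

    walk(root, "")
    return found


for path, value in leaf_paths(ORDER_XML):
    print(path, "=", value)
```

Each value is reported together with a path such as /order/line/quantity, so a reader or a program can guess its meaning without any external model.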
This straightforward comparison takes into account only the syntax and the
modelling flexibility, of course. Although such a comparison is of little relevance when
each syntax is judged within the application field it targets, it is invaluable when
considering the best candidate for representing pivot models.
Indeed, this paper addresses the need for consistent use of a pivot format when
developing EDI applications. With a pivot format in mind, the use of the most flexible and
scalable syntax is fundamental; there, the choice is purely technical and is independent
of the external representation format that users may require, depending on the
application field.

When developing EDI applications, one of the key tasks consists of implementing filters
for processing the messages to be exchanged. Such filters aim either at converting the
messages from one representation to another, or at using the information contained in the
message in databases or other external applications.
Our experience has shown that a pivot-oriented approach to developing such
filters is extremely cost-effective. The approach consists of using a single
internal representation of the information, for the implementation of all the filters
applied to EDI messages.
Note that the word `pivot' here has a different meaning from that in the traditional
EDI terminology, where it often designates more restrictively the representation used for
loading and exporting messages to and from databases.
The cost effectiveness of such an approach is justified by the following facts:
- the code can be independent of the actual concrete representation of the information, by
relying only on the pivot model;
- the application can be adapted more easily if the public syntax of a message is
modified;
- the set of features available in the development environment, associated with the pivot
syntax, becomes independent of the end-user's choice of public representation syntax;
- the use of a consistent set of tools and techniques for manipulating the pivot syntax,
not only across filters but also across projects, improves the productivity of the
developers;
- the number of filters that must be developed for implementing all the conversions
between a set of formats is proportional to the number of formats, and does not grow
quadratically as it does in cases where no pivot format is selected.
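The last point can be checked with a little arithmetic. The sketch below assumes that every filter is a one-way converter and that the pivot itself is not one of the external formats: converting directly between n formats then requires n(n-1) filters, whereas a pivot requires one importer and one exporter per format, that is 2n. The function names are illustrative.

```python
def direct_filters(n_formats: int) -> int:
    """One-way converters needed when each format is mapped directly to every other."""
    return n_formats * (n_formats - 1)


def pivot_filters(n_formats: int) -> int:
    """One-way converters needed with a pivot: one importer and one exporter per format."""
    return 2 * n_formats


for n in (3, 5, 10):
    print(f"{n} formats: {direct_filters(n)} direct filters, {pivot_filters(n)} pivot filters")
```

With three formats the two approaches cost the same under these assumptions; the pivot pays off from the fourth format onwards, and adding a new format never requires more than two new filters.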
We, the SGML Technologies Group, have chosen to use SGML/XML and an integrated
development environment based on this technology as the cornerstone for many projects
based on a pivot-oriented approach, including those in the field of EDI.

Experience gained during the development of various EDI applications includes:
- G-EDI, a generic tool for parsing and processing EDIFACT messages, first developed for
implementing an EDI system for the handling of telecommunication bills of a major Belgian
bank;
- CLASET, an EDIFACT and SGML-based system for exchanging nomenclatures in the context of
the European institution for statistics (EUROSTAT);
- EDIDOC, a generic framework for the Electronic Data Interchange of Documents, developed
for the European Space Agency, that has been operational for a few years.
The G-EDI project, aiming at processing the telecommunication bills of a major banking
institution in Belgium, initiated the development of a generic EDIFACT parser. The key
point is that the parser was based on the notion of a pivot format. In practice, the
implementation is based on SGML technology.
Although the SGML tags and syntax did not help as such for this implementation, the
generic coding mechanisms that are part of SGML helped to keep the application independent
from the actual syntax of the message. Indeed, XML offers all the possibilities that are
required for modelling the information contained in EDIFACT, as it allows for the encoding
of the documents with regard to arbitrarily complex tree models and, if hyperlinks are
considered, even graph models.
Although the actual EDIFACT syntax of the messages that are transmitted by the
telecommunications company has changed four times since the system was put into production,
only the mapping to the generic underlying model had to be reviewed. Owing to the
approach, it has been possible to reduce the application maintenance costs by a factor of
five.
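The idea of isolating the application from the message syntax can be sketched in Python as follows. This is not the G-EDI implementation: it is a deliberately simplified illustration that assumes the default EDIFACT separators (' for segments, + for data elements, : for components) and ignores the release character, the UNA service string, and segment groups. The sample message and the pivot element names (message, segment, element, component) are hypothetical.

```python
import xml.etree.ElementTree as ET


def edifact_to_pivot(message: str) -> ET.Element:
    """Map an EDIFACT-like string onto a generic XML tree (simplified sketch)."""
    root = ET.Element("message")
    for raw in message.split("'"):          # segment terminator
        raw = raw.strip()
        if not raw:
            continue
        name, *elements = raw.split("+")    # data element separator
        segment = ET.SubElement(root, "segment", {"name": name})
        for element in elements:
            data = ET.SubElement(segment, "element")
            for component in element.split(":"):   # component separator
                ET.SubElement(data, "component").text = component
    return root


def invoice_amounts(pivot: ET.Element) -> list:
    """Application code: reads amounts from the pivot tree, never from the raw syntax."""
    return [seg[0][1].text for seg in pivot.findall("segment[@name='MOA']")]


# Illustrative message only, not taken from the project.
SAMPLE = "BGM+380+INV001'DTM+137:19980501:102'MOA+77:1250.00'"
pivot = edifact_to_pivot(SAMPLE)
print(invoice_amounts(pivot))
```

If the public syntax of the message changes, only edifact_to_pivot has to be reviewed; invoice_amounts and any other code written against the pivot tree stay as they are.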

The goal of the CLASET (Classification Information Set Message) project, developed in
the context of the European Programme for the Interchange of Data between Administration
(IDA), is rather ambitious. Take the example of the definition of `secondary school' in
the various European countries. The reader will understand that there is no consistency at
all. But the European Institutions still want to produce accurate statistics on such
matters, across the internal borders. In order to achieve this, complex nomenclatures for
statistics must be defined. CLASET includes the definition of messages for exchanging such
nomenclatures.
The CLASET message allows for the exchange of any hierarchical structure, such as
nomenclatures or classifications, and has conceptually been defined as a result of a
Merise model.
Different representations are used, each based on a distinct syntax:
- an EDIFACT representation of the messages is being standardized in official committees;
- an SGML representation is used as an alternative format to which the EDIFACT
representation can be mapped;
- HTML representations of the messages can be produced, as generated from the SGML
representation.
The EDIFACT representation is the most official representation of the information
transported by CLASET messages, because of the standardization process.
The SGML representation is recommended to people who are dealing with highly structured
text, because the SGML representation offers more flexibility than EDIFACT for operations
such as the attachment of footnotes or presentation styles to words that are part of the
free text. Such structures can easily be encoded using an SGML mixed content group model,
which has no equivalent in the EDIFACT layering of messages, groups, segments, and data.
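A mixed content model can be illustrated with a short Python sketch. The label below, its element names (label, footnote, emph), and its wording are hypothetical and are not taken from the CLASET DTD; the point is only that character data and markup are freely interleaved, and that the running text and the footnotes can still be separated mechanically.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification label with a footnote and emphasis inside free text.
LABEL_XML = (
    "<label>Upper secondary<footnote>Definition varies by country.</footnote>"
    " education, <emph>general</emph> programmes</label>"
)


def split_mixed(xml_text):
    """Separate the running text of a mixed-content element from its footnotes."""
    root = ET.fromstring(xml_text)
    text_parts = [root.text or ""]
    footnotes = []
    for child in root:
        if child.tag == "footnote":
            footnotes.append("".join(child.itertext()))
        else:
            # Other inline elements (eg emphasis) contribute to the running text.
            text_parts.append("".join(child.itertext()))
        # The tail is the character data that follows the child element.
        text_parts.append(child.tail or "")
    return "".join(text_parts), footnotes


text, notes = split_mixed(LABEL_XML)
print(text)
print(notes)
```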
The HTML representation is read-only: it enables people to view the messages using a
tool such as a Web browser (used as a local viewer application in this case).
The actual implementation of the CLASET project is based on SGML, used as an internal
pivot. This approach has led to application code that is independent of the actual
details of the representation syntaxes and has been a major argument for the
cost-effectiveness of the project. The benefit of such an approach is due not to the SGML
syntax itself but to the tools associated with SGML, which offer the required flexibility
and the relevant processing features.

The EDIDOC (Electronic Data Interchange for Documents) project covers the design,
implementation, and deployment of a flexible framework for document-oriented EDI at the
European Space Agency (ESA).
The system is in charge of document exchanges occurring in several distinct
applications of the agency:
- working documents exchanged with the delegations;
- calls for tender sent to potential bidders;
- press releases and information notes sent to the press and public (including their WWW
publication).
At the heart of the EDIDOC system, a central server acts as a clearing house, giving a
potential legal value to the documents exchanged by logging them into a robust relational
database.
This server integrates, in a very generic and flexible way, the key concepts needed for
electronic document exchanges:
- document standards (conformance checking and format conversions),
eg SGML, EDIFACT, PDF, ASCII, RTF, and WordPerfect;
- security packages (information confidentiality and authentication),
eg PGP and MAC MD-5;
- access protocols (including network aspects),
eg WWW, Internet e-mail, X.400, and FTP.
At each of these levels the server makes sure that the documents are delivered in
accordance with the preferences of the recipients: in the right format, with the right
security package, and the right communication protocols. It really plays the role of a
gateway.
The EDIDOC generic envelopes have been defined in XML. They include the details of the
exchange: originator, list of recipients, unique reference, subject, time stamps, document
types and formats, security mechanisms, delivery options, groupware context, remote
management options, error messages, and so on.
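As an illustration only, the following Python sketch shows how such an envelope might drive the delivery. The envelope below is hypothetical: the element and attribute names, the addresses, and the reference are assumptions made for this example and do not reproduce the actual EDIDOC envelope definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope; names and values are assumptions, not the EDIDOC DTD.
ENVELOPE_XML = """
<envelope reference="DOC-0001">
  <originator>sender@example.org</originator>
  <subject>Call for tender</subject>
  <recipient address="a@example.org" format="PDF" security="PGP" protocol="e-mail"/>
  <recipient address="b@example.org" format="SGML" security="none" protocol="FTP"/>
</envelope>
"""


def delivery_plan(envelope_text):
    """List, for each recipient, the format, security package, and protocol to use."""
    root = ET.fromstring(envelope_text.strip())
    return [
        (r.get("address"), r.get("format"), r.get("security"), r.get("protocol"))
        for r in root.findall("recipient")
    ]


for address, fmt, security, protocol in delivery_plan(ENVELOPE_XML):
    print(f"{address}: convert to {fmt}, secure with {security}, send by {protocol}")
```

A gateway built along these lines would hand each tuple to the corresponding conversion filter, security package, and access protocol component.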
The filters that are plugged into the EDIDOC document standards' components are based on
the notion of a pivot format for conversions. Although the use of SGML is not enforced, it
is the best candidate for defining custom pivot formats for structured documents. Indeed,
SGML includes a very consistent and generic way to model the information. Moreover,
some of the SGML features, such as OMITTAG, SHORTREF, LINK, and CONCUR, can be used
directly for implementing the converters themselves.
EDIDOC has demonstrated how important and cost-effective it can be to have a system
that uses a pivot format at the heart of an implementation, even when the format is not
being exposed to users or external systems. In the context of EDIDOC, this `pivot'
paradigm was applied not only to the messages themselves but also to all the surrounding
services (communications, security, and workflow), owing to an object-oriented approach.
This has provided for reusability, scalability, and customization.

This paper demonstrates that, when a common pivot format is used, SGML/XML is invaluable
for the development and integration of EDI applications. This was illustrated by
operational experience from the G-EDI, CLASET, and EDIDOC projects.
- [Vijghen 97]
Philippe Vijghen; Experience of EDI for Documents: The Role of SGML. In Conference
Proceedings of SGML'97 US, December 1997, pp 213-18

Please e-mail your comments to Philippe Vijghen at phv@sgmltech.com
This paper was first published in the Conference Proceedings of SGML/XML'98 Europe,
May 1998, pp 517-22.
© The SGML Technologies Group 1998