Layering in DFDL (for our FAQ and docs)

I had action items from GGF14 to begin building up an FAQ. Here's something about layering.

-------------------------

Layering in DFDL

The book "ASN.1 Complete" by Larmouth (ISBN 0-12-233435-3 and available online as a PDF) discusses the importance of layer support in format descriptions.

(Begin Excerpt)

The layering concept is perhaps most commonly associated with the International Standards Organization (ISO) and International Telecommunications Union (ITU) "architecture" or "7-layer model" for Open Systems Interconnection (OSI) shown in Figure 3. While many of the protocols developed within this framework are not greatly used today, it remains an interesting academic study for approaches to protocol specification.

In the original OSI concept in the late 1970s, there would be just 6 layers providing (progressively richer) carrier services, with a final "application layer" where each specification supported a single end-application, with no "holes". It became apparent, however, over the next decade, that even in the "application layer" people wanted to leave "holes" in their specification for later extensions, or to provide a means of tailoring their protocol to specific needs.

For example, one of the more recent and important protocols - Secure Electronic Transactions (SET) - contains a wealth of fully-defined message semantics, but also provides for a number of "holes" which can transfer "merchant details" which are not specified in the SET specification itself. So we have basic messages for purchase requests and responses, inquiry requests and responses, authorization requests and responses, and so on, but within those messages there are "holes" for "message extensions" - additional information specific to a particular merchant.

It is thus important that any mechanism or notation for specifying a protocol should be able to cater well for the inclusion of "holes".
This has been one of the more important developments in ASN.1 in the last decade, and will be a subject of much further discussion in this book.

"Catering well" for the inclusion of "holes" implies that the notation must have defined mechanisms (preferably uniformly applied to all specifications written using that notation) to identify the contents of a hole at communications time. (In lower layers, this is sometimes referred to as the "protocol id" problem). Equally important, however, are notational means to clearly identify that a specification is incomplete (contains a hole), together with well-defined mechanisms to relate the (perhaps later in time) specification of the contents of holes to the location of the holes themselves.

(End Excerpt)

The argument made here is equally true for DFDL. We need the ability to describe a data format containing a hole, or payload, whose format another DFDL schema can then describe.

DFDL actually encounters some data formats where the holes are discontiguous. Consider the non-VSAM VS format (see "IBM OS/390 DFSMS: Using Data Sets", IBM publication SC26-7339-01, Second Edition, December 2000, online at: http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DGT1D411/CCONTENTS?SHELF=EZ239126&DN=SC26-7339-01&DT=20001014144419). In this format, data records holding the actual data fields of interest are broken up into segments. The segments are of variable size, and a record can fit in a single segment or can span multiple segments. The segments are of three types: initial, middle (there can be zero or more of these), and final. The hole that the record fits in is assembled by putting together the partial holes from each of the segments. A minor additional complexity is that in the actual format the segments are then grouped into variable-sized blocks as an I/O transfer-unit efficiency optimization.

A further wrinkle on holes in DFDL is the notion of encoding.
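Before turning to encoding, here is a minimal Python sketch of the segment reassembly described above, in case it helps make the "partial holes" idea concrete. It does not implement the real format: the 4-byte segment header, the numeric kind codes, the extra COMPLETE kind for a record that fits in one segment, and the names make_segment and reassemble are all assumptions made up for illustration, and the grouping of segments into blocks is ignored. The IBM publication defines the actual layout.

```python
import struct

# Hypothetical segment kinds (assumed codes, not the real control codes).
COMPLETE, INITIAL, MIDDLE, FINAL = 0, 1, 2, 3

# Assumed header: 2-byte big-endian length (header included), 1-byte kind, 1 pad byte.
HEADER = struct.Struct(">HBx")


def make_segment(kind, payload):
    """Build one segment in the assumed layout (only needed for the demo)."""
    return HEADER.pack(HEADER.size + len(payload), kind) + payload


def reassemble(data):
    """Yield each logical record by joining the partial holes of its segments."""
    pos, parts = 0, None
    while pos < len(data):
        length, kind = HEADER.unpack_from(data, pos)
        if length < HEADER.size or pos + length > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        payload = data[pos + HEADER.size : pos + length]
        pos += length

        if kind in (COMPLETE, INITIAL):
            if parts is not None:
                raise ValueError("new record started before the previous one ended")
            parts = [payload]
        elif kind in (MIDDLE, FINAL):
            if parts is None:
                raise ValueError("continuation segment without an initial segment")
            parts.append(payload)
        else:
            raise ValueError(f"unknown segment kind {kind}")

        if kind in (COMPLETE, FINAL):
            # The record's hole is now whole; hand it to the next layer.
            yield b"".join(parts)
            parts = None
    if parts is not None:
        raise ValueError("data ended in the middle of a record")


stream = (
    make_segment(COMPLETE, b"short")
    + make_segment(INITIAL, b"a long ")
    + make_segment(MIDDLE, b"spanned ")
    + make_segment(FINAL, b"record")
)
records = list(reassemble(stream))
```

Each item in records is one contiguous payload, which is the thing a second, inner schema would then describe.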
Modern data formats often contain holes (we'll also call them payloads) which have been encoded to allow data transfer over text-only media, or compressed to save space, or encrypted, or transformed for various other reasons. The encoding must be undone, and the resulting data is the payload whose format we then want to describe.

There are many examples of this, but email messages with MIME-encapsulated attachments are a classic one. We'd like to describe a file of email messages, each containing MIME-encapsulated attachments, where the attachments are compressed and the uncompressed data is in a binary data format. We'd like to describe this file and expose the logical structure of the data that is inside the MIME-encapsulated attachments.
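The layering in the email example can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a proposal for how DFDL should express it: the choice of gzip as the compression, the three-field record layout, and the names RECORD, build_message and decode_attachments are all invented for the example. It handles a single message rather than a file of messages.

```python
import gzip
import struct
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assumed innermost binary format: uint32 id, uint16 count, float64 reading.
RECORD = struct.Struct(">IHd")


def build_message(records):
    """Create a demo email whose attachment is gzip-compressed binary records."""
    inner = b"".join(RECORD.pack(*r) for r in records)
    msg = EmailMessage()
    msg["Subject"] = "readings"
    msg.set_content("See attachment.")
    # The email package applies a text-safe transfer encoding to the bytes.
    msg.add_attachment(
        gzip.compress(inner),
        maintype="application",
        subtype="gzip",
        filename="readings.bin.gz",
    )
    return msg.as_bytes()


def decode_attachments(raw):
    """Peel the layers of each attachment and yield its list of records."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.iter_attachments():
        compressed = part.get_content()      # layer 1: undo the MIME transfer encoding
        inner = gzip.decompress(compressed)  # layer 2: undo the compression
        yield [                              # layer 3: describe the binary payload
            RECORD.unpack_from(inner, offset)
            for offset in range(0, len(inner), RECORD.size)
        ]


raw = build_message([(1, 2, 1.5), (7, 3, 2.25)])
decoded = list(decode_attachments(raw))
```

Each step consumes a hole left by the layer above it, so the format of the records can be described separately from the format of the message that carries them.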
Mike Beckerle