Layering in DFDL (for our FAQ and docs)

12 Jul 2005

      I had action items from GGF14 to begin building up an FAQ. Here's 
something about layering.

-------------------------

Layering in DFDL 

The book "ASN.1 Complete" by Larmouth (ISBN 0-12-233435-3, also available
online as a PDF) discusses the importance of layer support in format
descriptions.

(Begin Excerpt) 

The layering concept is perhaps most commonly associated with the 
International Standards Organization (ISO) and International 
Telecommunications Union (ITU) "architecture" or "7-layer model" for
Open Systems Interconnection (OSI) shown in Figure 3.  While many of
the protocols developed within this framework are not greatly used 
today, it remains an interesting academic study for approaches to 
protocol specification. In the original OSI concept in the late 1970s, 
there would be just 6 layers providing (progressively richer) carrier 
services, with a final "application layer" where each specification 
supported a single end-application, with no "holes". 

It became apparent, however, over the next decade, that even in the 
"application layer" people wanted to leave "holes" in their 
specification for later extensions, or to provide a means of tailoring 
their protocol to specific needs. For example, one of the more recent 
and important protocols - Secure Electronic Transactions (SET) - 
contains a wealth of fully-defined message semantics, but also 
provides for a number of "holes" which can transfer "merchant details"
which are not specified in the SET specification itself. So we have
basic messages for purchase requests and responses, inquiry requests
and responses, authorization requests and responses, and so on, but
within those messages there are "holes" for "message
extensions" - additional information specific to a particular
merchant. 

It is thus important that any mechanism or notation for specifying a 
protocol should be able to cater well for the inclusion of
"holes". This has been one of the more important developments in ASN.1
in the last decade, and will be a subject of much further discussion
in this book. 

"Catering well" for the inclusion of "holes" implies that the notation
must have defined mechanisms (preferably uniformly applied to all
specifications written using that notation) to identify the contents
of a hole at communications time. (In lower layers, this is sometimes
referred to as the "protocol id" problem). Equally important, however,
are notational means to clearly identify that a specification is 
incomplete (contains a hole), together with well-defined mechanisms to 
relate the (perhaps later in time) specification of the contents of 
holes to the location of the holes themselves. 

(End Excerpt) 

The argument made here is equally true for DFDL. We need the ability to
describe a data format containing a hole or payload whose format another
DFDL schema can then describe.

DFDL actually encounters some data formats where the holes are
discontiguous. Consider the non-VSAM VS format (see "IBM OS/390 DFSMS:
Using Data Sets", IBM publication SC26-7339-01, Second Edition, December
2000, online at:
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DGT1D411/CCONTENTS?SHELF=EZ239126&DN=SC26-7339-01&DT=20001014144419).
In this format, the data records holding the actual data fields of
interest are broken up into segments. The segments are of variable size,
and a record can fit in a single segment or can span multiple segments.
The segments are of 3 types: initial, middle (there can be zero or more
of these), and final. The hole that the record fits in is assembled by
putting together the partial holes from each of the segments. A minor
additional complexity is that in the actual format the segments are then
grouped into variable-sized blocks, as an I/O transfer-unit efficiency
optimization.
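
To make the "partial holes" idea concrete, here is a minimal Python
sketch of reassembling records from segments. The segment layout used
here (a 2-byte big-endian payload length followed by a 1-byte type code),
the numeric type codes, the extra COMPLETE code for a record that fits in
one segment, and the helper names are all assumptions made up for
illustration. They are not taken from the IBM publication, and the
grouping of segments into blocks is ignored.

```python
import struct

# Hypothetical type codes, chosen for this sketch only.
COMPLETE, INITIAL, MIDDLE, FINAL = 0, 1, 2, 3


def make_segment(kind, payload):
    """Build one segment: assumed header (length, type) plus payload."""
    return struct.pack(">HB", len(payload), kind) + payload


def read_segments(data):
    """Yield (kind, payload) for each back-to-back segment in data."""
    pos = 0
    while pos < len(data):
        length, kind = struct.unpack_from(">HB", data, pos)
        pos += 3
        yield kind, data[pos:pos + length]
        pos += length


def assemble_records(data):
    """Join the partial holes of each record into one contiguous payload."""
    records = []
    parts = None  # partial holes of the record currently being assembled
    for kind, payload in read_segments(data):
        if kind in (COMPLETE, INITIAL):
            if parts is not None:
                raise ValueError("new record started inside a spanned record")
            if kind == COMPLETE:
                records.append(payload)
            else:
                parts = [payload]
        elif kind in (MIDDLE, FINAL):
            if parts is None:
                raise ValueError("continuation segment without an initial one")
            parts.append(payload)
            if kind == FINAL:
                records.append(b"".join(parts))
                parts = None
        else:
            raise ValueError("unknown segment type")
    if parts is not None:
        raise ValueError("data ended inside a spanned record")
    return records


sample = (make_segment(COMPLETE, b"abc")
          + make_segment(INITIAL, b"he")
          + make_segment(MIDDLE, b"ll")
          + make_segment(FINAL, b"o"))
records = assemble_records(sample)
```

Each element of records is the assembled hole for one record. That is
the byte string a second, separate format description would then be
applied to, independent of how the record happened to be segmented.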

A further wrinkle on holes in DFDL is the notion of encoding. Modern data
formats often contain holes (we'll also call them payloads) which have
been encoded to allow data transfer over text-only media, to compress
the data to save space, to encrypt it, or for various other reasons. The
encoding must first be decoded; the resulting data is the payload whose
format we then want to describe. There are many examples of this, but
email messages using MIME encapsulated attachments are a classic one.
We'd like to describe a file of email messages, each containing MIME
encapsulated attachments, where the attachments are compressed data in
some binary data format. We'd like to describe this file and expose the
logical structure of the data that is inside the MIME encapsulated
attachments.
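
The layers in the email example can be sketched in Python using only the
standard library. This is an illustration of the decode-then-describe
sequence, not of anything DFDL itself provides. The inner binary format
(a 2-byte big-endian count followed by that many 4-byte signed integers),
the choice of gzip for compression, and the function names are all
assumptions invented for the sketch.

```python
import email
import email.policy
import gzip
import struct
from email.message import EmailMessage


def build_message(values):
    """Create a sample email whose attachment is gzip-compressed binary."""
    # Assumed inner format: uint16 count, then 'count' int32 values.
    payload = struct.pack(f">H{len(values)}i", len(values), *values)
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content("See attachment.")
    msg.add_attachment(gzip.compress(payload),
                       maintype="application",
                       subtype="octet-stream",
                       filename="data.bin.gz")
    return msg.as_bytes()


def extract_values(raw):
    """Peel off each layer, then parse the innermost payload."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    results = []
    for part in msg.iter_attachments():
        compressed = part.get_content()        # undo the MIME transfer encoding
        payload = gzip.decompress(compressed)  # undo the compression
        (count,) = struct.unpack_from(">H", payload, 0)
        values = struct.unpack_from(f">{count}i", payload, 2)
        results.append(list(values))           # logical structure of the payload
    return results


decoded = extract_values(build_message([1, -2, 300]))
```

Three descriptions are in play here: the message format with its hole
(the attachment), the encodings applied to the hole's contents, and the
format of the decoded payload. Each is handled by a separate step.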

Mike Beckerle
