can someone draft this?

I've been working on the DFDL spec quite a bit. Steve Hanson suggested that this would be good to have in the document before the April OGF:

5. What is DFDL, including a sub-section giving scope of DFDL 1.0.

Right now the doc has this content, which is pretty much entirely TBD/placeholder. Can someone step up to the plate to revise this? I am working on other aspects than this, i.e., trying to reorganize and straighten out the decomposition of the semantics into the parse function and parse strategies. I really want to focus on this aspect, but I agree with Steve that this would be great to have done in time.

-----------------------------------

What is DFDL Version 1.0?

Version 1.0 of DFDL is a language capable of expressing a wide array of binary and text-based data formats.

DFDL is capable of describing binary data as found in the data structures of Cobol, C, PL1, Fortran, etc. In particular, it is able to describe repeating sub-arrays where the length of an array is stored in another location of the structure. TBD....

DFDL is capable of describing a wide variety of textual data formats. These include TBD: list of examples.

TBD: mixtures.

Composition properties. I.e., two formats can be nested, concatenated, etc. to create a new working format definition. (limitations here due to regexp?)

The following topics have been deferred to future versions of the standard:

Extensibility: There are real examples of proprietary data format description languages that we use as our base of experience from which to derive standard DFDL. However, there are no examples of extensible format description languages; hence, while extensibility is desirable in DFDL, there is not yet a base of experience with extensibility from which to derive a standard.

Layering: Some formats require data to be described in multiple layers, that is, where one element's contents become the representation of another element. DFDL V1.0 allows description of only one layer.

I believe the language below, while perhaps needing examples to clarify it, solves the problem of how to deal with nested delimited constructs where outer delimiters terminate inner nested constructs. I'd like to see if people understand it on our DFDL call on Wednesday.

...mikeb

Nested Delimited Constructs

There are two kinds of terminating delimiters:

  - postfix separators (TBD: currently only on arrays)
  - terminators

Both of these can be optional. (TBD: can postfix separators be optional? I believe there is no finalSeparatorCanBeMissing though there is a terminatorCanBeMissing property)

This means the parser can encounter a terminating delimiter from an enclosing array or group to indicate the termination of a nested array or group.

The behavior of parsing when it encounters one of these terminating delimiters, that is, one that is defined on an enclosing construct, is to indicate that it found a terminating delimiter of some sort, but not to consume that terminating delimiter when computing the new pos result. That is, the parse function succeeds and returns a new pos, value, etc., but the pos reflects the position where the terminating delimiter begins.

The invariant that this ensures is that a parser for any construct consumes its own delimiters and only its own delimiters.

For example, the parse function for a sequence group which has separators specified will recursively parse the elements it contains. However, if the representation of one of those elements is terminated by finding the enclosing sequence group's separator, then that separator will not be consumed. When the recursive parse unwinds back to the parse function of the enclosing sequence group, the separator will then be consumed by the sequence group's parse function, which is prepared to recognize it and advance past it.

This principle works regardless of how deeply nested the constructs are.

Parsing must, however, take into account the complete set of terminating delimiters that it might encounter, along with the escape/quoting schemes that can be specified for them, which allow them to appear as content rather than as delimiters.
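
As a possible illustration of the invariant above, here is a minimal Python sketch (not DFDL, and not taken from the spec). The function names parse_field, parse_sequence and parse_outer, the ";" and "," separators, and the string-based data are all assumptions made up for the example. The inner parser is told about the enclosing separator so that it can stop there, but it returns a pos that points at the start of that separator instead of advancing past it.

```python
from typing import List, Sequence, Tuple


def parse_field(data: str, pos: int, in_scope: Sequence[str]) -> Tuple[str, int]:
    """Scan forward until any in-scope delimiter begins. Nothing is consumed:
    the returned pos is where the delimiter starts (or end of data)."""
    end = pos
    while end < len(data) and not any(data.startswith(d, end) for d in in_scope):
        end += 1
    return data[pos:end], end


def parse_sequence(data: str, pos: int, separator: str,
                   enclosing: Sequence[str] = ()) -> Tuple[List[str], int]:
    """Parse fields separated by `separator`. Delimiters in `enclosing` belong
    to outer constructs: they end this sequence but are left unconsumed."""
    in_scope = (separator,) + tuple(enclosing)
    values: List[str] = []
    while True:
        value, pos = parse_field(data, pos, in_scope)
        values.append(value)
        if data.startswith(separator, pos):
            pos += len(separator)      # our own delimiter: consume it
        else:
            return values, pos         # enclosing delimiter or end of data


def parse_outer(data: str, pos: int = 0, outer_sep: str = ";",
                inner_sep: str = ",") -> Tuple[List[List[str]], int]:
    """Outer sequence whose elements are inner sequences."""
    groups: List[List[str]] = []
    while True:
        inner, pos = parse_sequence(data, pos, inner_sep, enclosing=(outer_sep,))
        groups.append(inner)
        if data.startswith(outer_sep, pos):
            pos += len(outer_sep)      # consumed here, by the construct that owns it
        else:
            return groups, pos
```

With this sketch, parse_sequence("a,b;c", 0, ",", (";",)) stops with pos at the ";" and leaves it for the caller, and parse_outer("a,b;c,d;e") is the function that advances past each ";".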

Quick edits to improve correctness.

-------------------------------------------------------------------------------------------

Nested Delimited Constructs

There are two kinds of terminating delimiters:

  - postfix separators (TBD: currently only on arrays)
  - terminators

Both of these can be optional. (TBD: can postfix separators be optional? I believe there is no finalSeparatorCanBeMissing though there is a terminatorCanBeMissing property)

This means the parser can encounter a separating or terminating delimiter from an enclosing array or group to indicate the termination of a nested array or group.

The behavior of parsing when it encounters one of these delimiters, that is, one that is defined on an enclosing construct, is to indicate that it found a delimiter of some sort, but not to consume that delimiter when computing the new pos result. That is, the parse function succeeds and returns a new pos, value, etc., but the pos reflects the position where the delimiter begins.

The invariant that this ensures is that a parser for any construct consumes its own delimiters and only its own delimiters.

For example, the parse function for a sequence group which has separators specified will recursively parse the elements it contains. However, if the representation of one of those elements is terminated by finding the enclosing sequence group's separator, then that separator will not be consumed. When the recursive parse unwinds back to the parse function of the enclosing sequence group, the separator will then be consumed by the sequence group's parse function, which is prepared to recognize it and advance past it.

This principle works regardless of how deeply nested the constructs are.

Parsing must, however, take into account the complete set of delimiters that it might encounter, along with the escape/quoting schemes that can be specified for them, which allow them to appear as content rather than as delimiters.
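
To make the last point more concrete, here is a rough Python sketch of scanning against the complete set of in-scope delimiters while honoring an escape scheme. It is only an illustration: the function name scan_to_delimiter, the single backslash escape character, and the longest-match rule are assumptions for the example, not anything the draft text specifies.

```python
from typing import Optional, Sequence, Tuple


def scan_to_delimiter(data: str, pos: int, delimiters: Sequence[str],
                      escape: str = "\\") -> Tuple[str, int, Optional[str]]:
    """Return (content, new_pos, found_delimiter).

    new_pos is where the first unescaped delimiter begins; the delimiter is
    not consumed. A character preceded by `escape` is treated as content,
    even if it would otherwise start a delimiter."""
    content = []
    while pos < len(data):
        if data.startswith(escape, pos) and pos + len(escape) < len(data):
            pos += len(escape)         # drop the escape itself
            content.append(data[pos])  # keep the escaped character as content
            pos += 1
            continue
        # Check every delimiter in scope, longest first (an assumed tie-break).
        for d in sorted(delimiters, key=len, reverse=True):
            if data.startswith(d, pos):
                return "".join(content), pos, d
        content.append(data[pos])
        pos += 1
    return "".join(content), pos, None
```

Here an escaped "," ends up in the content, while the first unescaped "," or ";" ends the scan, and the caller can decide from the returned delimiter whether it is its own to consume.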
Mike Beckerle/Worcester/IBM@IBMUS
Sent by: dfdl-wg-bounces@ogf.org
03/27/2007 08:01 PM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Nested Delimited Constructs - For discussion Wed 3/28