Re: [DFDL-WG] Action 304: Data Streaming (was) Re: Review/Feedback wanted on proposal: Data Streaming for base64 and other Layered Transformations

16 May 2018

Mike,

As I said on the WG call, I like the overall proposal and it opens up a 
large number of extra scenarios for DFDL. 

Review comments:

- The title just says Base64, although the proposal covers other layered 
transformations as well.

- "literal string" in property types should either be "DFDL string 
literal" or "String".

- "lineFolded_IMF - layerLengthKind 'boundaryMark' (without a 
layerBoundaryMark property - not used. Always CRLF)". I don't think you 
can do this, because of scoping. If it must always be CRLF, then insist 
that layerBoundaryMark is set to that. That's what we do with DFDL 
properties. E.g., for binary xs:float we could have assumed 
lengthUnits='bytes' but we don't: lengthUnits must be in scope and equal 
'bytes'. **

- "aisASCIIArmor - layerLengthKind is assumed to be 'boundaryMark' (the 
property layerLengthKind is ignored)". The same comment applies. **

- "A layered sequence has a mandatory layer alignment (analogous to 
mandatory text alignment). This is 1 byte for all currently specified 
layer transforms; in the future this may change." and "A layered sequence 
has a mandatory length unit. This is 1 byte for all currently specified 
layer transforms; in the future this may change." If you really think 
this might change, then you need the properties now; otherwise you can't 
change them later, because of scoping. ** (Note that one of your examples 
contains layerLengthUnits='bytes' in a dfdl:format.)

A possible solution to the three comments marked ** above: following the 
precedent of escape schemes, where it made more sense to group the related 
properties in their own annotation and have a single scoped 
dfdl:escapeSchemeRef property, why not do the same for layers? 
a) It neatly side-steps all the issues with scoping, because none of the 
properties are scoped. 
b) It avoids mixing layering and standard properties in dfdl:formats 
(other than one layerRef property), aiding clarity. 
c) In your examples you have separate dfdl:formats for the layer 
properties, which is almost certainly how they would be authored by users, 
and amounts to the same thing.
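
For illustration, a rough sketch of how that might look, modelled on the 
existing dfdl:defineEscapeScheme / dfdl:escapeSchemeRef pairing. The names 
daf:defineLayer, daf:layer and layerTransform are hypothetical, as are the 
tns:foldedLines and tns:foldedGroupContents names; only layerRef, 
layerLengthKind, layerBoundaryMark and layerLengthUnits come from the 
discussion above.

```xml
<!-- Hypothetical sketch only: daf:defineLayer, daf:layer and
     layerTransform are assumed names, not part of the proposal. -->
<daf:defineLayer name="foldedLines">
  <!-- These properties are not scoped, so each required one is stated here -->
  <daf:layer layerTransform="lineFolded_IMF"
             layerLengthKind="boundaryMark"
             layerBoundaryMark="%CR;%LF;"
             layerLengthUnits="bytes" />
</daf:defineLayer>

<!-- The only layering property that remains scoped is the reference -->
<sequence daf:layerRef="tns:foldedLines">
  <group ref="tns:foldedGroupContents" />
</sequence>
```

With this shape a dfdl:format would carry at most the one layerRef 
property, just as it carries dfdl:escapeSchemeRef today.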

- Can layer properties appear on group refs? Presumably yes.

- <sequence dfdl:ref="tns:compressed">
  <group ref="tns:compressedGroupContents" dfdl:layerLength="{...}" />
</sequence>
The property should be daf:layerLength and should be on the sequence, not 
on the group ref.

- There are several other occurrences of dfdl:layerXxx instead of 
daf:layerXxx.

- I think a statement about errors is required. There is one place in the 
proposal that talks about 'Parse Error' which is not defined. 
Specifically:
        - When parsing and unparsing a layer the DFDL parser will 
presumably throw Processing Errors.
        - Do these get caught in some way at the layer 'boundary' or do 
they carry on up?
        - If they carry on up and the parser is inside a point of 
uncertainty, then backtracking will occur.

- The quoted-printable example assumes that layerLengthKind 'pattern' 
exists, along with layerLengthPattern, so the proposal may as well be up 
front about this.

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 

From:   Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:     dfdl-wg@ogf.org
Date:   15/05/2018 17:48
Subject:        [DFDL-WG] Action 304: Data Streaming (was) Re: 
Review/Feedback wanted on proposal: Data Streaming for base64 and other 
Layered Transformations
Sent by:        "dfdl-wg" <dfdl-wg-bounces@ogf.org>

The data layering feature previously proposed has been implemented in 
Daffodil. 

There were minor changes from the prior wiki description. The wiki page 
has been updated to reflect what the implementation actually does. The 
examples on the wiki page actually run and have been incorporated into 
Daffodil regression tests.

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layeri...

This feature is also being used to create iCalendar and IMF format DFDL 
schemas. Those are not yet released for public consumption, but the 
feature is being used successfully for "real" formats and is, thus far, 
working as designed.

Among the changes since the prior draft of the design note, we changed a 
property value name: dfdl:layerLengthKind='boundaryMark' replaces the 
'terminator' property value, to avoid confusion with the ordinary 
dfdl:terminator property.

Thus far we've not run into a need for a more general mechanism for 
passing parameters from the schema to the layering. The boundary mark and 
the length have been sufficient, but this feels like an area where 
something more general may be needed in the future. For example, there are 
really 3 different kinds of layerings all known as "base64". Rather than 
having one general "base64" with a parameter, we're currently requiring 
that each variant have its own layering transform name, e.g., 
"base64_MIME" is one of them (the only one implemented thus far).

...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
