Mike

As I said on the WG call, I like the overall proposal and it opens up a large number of extra scenarios for DFDL.

Review comments:

- Title just says Base64, but the proposal covers other layer transforms as well.

- "literal string" in property types should either be "DFDL string literal" or "String".

- "lineFolded_IMF - layerLengthKind 'boundaryMark' (without a layerBoundaryMark property - not used. Always CRLF)". I don't think you can do this, because of scoping. If the boundary mark must always be CRLF, then insist that layerBoundaryMark is in scope and set to CRLF. That's what we do with DFDL properties. E.g. for binary xs:float we could have assumed lengthUnits='bytes' but we don't: lengthUnits must be in scope and equal 'bytes'. **

- "aisASCIIArmor - layerLengthKind is assumed to be 'boundaryMark' (the property layerLengthKind is ignored)". Same issue: insist that layerLengthKind is set to 'boundaryMark' rather than ignoring it. **

- "A layered sequence has a mandatory layer alignment (analogous to mandatory text alignment). This is 1 byte for all currently specified layer transforms; in the future this may change." and "A layered sequence has a mandatory length unit. This is 1 byte for all currently specified layer transforms; in the future this may change." If you really think this might change, then you need the properties now, otherwise you can't change them, because of scoping. ** (Note one of your examples contains layerLengthUnits='bytes' in a dfdl:format).

A possible solution to the three comments above marked **: follow the precedent of escape schemes, where it made more sense to group the related properties in their own annotation and have a single scoped dfdl:escapeSchemeRef property. Why not do the same for layers? a) It neatly side-steps all the issues with scoping, because none of the layer properties are scoped. b) It avoids mixing layering and standard properties in dfdl:formats (other than one layerRef property), aiding clarity. c) In your examples you have separate dfdl:formats for the layer properties, which is almost certainly how they would be authored by users, and amounts to the same thing.
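
To make the comparison concrete, here is a rough sketch. The first annotation is the existing escape scheme pattern (not a complete set of properties); the second is what a layer equivalent might look like. The names daf:defineLayer, daf:layer and daf:layerRef are made up for illustration and are not in the proposal; the layer property names are taken from the proposal, and the prefixes xs, dfdl, daf and tns are assumed to be bound in the enclosing schema.

```xml
<xs:annotation>
  <xs:appinfo source="http://www.ogf.org/dfdl/">

    <!-- Existing precedent: escape scheme properties grouped in a named annotation -->
    <dfdl:defineEscapeScheme name="backslash">
      <dfdl:escapeScheme escapeKind="escapeCharacter"
                         escapeCharacter="\"
                         escapeEscapeCharacter="\"
                         extraEscapedCharacters=""/>
    </dfdl:defineEscapeScheme>

    <!-- HYPOTHETICAL layer equivalent: element names are illustrative only.
         Nothing here is scoped, so the transform can insist that
         layerLengthKind and layerBoundaryMark are present and correct. -->
    <daf:defineLayer name="foldedLine">
      <daf:layer layerTransform="lineFolded_IMF"
                 layerLengthKind="boundaryMark"
                 layerBoundaryMark="%CR;%LF;"/>
    </daf:defineLayer>

  </xs:appinfo>
</xs:annotation>

<!-- Existing: one scoped property refers to the escape scheme by name -->
<xs:element name="field" type="xs:string" dfdl:escapeSchemeRef="tns:backslash"/>

<!-- HYPOTHETICAL: one property refers to the layer definition by name -->
<xs:sequence daf:layerRef="tns:foldedLine">
  <!-- layered content goes here -->
</xs:sequence>
```

With this shape, the only layer-related property that could appear in a dfdl:format is the single reference.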

- Can layer properties appear on group refs? Presumably yes.

- <sequence dfdl:ref="tns:compressed">
<group ref="tns:compressedGroupContents" dfdl:layerLength="{...}" />
</sequence>
Property should be daf:layerLength and should be on sequence.

- There are several other occurrences of dfdl:layerXxx instead of daf:layerXxx.

- I think a statement about errors is required. There is one place in the proposal that talks about 'Parse Error' which is not defined. Specifically:
- When parsing and unparsing a layer the DFDL parser will presumably throw Processing Errors.
- Do these get caught in some way at the layer 'boundary' or do they carry on up?
- If they carry on up, then when the parser is inside a point of uncertainty, backtracking will occur.

- The quoted-printable example assumes that layerLengthKind 'pattern' exists, along with layerLengthPattern, so the proposal may as well be up front about this.
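
For the tns:compressed fragment quoted above, the corrected form I have in mind is along these lines. The expression is left elided as in the original, and I am assuming the daf prefix is bound to the Daffodil extension namespace.

```xml
<!-- Layer property uses the daf: prefix and sits on the sequence, not on the group ref -->
<sequence dfdl:ref="tns:compressed" daf:layerLength="{...}">
  <group ref="tns:compressedGroupContents" />
</sequence>
```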

Regards

Steve Hanson

IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 15/05/2018 17:48
Subject: [DFDL-WG] Action 304: Data Streaming (was) Re: Review/Feedback wanted on proposal: Data Streaming for base64 and other Layered Transformations
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>

The data layering feature previously proposed has been implemented in Daffodil.

There were minor changes from the prior wiki description. The wiki page is updated to reflect what the implementation actually does. The examples on the wiki page actually run and have been incorporated into Daffodil regression tests.

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+base64

This feature is also being used to create iCalendar and IMF format DFDL schemas. Those are not yet released for public consumption, but the feature is being used successfully for "real" formats and is, thus far, working as designed.

Among the changes since the prior draft of the design note, we changed a property value name: dfdl:layerLengthKind='boundaryMark' replaces the 'terminator' property value, to avoid confusion with the ordinary dfdl:terminator property.

Thus far we've not run into a need for a more general mechanism for passing parameters from the schema to the layering. The boundary mark and the length have been sufficient, but this feels like an area where something more general may be needed in the future. For example, there are really three different kinds of layerings all known as "base64". Rather than having one general "base64" with a parameter, we're currently requiring that each variant have its own layering transform name, e.g., "base64_MIME" is one of them (the only one implemented thus far).
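
As a rough illustration of that trade-off, the sketch below contrasts the two styles. The first form follows the current approach, with the variant baked into the transform name; the second is purely hypothetical (the layerVariant property does not exist and is made up here). The boundary mark value is a placeholder, the properties are shown with the daf prefix, and the xs and daf prefixes are assumed to be bound in the enclosing schema.

```xml
<!-- Current approach: each base64 variant has its own transform name -->
<xs:sequence daf:layerTransform="base64_MIME"
             daf:layerLengthKind="boundaryMark"
             daf:layerBoundaryMark="--END--">
  <!-- layered content goes here -->
</xs:sequence>

<!-- HYPOTHETICAL alternative, not implemented: one general transform plus a parameter.
     daf:layerVariant is an invented name used only for this comparison. -->
<xs:sequence daf:layerTransform="base64"
             daf:layerVariant="MIME"
             daf:layerLengthKind="boundaryMark"
             daf:layerBoundaryMark="--END--">
  <!-- layered content goes here -->
</xs:sequence>
```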

...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
