Mike
As I said on the WG call, I like the
overall proposal and it opens up a large number of extra scenarios for
DFDL.
Review comments:
- Title just says Base64
- "literal string" in property types should either be "DFDL string
literal" or "String".
- "lineFolded_IMF - layerLengthKind 'boundaryMark' (without a
layerBoundaryMark property - not used. Always CRLF)".
I don't think you can do this, because of scoping. If the boundary mark
must always be CRLF, then insist that layerBoundaryMark is set to CRLF.
That's what we do with DFDL properties. E.g. for binary xs:float we could
have assumed lengthUnits='bytes' but we don't - lengthUnits must be in
scope and equal 'bytes'. **
- "aisASCIIArmor - layerLengthKind is assumed to be 'boundaryMark' (the
property layerLengthKind is ignored)". Same. **
- "A layered sequence has a mandatory layer alignment (analogous to
mandatory text alignment). This is 1 byte for all currently specified
layer transforms; in the future this may change." and "A layered sequence
has a mandatory length unit. This is 1 byte for all currently specified
layer transforms; in the future this may change." If you really think
this might change, then you need the properties now, otherwise you can't
change them, because of scoping. ** (Note one of your examples contains
layerLengthUnits='bytes' in a dfdl:format).
A solution to the above 3 comments marked **: following the precedent of
escape schemes, where it made more sense to group the related properties
in their own annotation and have a single scoped dfdl:escapeSchemeRef
property, why not do the same for layers?
a) It neatly side-steps all the issues with scoping, because none of the
properties are scoped.
b) It avoids mixing layering and standard properties in dfdl:formats
(other than one layerRef property), aiding clarity.
c) In your examples you have separate dfdl:formats for the layer
properties, which is almost certainly how they would be authored by
users, and amounts to the same thing.
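To make the escape scheme analogy concrete, here is a rough sketch of how a grouped layer annotation might look. The dfdl:defineEscapeScheme part is existing DFDL, shown with only two of its properties. The daf:defineLayer and daf:layer element names, the layerTransform attribute name, the name 'mimeBase64' and the boundary mark value are all assumptions invented for illustration; only layerRef is the property suggested above.

```xml
<xs:annotation>
  <xs:appinfo source="http://www.ogf.org/dfdl/">

    <!-- Existing precedent: escape properties live in their own
         annotation and are never scoped individually -->
    <dfdl:defineEscapeScheme name="backslash">
      <dfdl:escapeScheme escapeKind="escapeCharacter"
                         escapeCharacter="\" />
    </dfdl:defineEscapeScheme>

    <!-- Hypothetical equivalent for layers: element and attribute
         names here are illustrative assumptions only -->
    <daf:defineLayer name="mimeBase64">
      <daf:layer layerTransform="base64_MIME"
                 layerLengthKind="boundaryMark"
                 layerBoundaryMark="--END--" />
    </daf:defineLayer>

  </xs:appinfo>
</xs:annotation>

<!-- The only scoped layering property would be the reference -->
<xs:sequence daf:layerRef="tns:mimeBase64">
  <!-- layered content goes here -->
</xs:sequence>
```

With this shape, a rule such as "the boundary mark is always CRLF for lineFolded_IMF" becomes a check on one self-contained annotation rather than on whatever happens to be in scope.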
- Can layer properties appear on group refs?
Presumably yes.
- <sequence dfdl:ref="tns:compressed">
    <group ref="tns:compressedGroupContents" dfdl:layerLength="{...}" />
  </sequence>
  Property should be daf:layerLength and should be on sequence.
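With both corrections applied, the fragment would presumably read as below. This is only a sketch: the expression is left elided as in the proposal, and it assumes the daf prefix is bound to the Daffodil extension namespace.

```xml
<!-- layerLength moved from the group ref to the enclosing sequence,
     and given the daf prefix instead of dfdl -->
<sequence dfdl:ref="tns:compressed" daf:layerLength="{...}">
  <group ref="tns:compressedGroupContents" />
</sequence>
```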
- Several other occurrences of dfdl:layerXxx instead of daf:layerXxx.
- I think a statement about errors is required. There is one place in the
proposal that talks about 'Parse Error', which is not defined.
Specifically:
  - When parsing and unparsing a layer, the DFDL parser will presumably
    throw Processing Errors.
  - Do these get caught in some way at the layer 'boundary' or do they
    carry on up?
  - If they carry on up, then when the parser is inside a point of
    uncertainty, backtracking will occur.
- The Quoted printable example assumes that
layerLengthKind 'pattern' exists along with layerLengthPattern. So may
as well be up front with this.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 15/05/2018 17:48
Subject: [DFDL-WG] Action 304: Data Streaming (was) Re: Review/Feedback wanted on proposal: Data Streaming for base64 and other Layered Transformations
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
The data layering feature previously proposed has been
implemented in Daffodil.
There were minor changes from prior Wiki description.
The wiki page is updated to reflect what the implementation actually does.
The examples on the wiki page actually run and have been incorporated into
Daffodil regression tests.
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+base64
This feature is also being successfully used to create
iCalendar and IMF format DFDL schemas. Those are not yet released for public
consumption, but the feature is being successfully used for "real"
formats and is, thus far, working as designed.
Among the changes since the prior draft of the design note, we changed a
property value name: dfdl:layerLengthKind='boundaryMark' replaces the
'terminator' property value, to avoid confusion with the ordinary
dfdl:terminator property.
Thus far we've not run into a need for a more general mechanism
for passing parameters from the schema to the layering. The boundary mark,
and the length have been sufficient, but this feels like an area where
something more general may be needed in the future. For example, there
are really 3 different kinds of layerings all known as "base64".
Rather than having one general "base64" with a parameter, we're
currently requiring that each variant have its own layering transform name,
e.g., "base64_MIME" is one of them (the only one implemented
thus far).
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy