Mike
Some further thoughts from IBM on your
recommendations, after more internal discussion here.
- Preferable to have dfdl:bitOrder as
a separate property rather than handle it via new dfdl:byteOrder enums. Although
new properties pose validation issues for existing schemas, this should
not compromise the language design. DFDL can choose which bitOrder/byteOrder
combinations are supported.
- OK with the new dfdl:byteOrder enum
for littleEndianAtomic16Bit,
though can we improve the name?
- dfdl:encoding has an architected system
for extra encodings, so US-ASCII-7-Bit-Packed should be x-US-ASCII-7-Bit-Packed,
and the spec updated to remove specific mention of US-ASCII-7-Bit-Packed.
We discussed proposed new dfdl:lengthKind
'fixedLengthOrTerminated'. A new enum implies that it can be used
in any scenario, so the following need to be specified.
- dfdl:terminator must be set and
cannot be the empty string or contain %ES; on its own.
- If xs:string or xs:hexBinary, can the maxLength
facet be used instead of dfdl:length? (Suggest no - this is variable length
data so min/maxLength are for validation only.)
- Can dfdl:length be an expression? (Suggest
no unless a specific use case is identified.)
- Any special rules for emptyValueDelimiterPolicy
and nilValueDelimiterPolicy?
- Use on complex element. Presumably dfdl:length
is first used to extract a 'box', but within that box does the parser immediately
scan for the dfdl:terminator, or does it descend into the complex type and
parse the children, expecting either to consume all of the box or to find
the terminator at the end? (Suggest the latter.)
- Use on complex element. The last child
cannot be dfdl:lengthKind 'endOfParent'.
- Scanning rules: Use of this new dfdl:lengthKind
switches off any in-scope stack of terminating markup in force at that
point. Put another way, when we are scanning for the dfdl:terminator, we
are not looking for any markup from an outer scope.
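To make the questions above concrete, here is a minimal Python sketch of one possible reading of 'fixedLengthOrTerminated' for a simple element. It is not taken from the DFDL spec or from any implementation. The function name, the single-byte DEL terminator default, and the choice to consume the terminator when found are all assumptions made for illustration. It first extracts the 'box' of dfdl:length bytes and then scans only inside that box, so no outer-scope markup is considered.

```python
def parse_fixed_length_or_terminated(data: bytes, start: int, length: int,
                                     terminator: bytes = b"\x7f"):
    """Return (value, next_position) for one field.

    Assumed semantics (hypothetical): the field occupies at most `length`
    bytes. If the terminator appears inside that box, the value ends there
    and the terminator is consumed. Otherwise the value is exactly `length`
    bytes and no terminator is expected.
    """
    if not terminator:
        # Mirrors the rule that the terminator must be set and non-empty.
        raise ValueError("terminator must be non-empty")
    # Extract the 'box'; scanning never looks beyond it.
    box = data[start:start + length]
    idx = box.find(terminator)
    if idx >= 0:
        return box[:idx], start + idx + len(terminator)
    if len(box) < length:
        raise ValueError("data ended before length or terminator was reached")
    return box, start + length
```

For example, with data b"AB\x7fCDEFG" and length 5, the first call at position 0 yields b"AB" and moves to position 3, and the second call at position 3 yields the full five bytes b"CDEFG" with no terminator. A multi-byte terminator that straddles the end of the box is not handled in this sketch.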
So there's plenty to think about
with this new dfdl:lengthKind. A good rule for deciding whether a new dfdl:lengthKind
or dfdl:occursCountKind should be added is whether it bends some other
part of the spec out of shape. The new dfdl:lengthKind looks ok so far.
However we *think* we have come up with
an alternative model which is simpler than the one you state in the document.
Example for field 'varstr' with max length 100:
<xs:sequence dfdl:terminator="{if (fn:string-length(varstr) eq 100) then '%ES;' else '%DEL;'}" ...>
<xs:element
name="varstr" type="xs:string" dfdl:lengthKind="pattern"
dfdl:lengthPattern="([^\x7F].\x7F)|(.{100})" ... />
</xs:sequence>
Can't put dfdl:terminator with a self-referencing
expression on the element. Might need fn:exists in the dfdl:terminator
expression to handle optionality. Does that work?
Regards
Steve Hanson
Architect, IBM
DFDL
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 11/07/2014 13:09 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 08/07/2014 13:31
Subject: Re: [DFDL-WG] Action 233 (deferred) - "byte order not sufficient..." - draft document on experience with binary format MIL-STD-2045
Mike
Please find attached IBM's initial comments
on your experience document, as Word comments. We only got as far
as the three required extensions and have not yet looked at the optional
usability items in detail.
We think we have our collective heads
around the least significant bit ordering concept, but we think the explanation
could be clearer and show the bits on-the-wire. There was some debate as to whether
this could be considered some variation of byteOrder, but you've obviously
thought this through and concluded a separate property is best. Also, should
bit order apply to text reps, given that byteOrder is binary rep only and
any byte ordering variations in encodings are handled as separate encodings
(eg, UTF-16LE and UTF-16BE)?
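As an illustration of what "bits on-the-wire" could look like, here is a small Python sketch that reads an unsigned bit field from a byte stream under either bit order. It is an assumption-laden model, not Daffodil's implementation or the wording of the experience document. The function name and parameters are hypothetical. In the least-significant-bit-first case it assumes stream bit 0 is the low-order bit of byte 0 and that earlier bits of a field are less significant.

```python
def read_bits(data: bytes, bit_offset: int, n_bits: int, lsb_first: bool) -> int:
    """Read an unsigned n_bits-wide field starting at bit_offset."""
    value = 0
    for i in range(n_bits):
        k = bit_offset + i            # absolute position in the bit stream
        byte = data[k // 8]
        if lsb_first:
            # Stream bit k is bit (k % 8) of its byte, counting from the LSB.
            bit = (byte >> (k % 8)) & 1
            value |= bit << i         # earlier bits are less significant
        else:
            # Stream bit k is bit (k % 8) of its byte, counting from the MSB.
            bit = (byte >> (7 - k % 8)) & 1
            value = (value << 1) | bit  # earlier bits are more significant
    return value
```

Take the single byte 0xB4 (binary 10110100), split into a 3-bit field followed by a 5-bit field. Least significant bit first gives 4 then 22. Most significant bit first gives 5 then 20. The same bytes therefore carry different field values depending on bit order, which byte order alone cannot express for fields narrower than a byte.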
Regarding the US-ASCII-7-Bit-Packed
encoding enum, this was added previously via an erratum, using the idea of a
DFDL-specific named encoding. But we are thinking that this could have
been handled as an x- encoding, rather than by specifically adding it to the
spec. And thinking further on that same thread, if byteOrder
were made to work like encoding and allow x- enums, then the new byteOrder
would become an x- enum. The Wikipedia article you cite on Endianness
mentions other byte orders (eg, Middle-Endian, PDP-Endian).
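For reference, the following Python sketch shows how one 32-bit value is laid out under big-endian, little-endian, and the PDP-endian (middle-endian) order as it is commonly described: 16-bit words with the most significant word first, each word stored little-endian. The function name and the order labels are hypothetical. Whether PDP-endian corresponds to the proposed littleEndianAtomic16Bit enum is not something this thread establishes.

```python
import struct

def encode_u32(value: int, order: str) -> bytes:
    """Encode an unsigned 32-bit integer in the named byte order."""
    big = struct.pack(">I", value)    # most significant byte first
    if order == "big":
        return big
    if order == "little":
        return big[::-1]              # least significant byte first
    if order == "pdp":
        # Most significant 16-bit word first, bytes swapped within each word.
        return bytes([big[1], big[0], big[3], big[2]])
    raise ValueError(f"unknown byte order: {order}")
```

For the value 0x0A0B0C0D this gives the bytes 0A 0B 0C 0D (big), 0D 0C 0B 0A (little), and 0B 0A 0D 0C (pdp).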
Regards
Steve Hanson
Architect, IBM
DFDL
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 24/06/2014 20:27
Subject: [DFDL-WG] Action 233 (deferred) - "byte order not sufficient..." - draft document on experience with binary format MIL-STD-2045
Sent by: dfdl-wg-bounces@ogf.org
I have created an experience document about the "bit
order" issue, which was a deferred action 233, and the subject of
a public comment.
The document is here: http://redmine.ogf.org/dmsf_files/13268.
The public comment item is http://redmine.ogf.org/boards/15/topics/43.
It recommends a new dfdl:bitOrder property, and a new
dfdl:byteOrder enum value, without which it is impossible to model these
data formats. It also recommends several other improvements to DFDL
to facilitate handling these data formats.
The formats in question are a variety of MIL-STD formats which are all
densely packed binary data. These formats are in broad use. MIL-STD-2045
is one part of this family and this particular format specification is
generally available without any restrictions from a US DoD web site (http://assistdocs.com)
so I made this specific format the subject of the document as it illustrates
all the problematic issues.
We have implemented the dfdl:bitOrder property in Daffodil,
and it works with some useful tests now passing.
We have also enhanced our TDML implementation to enable creation of tests
for this feature (and in the process actually found two bugs in the MIL-STD-2045
spec!).
Both the property and this TDML enhancement are described in the document.
The sponsors of the Daffodil project are extremely keen
to get this needed binary support into the DFDL v1.0 standard so as to
have multiple DFDL implementations support it.
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU