Mike
Length kind 'prefixed' was intended
to handle the case where the length is tightly bound to the data, i.e., there
is nothing between the length and the data. For example a PL/1 var char
or ASN.1 BER. If the length/data pair needs to be aligned,
then that alignment has to be specified on the element itself. Length
kind 'prefixed' was not intended to cover more complex cases where the
length itself has independent alignment or there are delimiters involved.
For those you use length kind 'explicit' and an expression. Otherwise the
combinations become too complicated. If we wish to extend 'prefixed' to
include the more complex cases, I think that is a post-1.0 thought and
is best handled using a different length kind enum.
You say that ignoring the alignment
property on the simple type used for the length is strange, but if you
allow that, there is no way to align the element's actual data separately.
I think that is even stranger.
The ASN.1 BER description at http://en.wikipedia.org/wiki/Basic_Encoding_Rules
describes how the length itself can have a prefix (see sub-section 'Length').
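To make the "length of the length" case concrete, here is a minimal Python sketch of decoding a BER definite-length field. The function name decode_ber_length is hypothetical. The sketch assumes the usual BER rules: if the high bit of the first octet is clear, that octet is the length (short form); if it is set, the low 7 bits say how many big-endian length octets follow (long form). The indefinite form is deliberately not handled.

```python
from typing import Tuple


def decode_ber_length(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (content_length, offset_just_after_the_length_field)."""
    first = data[offset]
    if first < 0x80:
        # Short form: the single octet is the content length (0..127).
        return first, offset + 1
    if first == 0x80:
        raise ValueError("indefinite length form is not handled in this sketch")
    if first == 0xFF:
        raise ValueError("0xFF is a reserved value for the first length octet")
    # Long form: the low 7 bits give the number of length octets that follow.
    count = first & 0x7F
    end = offset + 1 + count
    if end > len(data):
        raise ValueError("truncated length field")
    return int.from_bytes(data[offset + 1:end], "big"), end
```

In DFDL terms the first octet acts as a prefix length for the length itself, which is the nested situation discussed in this thread.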
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB, dfdl-wg@ogf.org
Date: 24/10/2011 02:16
Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
Sent by: dfdl-wg-bounces@ogf.org
Definitely need an agenda slot to discuss this matter.
I think we should redefine PrefixLength to allow it to have framing.
There is a significant issue which is that some prefixLengthTypes will
be multi-byte binary integers (typically 2 or 4 bytes), and these commonly
require alignment to a 2 or 4 byte boundary, as that's how the data structures
they live in would have been laid out.
The spec currently doesn't allow prefixLengthTypes to be aligned themselves,
because the grammar has them as SimpleContent, without the surrounding
ElementLeftFraming and RightFraming. This is also why they cannot have
lengthKind='delimited': there are no initiator or terminator regions
surrounding them. So the only way they can be aligned is if the elements
that have these prefixLengths are themselves aligned properly.
However, if you specify alignment on a simple integer type, use it as a
prefixLengthType, and then that alignment annotation is *ignored*, that
would seem strange, and buggy/hard-to-diagnose.
However, scoping rules for properties don't provide any way for this alignment
to get "into the scope chain", and I'd hate to start messing
with the scoping rules because of the corner case of prefixLength. We'd
need to put another scoping rule in just to handle this. I'd rather not
go there. Lots of our examples in the spec would have to change as they
use alignment as the example property...
But, the spec is not self-consistent, as the dfdl:alignment property can
be placed on a simple type definition, as well as on an element. So it
would seem a prefixLengthType could reference an aligned simple integer
type, but neither the grammar nor the scoping rules allow for using this
alignment property to control anything. Similarly, you can put an
initiator on a simple integer type, use it as a prefixLengthType, and have
the initiator be ignored.... because there is no initiator region for a
PrefixLength.
We need to fix this inconsistency.
I think prefixLengthType needs to be alignable, and one should be able
to specify alignment on a type definition, not just on an element.
I also think we're better off with a uniform general fix here than a handful
of special case rules around prefix lengths. (E.g., the prefixLengthType
cannot have alignment, cannot have initiator/terminator or lengthKind delimited,
with a warning or SDE if it does, etc.)
So I think the grammar is wrong. I think
PrefixLength = SimpleElement
(where SimpleElement = ElementLeftFraming SimpleContent RightFraming )
is the right definition.
In working through examples, I'm convinced the current spec is problematic.
In the current spec one must model a 4-byte aligned binary integer prefix
length as a separate element (so that you can align it), and use lengthKind='explicit'
on the thing it controls. This is a lot of hassle for a very common situation.
The whole point of dfdl:lengthKind='prefixed' is to provide an easier way
to model the common cases.
For the same reason there is no alignment, the definition of dfdl:prefixLengthType
says the named type cannot have lengthKind='delimited'. That is because
the DFDL grammar defines the prefixLength region to be SimpleContent which
is without any of the surrounding framing regions where delimiters are
found.
So, one cannot for example, put an initiator and terminator on the prefix
length type so as to have syntax separating it from the actual content.
Even if it is fixed length you can't do it. For example, you cannot model this
data as 3 string elements using prefix length:
(11)9 Ocean Way(20)Southwest(SW) Harbor(02)ME
(Notice in the above the unescaped "(SW)", which is why this
is not a delimited format.)
You also cannot do:
11(9 Ocean Way)20(Southwest(SW) Harbor)02(ME)
because that puts the initiator of the string element itself after its
prefix length region, which is backwards from the way we have it in the
grammar currently. Both of the examples above require use of a separate
element and lengthKind="explicit" to pull off, even though they
seem like fairly natural ways to textualize a binary format.
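As a rough illustration of the first layout above, here is a minimal Python sketch that treats each prefix length as a fixed-width text integer with its own initiator "(" and terminator ")". The function name parse_framed_prefix and its parameters are hypothetical, and the 2-digit width is an assumption taken from the sample data, not something the spec defines.

```python
from typing import List


def parse_framed_prefix(data: str, initiator: str = "(",
                        terminator: str = ")", digits: int = 2) -> List[str]:
    """Split data into strings, each preceded by a framed text prefix length."""
    values = []
    pos = 0
    while pos < len(data):
        # The framing belongs to the prefix length, not to the string itself.
        if not data.startswith(initiator, pos):
            raise ValueError(f"expected prefix initiator at offset {pos}")
        pos += len(initiator)
        length = int(data[pos:pos + digits])
        pos += digits
        if not data.startswith(terminator, pos):
            raise ValueError(f"expected prefix terminator at offset {pos}")
        pos += len(terminator)
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("data ends before the prefixed length is satisfied")
        values.append(value)
        pos += length
    return values
```

Because the content is taken by length rather than by scanning for a delimiter, the unescaped "(SW)" inside the second string causes no trouble.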
Now consider
xx9 Ocean WayxxSouthwest(SW) HarborxxME
where the "xx" is a 16 bit (2 byte) binary integer holding the
lengths 11, 20, and 2 respectively.
That can be modeled with lengthKind='prefixed' today, but only so long as
the "xx" doesn't need to be on a 2-byte alignment. In my example the first
element occupies 13 bytes including the prefix itself, so the next "xx"
starts on an odd boundary. I could specify alignment on each of the 3 elements
of my sequence here, but that is unmotivated/weird since they're string elements
and their type may be distant from where the elements are declared, so
the motivation for the alignment may not be clear. The alignment
constraint really wants to be expressed on the prefixLengthType, and the
DFDL annotation syntax lets you specify alignment there; it just doesn't
use it.
If we just redefine PrefixLength as SimpleElement, now all the example
formats above are easily modeled in the obvious way, and even the combinations
of text and binary lengths can be done naturally, as a binary prefixLengthType
integer type can have all the usual constraints binary data likes to have,
like alignment.
Even the 2-level ASN.1 weird case "prefix-length of the prefix-length"
(see errata 2.13) works because ElementLeftFraming itself includes PrefixLength.
I believe we should put an explicit depth limit of 2 on this however.
(Side note: I'd like to see an example of the ASN.1 format that supposedly
requires this nested prefix of the prefix situation.)
Changing the grammar in this way lets us drop the special case handling
around prefixLength, where it can't have lengthKind="delimited"
and ignores initiators, terminators, and alignment. That is a bunch fewer
special cases to have to implement and test, and to create special warnings
for (e.g., "Warning: prefixLengthType 'lenType' has alignment property
which will be ignored.").
If we want to be more minimal about the changes, just changing
PrefixLength = ElementLeftFraming SimpleContent RightFraming
is sufficient and achieves the fix of the real problem.
(This also eliminates the need for current errata 2.13 and 2.14, or rather
replaces those errata with this new stuff.)
...mikeb
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
From: Tim Kimber/UK/IBM@IBMGB
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 23/10/2011 21:12
Subject: Re: [DFDL-WG] lengthKind='prefixed' clarification needed
Sent by: dfdl-wg-bounces@ogf.org
Hi Mike,
I have always assumed that it works like this:
The Prefix region includes leading alignment, leading skip and initiator
The Content region contains the data, and the lengthKind property describes
how to determine the content length
The Suffix region includes Terminator and trailing alignment.
The lengthKind property describes the content region, and is not examined
until the Content region is reached. So the element's initiator, if defined,
is not included in the length described by the prefix length.
If you view the prefixed length as describing the length of the *element*
(i.e., its entire representation) then this definition is not intuitive.
But I have always viewed lengthKind='prefixed' as being like the other
lengthKinds - it describes the length of the element's *content*.
So it's a consistent definition, but is it useful? I think so. In my experience,
prefixed lengths tend to be applied to complex elements ( structures )
rather than simple values. In such cases, the content of the complex element
will always be either a sequence group or a choice group, and any initiator/terminator
can be located on that group.
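One way to picture the reading described above is the following Python sketch. It is only an interpretation, not spec text: parse_content_prefixed is a hypothetical name, the prefix is assumed to be a 2-digit text integer, and the order used is initiator, prefix length, content, terminator, with the prefix counting the content only.

```python
from typing import Tuple


def parse_content_prefixed(data: str, initiator: str = "[[",
                           terminator: str = "]]",
                           digits: int = 2) -> Tuple[str, int]:
    """Return (content, end_offset); the prefix length covers the content only."""
    pos = 0
    # Prefix region: the initiator is consumed before the length is examined.
    if not data.startswith(initiator, pos):
        raise ValueError("initiator not found")
    pos += len(initiator)
    # Content region: lengthKind='prefixed' decides how much content to take.
    length = int(data[pos:pos + digits])
    pos += digits
    content = data[pos:pos + length]
    if len(content) != length:
        raise ValueError("data ends before the prefixed length is satisfied")
    pos += length
    # Suffix region: the terminator is outside the prefixed length.
    if not data.startswith(terminator, pos):
        raise ValueError("terminator not found")
    pos += len(terminator)
    return content, pos
```

Under these assumptions the data "[[06123456]]" yields the content "123456", and neither delimiter is counted in the length.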
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 22/10/2011 19:12
Subject: [DFDL-WG] lengthKind='prefixed' clarification needed
Sent by: dfdl-wg-bounces@ogf.org
For agenda/issues list
With respect to lengthKind='prefixed'. I'm concerned that there's a complex
interaction with initiator/terminator.
Can we have a prefix length and an initiator and terminator as well? If
so, which comes first, and if it's the prefix, does the prefix length include
the length of the initiator and terminator?
The grammar as written in the current draft of the spec has the initiator first,
then the prefix, then the content, and then the terminator. I think this
is wrong. I mean we can make it work, but it's neither a useful nor an intuitive
behavior.
If we're going to fix this, I think we should make prefixed an alternative
to initiator and terminator, so that you can't have both on the element.
The alternative is to change the order around. Because initiator and terminator
can each be lists of alternative choices, the only sensible composition
of prefixed with these has prefix length providing the length of a syntax
which includes static initiator and terminator fields, which are sort of
like static padding to be trimmed off the string before extracting the
value.
E.g., prefix length of 10 preceding these characters: [[123456]]
<element name='x' type='int' dfdl:initiator="[[" dfdl:terminator="]]"
dfdl:lengthKind='prefixed' .../>
But,....this is obscure enough that I'd rather make prefix length exclusive
of initiator/terminator. I.e. Schema Def Error if both are specified.
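For comparison, the alternative ordering described above (prefix first, with the length covering initiator, value, and terminator) could be sketched in Python like this. The name parse_prefix_over_delimiters is hypothetical, and representing the prefix as a 2-digit text integer is an assumption made only so the example is runnable.

```python
def parse_prefix_over_delimiters(data: str, initiator: str = "[[",
                                 terminator: str = "]]",
                                 digits: int = 2) -> int:
    """Read a prefix length that counts the initiator and terminator too."""
    length = int(data[:digits])
    body = data[digits:digits + length]
    if len(body) != length:
        raise ValueError("data ends before the prefixed length is satisfied")
    if len(body) < len(initiator) + len(terminator):
        raise ValueError("prefixed length is too short to hold the delimiters")
    if not (body.startswith(initiator) and body.endswith(terminator)):
        raise ValueError("delimiters not found inside the prefixed span")
    # The delimiters act like static padding trimmed off before conversion.
    return int(body[len(initiator):len(body) - len(terminator)])
```

With the data "10[[123456]]" this returns the integer 123456, since the prefix of 10 spans the whole bracketed text.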
Rationale: Even if such formats are possible, and even if they do exist
somewhere, it's possible to model this format differently with hidden fields,
lengthKind='explicit' etc., so it's not like removing this complex interaction
of prefix with initiator/terminator reduces DFDL's expressive power in
any way.
Summary: To reduce complexity, suggest that lengthKind='prefixed' is exclusive
of both initiator and terminator properties directly on the same element.
Schema Definition Error if both are specified.
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU