I can easily support this proposal.
The only "downside" of restricting sequences from having
lengthKind is that you will have to introduce a named "tier" of element in
your logical DFDL model when you need to model what is really only a physical
implementation concept, that of a box surrounding data.
But I've always been in favor of avoiding the slippery
slope where a DFDL schema is supposed both to describe the representation and
to exhibit a logical model that someone likes. To me the DFDL
schema is heavily constrained by what the physical model requires, and
transformations outside the DFDL schema are going to "tidy up" the model and
remove artifacts of representation. The element tier you need in this case to
express the length of a "box" surrounding content is such a representation
artifact.
To me the slippery slope of allowing sequences to be almost
like unnamed elements is one we very much want to avoid, and as you point
out, we're already avoiding it by eliminating occurs behavior on them in DFDL.
Thought: if you are going to need to transform the
DFDL-described data anyway, then introducing named tiers for these kinds of
complex physical features actually makes such transformations easier to express,
because the XPath expressions that address the various parts of the DFDL-described
data are more obvious.
...mike
Mike Beckerle |
OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2125 | 100 Fifth Ave., 4th Floor, Waltham MA 02451 | mbeckerle.dfdl@gmail.com
I have a long-standing concern about
the usability of dfdl:lengthKind, which others in IBM are encountering when
modeling real life formats such as EDI.
My main concern is illustrated below. For example, what are the
semantics of setting different values on the element and on the sequence?
<xs:element name="container" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence dfdl:separator="@" dfdl:lengthKind="implicit">
      <xs:element name="one" type="xs:string" dfdl:lengthKind="delimited"/>
      <xs:element name="two" type="xs:string" dfdl:lengthKind="delimited"/>
      <xs:element name="three" type="xs:string" dfdl:lengthKind="delimited"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
It becomes
even more noticeable if I also set a scoping dfdl:lengthKind on the complex
type.
I propose that we limit
dfdl:lengthKind to elements only. This means that the length of an
xs:sequence or xs:choice is always given implicitly by its children, and if
you want to provide an explicit length or a length prefix, you must use a complex
element to wrap the sequence or choice. We have looked at the implications for
dfdl:choiceKind on choices and dfdl:occursKind on arrays, and the proposal
works happily in those scenarios.
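To make the proposal concrete, here is a sketch of how the earlier example might be remodeled if dfdl:lengthKind were allowed only on elements. The wrapper element name "box" and the explicit length of 20 are hypothetical, chosen purely for illustration, and properties such as dfdl:lengthUnits and dfdl:encoding are assumed to come from a scoping default format.

```xml
<!-- Hypothetical sketch: lengthKind appears only on elements -->
<xs:element name="container" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence>
      <!-- Hypothetical wrapper element "box": the only place a length is stated -->
      <xs:element name="box" dfdl:lengthKind="explicit" dfdl:length="20">
        <xs:complexType>
          <!-- No dfdl:lengthKind here: the sequence's extent is given by its children -->
          <xs:sequence dfdl:separator="@">
            <xs:element name="one" type="xs:string" dfdl:lengthKind="delimited"/>
            <xs:element name="two" type="xs:string" dfdl:lengthKind="delimited"/>
            <xs:element name="three" type="xs:string" dfdl:lengthKind="delimited"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

In this sketch the sequences carry no length properties at all, so the question of conflicting values on an element and its sequence cannot arise. The cost is the extra named "box" tier, which is the kind of representation artifact discussed in the reply above.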
There's an analogy here with not allowing sequences and choices to repeat,
only elements.
It also simplifies
the grammar, in the sense that any excess fill characters in a 'box' are always
considered part of the element when parsing.
I'd like to discuss this on today's call.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU