Tim,
Thanks for your comments on the choices section.
The dfdl:choiceKind property is intended for cases where the space occupied
by the choice is defined implicitly by its children, but must always be that
of the longest branch. The primary use case is, as you say, COBOL REDEFINES
and C unions, where a compiler allocates memory for the language's 'choice'
construct.
The main issues you have highlighted
are:
a) The calculation of the length of
the longest branch.
b) The length units to use - the dfdl:lengthUnits
property does not exist on a choice
c) The name could be better
Let's have a look at a COBOL example.
01 DATA.
   05 HEADER   PIC X(10).
   05 BODY     PIC X(10).
   05 DETAIL REDEFINES BODY.
      10 KEY      PIC X(3).
      10 CONTENT  PIC X(7).
   05 TRAILER  PIC X(10).
What we would like to see for the logical
structure, to preserve the COBOL naming hierarchy into the DFDL infoset,
is:
<xs:element name="DATA">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="HEADER" type="xs:string"/>
      <xs:choice>
        <xs:element name="BODY" type="xs:string"/>
        <xs:element name="DETAIL">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="KEY" type="xs:string"/>
              <xs:element name="CONTENT" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
      <xs:element name="TRAILER" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
You suggested wrapping the xs:choice in an xs:element to carry a length
computed by COBOL -> DFDL tooling, or by the user manually. The problem is
that this forces an extra level and name into the infoset, which does not
match the COBOL. Users will not expect that. Further, existing IBM
COBOL -> XSD tooling creates the above logical structure with no wrapping,
so any wrapping will not be compatible. Users without COBOL -> DFDL tooling
would be forced to compute the length manually. I don't think your
suggestion will work.
When the xs:choice is included directly in an xs:sequence or another
xs:choice, there is no dfdl:lengthKind and no dfdl:lengthUnits, because we
no longer have those properties on xs:choice; they are only on xs:element.
We can't solve this by wrapping in an element, as just shown, so the
solution is to decouple dfdl:choiceKind from its parent altogether.
You are correct in pointing out that the length calculation is not always
easy. That can be alleviated by restricting the cases in which
dfdl:choiceKind='fixedLength' is allowed. Any violation is detected at
static validation time and results in a schema definition error. These
restrictions can, and likely will, be very tight, as we are supporting a
specific use case here.
We can debate the name/enums for the
property. For example, dfdl:choicePadKind='none'/'longest' or dfdl:choicePadToLongest='yes'/'no'
conveys the semantic to me.
My proposal is therefore to retain the
property but to:
i) State the conditions that must apply
to use this property, and enforce them in the validator => schema definition
error otherwise
ii) Decouple the choice from its parent
by calculating the length of each branch based solely on the properties
of the branch's components, irrespective of any parent dfdl:lengthKind
iii) Choose a better name for the property
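To make (ii) concrete, here is a minimal Python sketch of a decoupled length calculation applied to the COBOL example above. The Field, Sequence and Choice classes and the fixed_length function are hypothetical names invented for this illustration; they are not part of DFDL or of any implementation. The sketch assumes every leaf has a known fixed length in a single unit, which is exactly the kind of restriction (i) would have to enforce.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Field:
    """Hypothetical fixed-length leaf, e.g. a PIC X(n) item."""
    name: str
    length: int  # fixed length, in whatever single unit the schema uses


@dataclass
class Sequence:
    children: List["Node"]


@dataclass
class Choice:
    branches: List["Node"]


Node = Union[Field, Sequence, Choice]


def fixed_length(node: Node) -> int:
    """Length of a node computed only from its own components."""
    if isinstance(node, Field):
        return node.length
    if isinstance(node, Sequence):
        return sum(fixed_length(child) for child in node.children)
    # A choice always occupies the space of its longest branch.
    return max(fixed_length(branch) for branch in node.branches)


# The BODY / DETAIL REDEFINES BODY pair from the COBOL example.
body = Field("BODY", 10)
detail = Sequence([Field("KEY", 3), Field("CONTENT", 7)])
choice = Choice([body, detail])
data = Sequence([Field("HEADER", 10), choice, Field("TRAILER", 10)])

print(fixed_length(choice))  # 10
print(fixed_length(data))    # 30
```

Nothing in fixed_length looks at the parent of the choice, so no wrapping element or parent dfdl:lengthKind is needed for the calculation.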
Regards
Steve Hanson
Programming Model Architect, WebSphere Message Broker,
Co-Chair, OGF DFDL WG
Hursley, UK,
Internet: smh@uk.ibm.com,
Phone (+44)/(0) 1962-815848
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 04/03/2010 11:31
Subject: Re: [DFDL-WG] dfdl-wg Digest, Vol 43, Issue 2
Sent by: dfdl-wg-bounces@ogf.org
I have a bunch of questions/issues relating to dfdl:choiceKind. I'm not
asking for changes in v0.40, but I expect there will be changes required.
The issues that I want to raise are:
a) The description of the property in v0.39 contains several typos and
inaccuracies.
- 'implicit' is being used where 'fixedLength' was intended.
- nothing is said about the units in
which the length is calculated.
- there's no need to discuss how the choice is resolved when discussing
the 'variableLength' enum
- we should use standard phraseology when indicating whether a property
can be computed from a DFDL expression.
b) Property name should be 'choiceLengthKind' to accurately reflect its
meaning
c) There is a need for a related property 'choiceLengthUnits'
Consider the recursive algorithm for calculating the length of each branch.
It needs to know whether it is calculating a length in bytes or characters.
If the length is in bytes, then the length cannot be calculated for variable-width
encodings. If the length is in characters, then the length cannot always
be calculated reliably if there are raw byte values in the markup.
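The byte/character distinction can be seen with a short Python sketch. The byte_length helper is a hypothetical name used only for this illustration, and the encodings are examples, not a statement about what DFDL supports.

```python
def byte_length(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


# Two values with the same length in characters.
for value in ("cafe", "caf\u00e9"):
    print(
        len(value),                        # length in characters
        byte_length(value, "iso-8859-1"),  # fixed-width: one byte per character
        byte_length(value, "utf-8"),       # variable-width: depends on content
    )
```

Both values are 4 characters long and occupy 4 bytes in ISO-8859-1, but in UTF-8 the second occupies 5 bytes, so a length in bytes cannot be derived from the schema alone when the encoding is variable-width.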
d) The rules for calculating the max length of the choice are not provided.
They are complex, and not at all obvious. Consider these issues:
The length of a branch cannot be calculated if
- there are any optional elements or variable-length arrays anywhere in
the branch
- any field in the branch has dfdl:alignment > "1" (at least,
I can't work out what the rules would be. The alignment of the parent element
would need to be factored in)
- any element or group in the branch specifies its initiator, terminator
or separator as a DFDL expression
- any element or group in the branch specifies its length as a DFDL expression
If choiceLengthUnits='characters' then the length cannot be calculated
if
- any element or group in the branch specifies a DFDL string literal containing
DFDL mnemonics %NL; %WSP*; or %WSP+;
- any element or group in the branch uses a DFDL string literal that contains
a sequence of raw byte values with length different from the fixed character
width
If choiceLengthUnits='bytes' then the length cannot be calculated
if
- any element in the branch specifies a variable-width encoding, or specifies
its encoding as a DFDL expression.
There are probably other rules which need to be applied, but the above
should illustrate the point. Calculating the length is only possible under
some *very* restrictive conditions.
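A few of these conditions can be sketched as a static check in Python. Everything here is hypothetical and for illustration only: the Component class, its flags, the length_blockers function and the one-entry encoding set are invented names, and only a subset of the conditions is covered (optional elements and variable-length arrays, alignment, properties given as expressions, and variable-width encodings when counting bytes, as noted under c).

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only; a real check would need a complete list.
VARIABLE_WIDTH_ENCODINGS = {"utf-8"}


@dataclass
class Component:
    """Hypothetical stand-in for an element or group in a choice branch."""
    name: str
    optional: bool = False         # optional element
    variable_array: bool = False   # variable-length array
    alignment: int = 1
    uses_expression: bool = False  # length, delimiter or encoding is an expression
    encoding: str = "iso-8859-1"
    children: List["Component"] = field(default_factory=list)


def length_blockers(component: Component, units: str) -> List[str]:
    """Collect the reasons a branch length cannot be calculated statically."""
    reasons = []
    if component.optional or component.variable_array:
        reasons.append(f"{component.name}: optional or variable-length array")
    if component.alignment > 1:
        reasons.append(f"{component.name}: alignment > 1")
    if component.uses_expression:
        reasons.append(f"{component.name}: property given as a DFDL expression")
    if units == "bytes" and component.encoding in VARIABLE_WIDTH_ENCODINGS:
        reasons.append(f"{component.name}: variable-width encoding")
    # Recurse, since a blocker anywhere in the branch is enough.
    for child in component.children:
        reasons.extend(length_blockers(child, units))
    return reasons


branch = Component("DETAIL", children=[
    Component("KEY", optional=True),
    Component("CONTENT", encoding="utf-8"),
])
for reason in length_blockers(branch, "bytes"):
    print(reason)
```

For this branch the check reports two blockers when units are bytes and one when units are characters. An empty result would mean the length is calculable under the covered conditions; a non-empty one is the point at which a validator would presumably have to reject the schema.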
e) I think the property may not be required
As far as I am aware, this property was introduced to provide support for
COBOL REDEFINES, and to allow MRM message sets to be migrated to DFDL.
If true, the problem gets a lot simpler:
- COBOL does not use initiators/terminators.
- The COBOL compiler contains code that calculates the length of the structure
( it must, because COBOL has a rule that a REDEFINES cannot be longer than
the record that it is redefining ).
Presumably, it takes alignment into account in some way, and handles issues
relating to character width as well.
- COBOL does not allow an anonymous REDEFINES. If imported, a REDEFINES
will always produce a complex element whose content is a fixed-length choice.
Note: This means that the same will be true of any MRM message set created
by Message Broker's COBOL importer.
If those assumptions are correct, then in all cases the same effect could
be achieved by putting the precalculated length of the REDEFINES onto the
parent element. I think this merits serious consideration. The cost of
implementing choiceKind='fixedLength' is quite high because of the complexity
of the rules, and the fact that groups, as well as complex elements, can
have a fixed length. But it's not really an implementation issue, it's
a complexity issue. DFDL should not contain a property with such complex
implementation requirements unless there's a strong case for it - otherwise
potential implementers are going to be put off.
The existing COBOL importer probably does not set the precalculated length
of a REDEFINES on the parent element. That would be required if we wanted
to remove the property - so we would have to discuss that with the group
that provides the importer technology.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg