The current version of the specification
(v0.36) does not clearly specify how an element which has a specified
length should be parsed.
- Section 14.3, when describing dfdl:length,
says "Only used when lengthKind is
’explicit’"
- The precedence rules say that when
lengthKind="delimited", no other properties are consulted
- Section 17.3.2 carries a comment saying
that the section is incorrect. The comment contains a couple of rather ambiguous
statements about what the behaviour should be.
Alan proposes that the behaviour should
be as follows:
- When dfdl:length has a value, the length
of the field must always conform to that value.
- When there is terminating markup in
scope ( terminators or separators ) the parser always uses them.
- If a text field has a defined dfdl:length
AND there is terminating markup in scope, then the parser should first
scan to find the actual length, then check the actual length against dfdl:length
and raise a processing error if they do not match.
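
As a rough illustration of the proposed rules, here is a minimal Python
sketch for a single text field. The names parse_text_proposed and
ProcessingError are hypothetical, delimiters are assumed to be non-empty
strings, and failing to find any delimiter is assumed to be a processing
error; none of this comes from the specification.

```python
class ProcessingError(Exception):
    """Hypothetical error type standing in for a DFDL processing error."""


def parse_text_proposed(data, pos, length=None, delimiters=()):
    """Return (value, end) for one text field under the proposed rules."""
    if delimiters:
        # Terminating markup is in scope, so the parser always scans for it.
        hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
        if not hits:
            raise ProcessingError("no terminating markup found")
        end = min(hits)
        # If dfdl:length is also set, the scanned length must conform to it.
        if length is not None and end - pos != length:
            raise ProcessingError(
                f"scanned length {end - pos} does not match dfdl:length {length}")
        return data[pos:end], end
    # No terminating markup in scope: dfdl:length alone gives the length.
    if length is None:
        raise ProcessingError("no dfdl:length and no terminating markup")
    if pos + length > len(data):
        raise ProcessingError("not enough data for dfdl:length")
    return data[pos:pos + length], pos + length
```

Note that the function has to be told which delimiters are in scope before
it can decide how to find the length.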
I favour the following alternative rules:
- dfdl:lengthKind always determines
the method that the parser will use to find the length of the element
- if lengthKind='explicit' or 'implicit'
or 'prefixed' then the length is extracted without scanning.
- if lengthKind='delimited' then the
length is extracted by scanning and no check is performed against dfdl:length
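
A comparable sketch of the alternative rules, again in Python and again
only an illustration. The names parse_text_alternative and ProcessingError
are hypothetical, and only 'explicit' and 'delimited' are covered;
'implicit' and 'prefixed' would also extract the length without scanning
but are left out here.

```python
class ProcessingError(Exception):
    """Hypothetical error type standing in for a DFDL processing error."""


def parse_text_alternative(data, pos, length_kind, length=None, delimiters=()):
    """Return (value, end); length_kind alone selects the method."""
    if length_kind == "delimited":
        # Scan for the nearest delimiter; dfdl:length is not consulted.
        hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
        if not hits:
            raise ProcessingError("no terminating markup found")
        end = min(hits)
        return data[pos:end], end
    if length_kind == "explicit":
        # No scanning: delimiters in scope are not consulted.
        if length is None or pos + length > len(data):
            raise ProcessingError("dfdl:length missing or past end of data")
        return data[pos:pos + length], pos + length
    raise NotImplementedError(f"lengthKind {length_kind!r} not sketched here")
```

For example, with data "AB,CD;" and delimiters "," and ";" in scope, an
'explicit' field of length 5 yields "AB,CD" because no scan takes place,
whereas a 'delimited' field yields "AB" whatever value dfdl:length has.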
The alternative rules have the following
advantages:
- they provide a way of switching off
scanning within the scope of a delimited structure. The proposed rules
do not.
- they are easier to implement ( parser
doesn't have to keep track of whether there is any terminating markup in
scope - lengthKind always provides the rule )
- they are slightly easier to explain
to users for the same reason
They do have the following drawbacks:
- dfdl:length is completely ignored
when lengthKind='delimited'. It is not even used to validate the extracted
length. Some users might not like this.
- there are known scenarios ( e.g. SWIFT
52B ) where it is necessary to check the length of a delimited field in
order to choose the correct branch of a choice. Checking dfdl:length would
make it easy to do that.
re: the ignoring of dfdl:length, we
*could* make a rule that the length is checked after the delimited scan
has been performed. But then it would be necessary to ensure that dfdl:length
was un-set for the far more usual case where the length is not important.
I think the control of backtracking
in the 52B scenario is an edge case. In most cases where delimited fields
have a known length we can safely leave the length checking to the schema
validator, or perhaps to a more functional complex validation layer. For
52B, the user will have to create a dfdl:assert to trigger the required
processing error when the length is incorrect.
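
To show how such an assert could drive branch selection, here is a small
Python sketch of a choice whose branches each scan a delimited field and
then check its length. The branch names, the lengths 4 and 8, and the "/"
delimiter are invented for illustration and are not the real SWIFT 52B
layout; the length check merely stands in for a dfdl:assert.

```python
class ProcessingError(Exception):
    """Hypothetical error type standing in for a DFDL processing error."""


def scan_delimited(data, pos, delimiter):
    """Scan forward to the delimiter and return (value, end)."""
    end = data.find(delimiter, pos)
    if end == -1:
        raise ProcessingError("delimiter not found")
    return data[pos:end], end


def branch_with_length(name, required_length, delimiter):
    """Build a branch parser that asserts the scanned length."""
    def parse(data, pos):
        value, end = scan_delimited(data, pos, delimiter)
        # Stand-in for a dfdl:assert on the extracted length.
        if len(value) != required_length:
            raise ProcessingError(f"{name}: length {len(value)} not allowed")
        return name, value, end
    return parse


def parse_choice(data, pos, branches):
    """Try each branch in order, backtracking on a processing error."""
    for branch in branches:
        try:
            return branch(data, pos)
        except ProcessingError:
            continue  # backtrack and try the next branch
    raise ProcessingError("no branch of the choice matched")


branches = [
    branch_with_length("short_form", 4, "/"),
    branch_with_length("long_form", 8, "/"),
]
```

With these branches, parse_choice("ABCDEFGH/", 0, branches) fails the
first branch on the length assertion and succeeds on the second.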
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742