The current version of the specification (v0.36) does not clearly specify how an element that has a specified length should be parsed.
- Section 14.3, when describing dfdl:length, says "Only used when lengthKind is 'explicit'"
- The precedence rules say that when lengthKind="delimited", no other properties are consulted
- Section 17.3.2 has a comment saying that it is incorrect. The comment contains a couple of rather ambiguous statements about what the behaviour should be.

Alan proposes that the behaviour should be as follows:
- When dfdl:length has a value, the length of the field must always conform to that value.
- When there is terminating markup in scope (terminators or separators) the parser always uses it.
- If a text field has a defined dfdl:length AND there is terminating markup in scope, then the parser should first scan to find the actual length, then check the actual length against dfdl:length and raise a processing error if they do not match.
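To make the proposed behaviour concrete, here is a rough sketch of how I read it. This is only an illustration, not anything taken from the spec: the names parse_text_field and ProcessingError, and the use of plain Python strings for data and delimiters, are my own assumptions. The handling of a field with neither a length nor markup in scope is also assumed.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def parse_text_field(data, pos, declared_length=None, delimiters=()):
    """Return (value, new_pos) for a text field starting at pos.

    declared_length stands in for dfdl:length; delimiters stands in for
    the terminating markup (terminators or separators) currently in scope.
    """
    if delimiters:
        # Terminating markup in scope: always scan for the nearest delimiter.
        hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
        end = min(hits) if hits else len(data)
        # Then check the scanned length against the declared length.
        if declared_length is not None and end - pos != declared_length:
            raise ProcessingError(
                f"scanned length {end - pos} != declared length {declared_length}"
            )
    else:
        # No markup in scope: the declared length alone fixes the extent.
        if declared_length is None:
            # Assumption: treated as an error in this sketch.
            raise ProcessingError("no length and no terminating markup")
        end = pos + declared_length
        if end > len(data):
            raise ProcessingError("not enough data")
    return data[pos:end], end
```

With this sketch, parse_text_field("ABC,DEF", 0, 3, (",",)) succeeds, while the same call on "ABCD,EF" raises ProcessingError because the scanned length is 4.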

I favour the following alternative rules
- dfdl:lengthKind always determines the method that the parser will use to find the length of the element
- if lengthKind='explicit' or 'implicit' or 'prefixed' then the length is extracted without scanning.
- if lengthKind='delimited' then the length is extracted by scanning and no check is performed against dfdl:length
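As a sketch of the alternative rules (again only an illustration under my own assumptions: find_end is a made-up name, and the caller is assumed to have already resolved the length for the non-scanning kinds), the parser's decision reduces to a dispatch on lengthKind:

```python
def find_end(data, pos, length_kind, length=None, delimiters=()):
    """Return the end offset of the element starting at pos."""
    if length_kind in ("explicit", "implicit", "prefixed"):
        # No scanning. Assumption: the caller has already resolved `length`
        # from dfdl:length, the schema type, or the length prefix.
        return pos + length
    if length_kind == "delimited":
        # Scan for the nearest delimiter; `length` is deliberately ignored.
        hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
        return min(hits) if hits else len(data)
    raise ValueError(f"unsupported lengthKind: {length_kind}")
```

For the data "AB,CD;EF", find_end(data, 0, "explicit", length=5) returns 5, so the comma is treated as ordinary content. With "delimited" and delimiters (",", ";") it returns 2, whatever length is passed.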

The alternative rules have the following advantages:
- they provide a way of switching off scanning within the scope of a delimited structure. The proposed rules do not.
- they are easier to implement (the parser doesn't have to keep track of whether there is any terminating markup in scope; lengthKind always provides the rule)
- they are slightly easier to explain to users for the same reason

They do have the following drawbacks:
- dfdl:length is completely ignored when lengthKind='delimited'. It is not even used to validate the extracted length. Some users might not like this.
- there are known scenarios (e.g. SWIFT 52B) where it is necessary to check the length of a delimited field in order to choose the correct branch of a choice. Checking dfdl:length would make it easy to do that.

Regarding the ignoring of dfdl:length: we *could* make a rule that the length is checked after the delimited scan has been performed. But then it would be necessary to ensure that dfdl:length was unset in the far more usual case where the length is not important.
I think the control of backtracking in the 52B scenario is an edge case. In most cases where delimited fields have a known length we can safely leave the length checking to the schema validator, or perhaps to a more functional complex validation layer. For 52B, the user will have to create a dfdl:assert to trigger the required processing error when the length is incorrect.
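To illustrate the assert-driven backtracking, here is a rough sketch. Everything in it is hypothetical: the two branches, the 8-character length and the newline delimiter are invented for the example and are not the real 52B layout. The length check in branch_fixed stands in for a dfdl:assert.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def scan_delimited(data, pos, delimiter):
    """Delimited scan with no length check, as under the alternative rules."""
    end = data.find(delimiter, pos)
    if end == -1:
        end = len(data)
    return data[pos:end], end


def branch_fixed(data, pos):
    """Hypothetical branch whose field must be exactly 8 characters."""
    value, end = scan_delimited(data, pos, "\n")
    # Plays the role of a dfdl:assert on the scanned length.
    if len(value) != 8:
        raise ProcessingError("length assertion failed")
    return ("fixed", value), end


def branch_free(data, pos):
    """Hypothetical fallback branch that accepts any length."""
    value, end = scan_delimited(data, pos, "\n")
    return ("free", value), end


def parse_choice(data, pos, branches):
    """Try each branch in order, backtracking to pos on a processing error."""
    for branch in branches:
        try:
            return branch(data, pos)
        except ProcessingError:
            continue
    raise ProcessingError("no branch of the choice matched")
```

Here parse_choice("ABC\nrest", 0, [branch_fixed, branch_free]) falls through to the second branch, because the assertion in the first one fails on the 3-character field.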

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert@uk.ibm.com
Tel. 01962-816742  
Internal tel. 246742





