
The current version of the specification (v0.36) does not clearly specify how an element which has a specified length should be parsed.
- Section 14.3, when describing dfdl:length, says "Only used when lengthKind is 'explicit'"
- The precedence rules say that when lengthKind="delimited", no other properties are consulted
- Section 17.3.2 has a comment saying that it is incorrect. The comment contains a couple of rather ambiguous statements about what the behaviour should be.

Alan proposes that the behaviour should be as follows:
- When dfdl:length has a value, the length of the field must always conform to that value.
- When there is terminating markup in scope (terminators or separators) the parser always uses them.
- If a text field has a defined dfdl:length AND there is terminating markup in scope, then the parser should first scan to find the actual length, then check the actual length against dfdl:length and raise a processing error if they do not match.

I favour the following alternative rules:
- dfdl:lengthKind always determines the method that the parser will use to find the length of the element
- if lengthKind='explicit' or 'implicit' or 'prefixed' then the length is extracted without scanning
- if lengthKind='delimited' then the length is extracted by scanning and no check is performed against dfdl:length

The alternative rules have the following advantages:
- they provide a way of switching off scanning within the scope of a delimited structure. Alan's proposed rules do not.
- they are easier to implement (the parser doesn't have to keep track of whether there is any terminating markup in scope; lengthKind always provides the rule)
- they are slightly easier to explain to users, for the same reason

They do have the following drawbacks:
- dfdl:length is completely ignored when lengthKind='delimited'. It is not even used to validate the extracted length. Some users might not like this.
- there are known scenarios (e.g.
SWIFT 52B) where it is necessary to check the length of a delimited field in order to choose the correct branch of a choice. Checking dfdl:length would make it easy to do that.

Re: the ignoring of dfdl:length, we *could* make a rule that the length is checked after the delimited scan has been performed. But then it would be necessary to ensure that dfdl:length was unset for the far more usual case where the length is not important.

I think the control of backtracking in the 52B scenario is an edge case. In most cases where delimited fields have a known length we can safely leave the length checking to the schema validator, or perhaps to a more functional complex validation layer. For 52B, the user will have to create a dfdl:assert to trigger the required processing error when the length is incorrect.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
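
P.S. To make the difference between the two rule sets concrete, here is a rough Python sketch of how each one might decide the length of a single text field. It is only an illustration under simplifying assumptions, not a proposed implementation: lengths are counted in characters, at most one terminator is in scope, and only lengthKind 'explicit' and 'delimited' are modelled. All of the names (Field, ProcessingError, scan_length, length_proposed, length_alternative) are made up for this example and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


@dataclass
class Field:
    length_kind: str              # 'explicit' or 'delimited' in this sketch
    length: Optional[int] = None  # dfdl:length in characters, if set


def scan_length(data: str, pos: int, terminator: str) -> int:
    """Count the characters from pos up to the next terminator (or end of data)."""
    end = data.find(terminator, pos)
    return (len(data) if end == -1 else end) - pos


def length_proposed(field: Field, data: str, pos: int,
                    terminator: Optional[str]) -> int:
    """Alan's rules as I read them: markup in scope is always used,
    and dfdl:length, when set, must match the scanned length."""
    if terminator is not None:
        actual = scan_length(data, pos, terminator)
        if field.length is not None and actual != field.length:
            raise ProcessingError(
                f"scanned length {actual} != dfdl:length {field.length}")
        return actual
    if field.length is None:
        raise ProcessingError("no terminating markup and no dfdl:length")
    return field.length


def length_alternative(field: Field, data: str, pos: int,
                       terminator: Optional[str]) -> int:
    """Alternative rules: lengthKind alone selects the method."""
    if field.length_kind == "delimited":
        if terminator is None:
            raise ProcessingError("lengthKind='delimited' but no markup in scope")
        # dfdl:length is ignored here, even if it has a value.
        return scan_length(data, pos, terminator)
    if field.length is None:
        raise ProcessingError("lengthKind='explicit' requires dfdl:length")
    # No scanning, even though a terminator may be in scope.
    return field.length
```

For example, with the data "ABCDE,XYZ", a comma terminator in scope, and a field declared with lengthKind='explicit' and dfdl:length=3, length_proposed raises an error because the scan finds 5 characters, whereas length_alternative returns 3 without scanning. Changing the same field to lengthKind='delimited' makes length_alternative return 5 with no check against dfdl:length, which is exactly the drawback noted above.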