A small correction:

The parsing rules I propose, which I think match what is currently in the spec, are:

- for fixed length 'text' elements (lengthKind is 'implicit' or 'explicit') that also have terminating markup (their own terminator, or an in-scope separator or terminator), the parser should scan for the markup and then check the length
- for fixed length 'text' elements (lengthKind is 'implicit' or 'explicit') with no terminating markup the length is used

- for fixed length 'binary' fields, which are not scannable, with terminating markup, the length should be used to extract the field and the parser should then scan for the markup. (I'm not sure this is a realistic scenario but it is allowed.)
- for fixed length 'binary' fields without terminating markup, the length should be used

- for fixed length complex elements with terminating markup, each child is treated as above. When the end of the complex element is found, its actual length is compared to the fixed length
- for fixed length complex elements without terminating markup the length is used to extract the element and that 'buffer' is parsed for the children.

- I was not suggesting that dfdl:length should be examined for any lengthKind other than explicit
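
As a rough illustration of the rules for simple text and binary elements above, here is a minimal Python sketch. The function names and the ProcessingError class are assumptions made for this example and do not come from the spec. For simplicity the sketch also consumes the markup as if it were the element's own terminator, whereas a real parser would leave a separator to the enclosing sequence.

```python
class ProcessingError(Exception):
    """Recoverable parse failure (hypothetical name); a parser could backtrack on it."""


def parse_fixed_text(data, pos, length, terminator=None):
    """Parse a fixed-length text element starting at data[pos].

    Returns (value, new_pos).
    """
    if terminator is None:
        # No terminating markup: the length alone determines the extent.
        end = pos + length
        if end > len(data):
            raise ProcessingError("not enough data for fixed length")
        return data[pos:end], end

    # Terminating markup in scope: scan for it first, then check the length.
    idx = data.find(terminator, pos)
    if idx == -1:
        raise ProcessingError("terminating markup not found")
    if idx - pos != length:
        raise ProcessingError(
            "scanned length %d does not match fixed length %d" % (idx - pos, length)
        )
    return data[pos:idx], idx + len(terminator)


def parse_fixed_binary(data, pos, length, terminator=None):
    """Parse a fixed-length binary field: extract by length, then look for markup."""
    end = pos + length
    if end > len(data):
        raise ProcessingError("not enough data for fixed length")
    value = data[pos:end]
    if terminator is not None:
        # Binary content is not scannable, so the markup must follow immediately.
        if not data.startswith(terminator, end):
            raise ProcessingError("terminating markup not found after binary field")
        end += len(terminator)
    return value, end
```

With these definitions, parse_fixed_text("ABC,DEF", 0, 3, ",") succeeds, while the same call on "ABCD,EF" raises ProcessingError because the scanned length is 4 rather than 3.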

Notes:
Because lengthKind 'explicit' is used to specify either a fixed length or a reference to a length field, it isn't possible to tell the two cases apart, so we have to treat them the same way. However, if the found length doesn't match the 'fixed' length it should be a processing error and cause backtracking, but if the found length doesn't match the referenced length it should be a hard error. Perhaps we need a way to distinguish between these cases.
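
To make the two kinds of failure concrete, here is a small Python sketch. The exception names and the is_reference flag are hypothetical; the flag stands in for whatever mechanism might be chosen to tell a fixed length from a referenced one, which is still an open question.

```python
class ProcessingError(Exception):
    """Recoverable failure (hypothetical name): the parser may backtrack."""


class HardError(Exception):
    """Unrecoverable failure (hypothetical name): parsing stops."""


def check_length(found, expected, is_reference):
    """Compare the scanned length with the value given by dfdl:length.

    is_reference is an assumed flag: True when the expected length was read
    from a length field in the data, False when it is a fixed value.
    """
    if found == expected:
        return
    message = "found length %d, expected %d" % (found, expected)
    if is_reference:
        raise HardError(message)
    raise ProcessingError(message)
```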

There need to be similar rules for the other lengthKinds, e.g. prefixed, with terminating markup.

I will put this on the agenda for this week's call.

Alan Powell

MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898

From:	Tim Kimber/UK/IBM@IBMGB
To:	dfdl-wg@ogf.org
Date:	17/11/2009 01:42
Subject:	[DFDL-WG] How to determine the length of an element which has text representation

The current version of the specification (v0.36) does not clearly specify how an element which has a specified length should be parsed.
- Section 14.3, when describing dfdl:length, says "Only used when lengthKind is ’explicit’ "
- The precedence rules say that when lengthKind="delimited", no other properties are consulted
- Section 17.3.2 has a comment saying that it is incorrect. The comment contains a couple of rather ambiguous statements about what the behaviour should be.

Alan proposes that the behaviour should be as follows:
- When dfdl:length has a value, the length of the field must always conform to that value.
- When there is terminating markup in scope ( terminators or separators ) the parser always uses them.
- If a text field has a defined dfdl:length AND there is terminating markup in scope, then the parser should first scan to find the actual length, then check the actual length against dfdl:length and raise a processing error if they do not match.

I favour the following alternative rules
- dfdl:lengthKind always determines the method that the parser will use to find the length of the element
- if lengthKind='explicit' or 'implicit' or 'prefixed' then the length is extracted without scanning.
- if lengthKind='delimited' then the length is extracted by scanning and no check is performed against dfdl:length
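
The alternative rules amount to a dispatch on lengthKind alone. A minimal Python sketch, assuming a hypothetical element_length function and assuming that for 'explicit', 'implicit' and 'prefixed' the caller has already resolved the length value (from dfdl:length, the schema type, or the prefix field respectively):

```python
def element_length(data, pos, length_kind, length=None, terminator=None):
    """Return the length of the element at data[pos], chosen by lengthKind only."""
    if length_kind in ("explicit", "implicit", "prefixed"):
        # Length is taken as given; the data is never scanned for markup.
        if length is None:
            raise ValueError("a resolved length is required for " + length_kind)
        return length
    if length_kind == "delimited":
        # Length is found by scanning; the length argument is not consulted.
        idx = data.find(terminator, pos)
        if idx == -1:
            raise ValueError("terminating markup not found")
        return idx - pos
    raise NotImplementedError("lengthKind not covered by this sketch: " + length_kind)
```

Note that element_length("AB,CD", 0, "explicit", length=4) ignores the comma in the data, which is the "switching off scanning" behaviour listed among the advantages.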

The alternative rules have the following advantages:
- they provide a way of switching off scanning within the scope of a delimited structure. The proposed rules do not.
- they are easier to implement ( parser doesn't have to keep track of whether there is any terminating markup in scope - lengthKind always provides the rule )
- they are slightly easier to explain to users for the same reason

They do have the following drawbacks:
- dfdl:length is completely ignored when lengthKind='delimited'. It is not even used to validate the extracted length. Some users might not like this.
- there are known scenarios ( e.g. SWIFT 52B ) where it is necessary to check the length of a delimited field in order to choose the correct branch of a choice. Checking dfdl:length would make it easy to do that.

re: the ignoring of dfdl:length, we *could* make a rule that the length is checked after the delimited scan has been performed. But then it would be necessary to ensure that dfdl:length was un-set for the far more usual case where the length is not important.
I think the control of backtracking in the 52B scenario is an edge case. In most cases where delimited fields have a known length we can safely leave the length checking to the schema validator, or perhaps to a more functional complex validation layer. For 52B, the user will have to create a dfdl:assert to trigger the required processing error when the length is incorrect.
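
The way an assert on the length would drive branch selection can be sketched in Python as follows. This is a generic illustration and not the actual SWIFT 52B layout: the branch names, lengths and function names are all made up, and the length comparison merely plays the role that a dfdl:assert would play in the schema.

```python
class ProcessingError(Exception):
    """Recoverable failure (hypothetical name) that makes a choice try its next branch."""


def parse_delimited(data, pos, terminator):
    """Scan for the terminator and return (value, new_pos)."""
    idx = data.find(terminator, pos)
    if idx == -1:
        raise ProcessingError("terminating markup not found")
    return data[pos:idx], idx + len(terminator)


def parse_choice(data, pos, branches, terminator):
    """Try each (name, expected_length) branch in order, backtracking on failure."""
    for name, expected_length in branches:
        try:
            value, new_pos = parse_delimited(data, pos, terminator)
            # Stand-in for the user-written assert on the scanned length.
            if len(value) != expected_length:
                raise ProcessingError("length check failed for branch " + name)
            return name, value, new_pos
        except ProcessingError:
            continue  # backtrack to pos and try the next branch
    raise ProcessingError("no branch of the choice matched")
```

For example, parse_choice("ABCDEFGH/rest", 0, [("short", 4), ("long", 8)], "/") rejects the first branch on the length check and selects "long".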

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

-- dfdl-wg mailing list dfdl-wg@ogf.org http://www.ogf.org/mailman/listinfo/dfdl-wg
