I'd like to record what was discussed
and raise another point which Alan pointed out after the meeting.
Discussions in the meeting
- dfdl:lengthKind applies only to the
element on which it is specified. It has no effect whatever on the parsing
of child elements/groups.
- there may be some value in tolerating
simple elements of type xs:string with dfdl:representation="binary".
This might be useful for schemas where dfdl:representation="binary"
is set throughout.
- Currently, the position of the WG
is that parsers should *always* scan to extract the text representation
if there is any terminating markup in scope, even if lengthKind='explicit'.
- TK proposed the scheme outlined in
his previous email, in which dfdl:lengthKind alone specifies how the parser
should extract the text representation.
If lengthKind="explicit",
scanning is switched off and dfdl:length is used. If lengthKind="delimited",
the text rep is extracted by scanning and dfdl:length is ignored.
- A refinement was discussed whereby
dfdl:length would be checked after a scan has been performed if dfdl:lengthKind="delimited".
This would make the modeling of some common formats simpler, and avoid
the need for a dfdl:assert to enforce the length constraint.
- MB raised the possibility that we
could actually disallow dfdl:length if lengthKind='delimited'. This is
the most conservative position, but general opinion was that it would be
too restrictive. There still might be some value in disallowing dfdl:length
for other lengthKinds.
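
To make the proposal concrete, here is a rough sketch in Python (not DFDL, and not any real parser's API) of the scheme as I understand it, including the refinement. The function name extract_text, its parameters, and the restriction to a single terminator are all hypothetical. I have also assumed that the length check after a scan is an equality check; the meeting did not settle that.

```python
from typing import Optional


def extract_text(data: str, length_kind: str,
                 length: Optional[int] = None,
                 terminator: Optional[str] = None) -> str:
    """Hypothetical sketch: lengthKind alone selects the extraction method."""
    if length_kind == "explicit":
        # Scanning is switched off; only the length decides how much to take,
        # even if a terminator happens to be in scope.
        if length is None:
            raise ValueError("lengthKind='explicit' requires a length")
        if len(data) < length:
            raise ValueError("not enough data for the stated length")
        return data[:length]

    if length_kind == "delimited":
        # The text is found by scanning for the terminator.
        if not terminator:
            raise ValueError("lengthKind='delimited' requires a terminator")
        end = data.find(terminator)
        if end < 0:
            raise ValueError("terminator not found")
        text = data[:end]
        # Refinement (assumed to be an equality check): if a length is also
        # given, verify it after the scan instead of ignoring it.
        if length is not None and len(text) != length:
            raise ValueError("scanned text does not match the stated length")
        return text

    raise NotImplementedError("only 'explicit' and 'delimited' are sketched")
```

With the input "ab,cd;rest" and "," as the terminator, lengthKind="explicit" with a length of 5 yields "ab,cd", whereas lengthKind="delimited" yields "ab". Passing a length of 3 in the delimited case is rejected, which is the job a dfdl:assert would otherwise have to do.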
Discussions after the meeting
- Alan pointed out that lengthKind="explicit"
does not necessarily mean that the length of the field is fixed. dfdl:length
might be specified as a DFDL expression. A common reason for doing that
would be to obtain the element's length from an earlier integer field.
As currently specified, if there were any markup in scope, the text rep
would still be extracted by scanning, even in that case.
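
A small Python illustration of Alan's case may help. The record layout here (a two-digit count followed by that many characters of text, with "," as the in-scope separator) and the function name parse_counted_field are invented for the example. It simply shows that the two extraction methods can disagree when the counted text contains the separator character.

```python
from typing import Tuple


def parse_counted_field(data: str, separator: str = ",") -> Tuple[str, str]:
    """Hypothetical format: a 2-digit count, then that many characters of text.

    Returns (text taken by length, text taken by scanning) for comparison.
    """
    count = int(data[:2])        # the earlier integer field
    body = data[2:]

    # What lengthKind='explicit' with an expression-valued length suggests.
    by_length = body[:count]

    # What scanning for the in-scope separator produces.
    end = body.find(separator)
    by_scan = body if end < 0 else body[:end]

    return by_length, by_scan


by_length, by_scan = parse_counted_field("05ab,cd,next")
```

For the input "05ab,cd,next" the count says the field is "ab,cd", but a scan stops at the first separator and returns "ab".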
Restatement of my position after today's
meeting:
I'm now even more convinced that dfdl:lengthKind="explicit"
should switch off scanning. Here's why:
a) The enumerations of lengthKind are
explicit, implicit, prefixed, delimited, pattern, endOfParent.
The presence of 'delimited' in that list means that in some users' minds,
the other enumerations are going to be interpreted as *alternatives* to
'delimited'.
b) As currently specified, if there's markup in scope, scanning
cannot be switched off by any means, not even by setting lengthKind='explicit'
AND obtaining dfdl:length from a previous integer field. I think that's
very counter-intuitive.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742