"Specifies a whitespace separated
list of alternative literal strings that are the possible separators between
a sequence of elements or multiple occurrences of an element."
A separator applies to all members of a group, but this only talks about
elements.
Suggestion: "Specifies a list of alternative separator values for
the group. Each separator value is a DFDL string literal. If there is more
than one separator in the list then the values are separated by white space."
I purposely omitted the point about multiple occurrences; I think it needs
a separate description, unless we think that the tables make it clear enough.
SMH: The wording here is very like that for initiator and terminator. The property type already says that the strings are DFDL string literals, so I would say:
"Specifies a whitespace separated list of alternative literal strings that are the possible separators for the sequence. Separators occur in the data either before, between or after all occurrences of the elements or groups that are the children of the sequence."
14.2 Description of 'separator' property
"This property can be computed by
way of an expression which returns a string of whitespace separated values.
It is a Schema Definition Error if the expression returns an empty string.
The expression must not contain forward references to elements which have
not yet been processed."
The later sentence about expressions
that return an empty string could then be removed - I think it belongs
in this paragraph.
Also, there is a change in the text style midway through the paragraph.
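For discussion, the rule in the quoted paragraph (an expression result is a whitespace separated list, and an empty result is a Schema Definition Error) could be sketched as below. This is only an illustration: `SchemaDefinitionError` and `separators_from_expression_result` are hypothetical names, and treating a whitespace-only result the same as an empty string is an assumption, not something the spec text states.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL Schema Definition Error."""


def separators_from_expression_result(result: str) -> list[str]:
    """Split an expression result into individual separator literals."""
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading and trailing whitespace.
    values = result.split()
    if not values:
        # Assumption: a whitespace-only result is rejected like an empty one.
        raise SchemaDefinitionError(
            "separator expression returned an empty string")
    return values
```

With this sketch, a result of ", ;" yields the two separators "," and ";", while an empty result raises the error.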
14.2 Description of 'separator' property
"When parsing, the list of values is processed in a greedy manner, meaning it takes all the separators, that is, each of the string literals in the white space separated list, and matches them each against the data. In each case the longest possible match is found. The separator with the longest match as the one that is selected as having been ‘found’, with length-ties being resolved so that the matching separator is selected that is first in the order written in the schema. Once a matching separator is found, no other shorter matches will be subsequently attempted (ie, there is no backtracking to try parsing based on shorter separator matches)."
I don't know what the correct wording
is, but this is not it :-)
This is a very complex piece of logic to describe, but it is fairly central
to the parsing algorithms. If we don't get it right then we will end up
with divergent DFDL implementations. I honestly don't know where or how
we should be describing the delimiter parsing logic - can we discuss on
the next WG call?
SMH: This paragraph is solely
describing how the matching works, not anything else. It is independent
of lengthKind. This wording was agreed under errata 2.70 and is used for
initiator and terminator as well. What specifically is the issue?
MB: It's really unfortunate that there's this ambiguity about length-ties.
But those can come up due to the character class entities. I.e., I can
write separator="%SP;|%SP; %WSP+;|%WSP+;" and both those would
match as a separator of a and b in "a | b".
MB: However, I'm not sure the above purple
wording is really needed about length-ties. If a separator longest-matches,
we're done. We don't really care if there are two separator patterns that
are ambiguous and can match the same thing. If they both match the same
'longest' match, then the separator was found.
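To make the longest-match and length-tie discussion concrete, here is a minimal sketch of the matching step as I read the quoted wording. It is not taken from any implementation. The entity table covers only `%SP;` and the `%WSP;` family, the mapping to regular expressions is a simplification, and `literal_to_regex` and `find_separator` are hypothetical names. It also assumes that, for patterns this simple, a regex match at a fixed position gives the longest match.

```python
import re

# Assumed, deliberately incomplete mapping of DFDL entities to regex fragments.
_ENTITIES = {
    "%SP;": " ",
    "%WSP;": r"\s",
    "%WSP*;": r"\s*",
    "%WSP+;": r"\s+",
}
_ENTITY_RE = re.compile(r"%[A-Z]+[*+]?;")


def literal_to_regex(literal: str) -> re.Pattern:
    """Translate one separator literal into a compiled regex."""
    parts = []
    pos = 0
    for m in _ENTITY_RE.finditer(literal):
        parts.append(re.escape(literal[pos:m.start()]))  # plain text
        parts.append(_ENTITIES[m.group()])  # KeyError for unsupported entities
        pos = m.end()
    parts.append(re.escape(literal[pos:]))
    return re.compile("".join(parts))


def find_separator(separators: str, data: str, pos: int):
    """Return (length, index, literal) of the selected separator, or None."""
    best = None
    for index, literal in enumerate(separators.split()):
        m = literal_to_regex(literal).match(data, pos)
        if m is None or m.end() == pos:
            continue  # no match, or a zero-length match
        length = m.end() - pos
        # Strictly greater: on a length tie the earlier literal is kept.
        if best is None or length > best[0]:
            best = (length, index, literal)
    return best
```

For the example above, `find_separator("%SP;|%SP; %WSP+;|%WSP+;", "a | b", 1)` finds a three-character match for both literals and reports the first one. Dropping the tie rule would only change which literal is reported, not how much data is consumed, which is the point MB makes.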
14.2 Description of 'separator' property
"If a child element uses an escape
scheme, then the escape scheme also applies to any separator."
What does this mean? Can we remove it?
SMH: It means that when a child element is unparsed, any occurrence of the separator in its value will be escaped.
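A rough sketch of that unparsing behaviour, in case it helps with the wording. It assumes an escape scheme based on a single escape character, that the same character is used to escape itself, and that the separators are plain strings with no entities. `escape_value` is a hypothetical helper, not part of any DFDL API.

```python
import re


def escape_value(value: str, separators: list[str],
                 escape_char: str = "\\") -> str:
    """Prefix each separator (and the escape character) in value with escape_char."""
    # Try longer separators first so "," does not pre-empt ",,".
    alternatives = sorted(separators, key=len, reverse=True) + [escape_char]
    pattern = re.compile("|".join(re.escape(s) for s in alternatives))
    # A single pass, so escape characters added here are never re-escaped.
    return pattern.sub(lambda m: escape_char + m.group(), value)
```

For instance, with separator "," the value `a,b` is written as `a\,b`, so a parser does not mistake the comma inside the value for a separator.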
14.1 Empty Sequences
Doesn't seem right to have this as the very first sub-section. Can we make
it the last, and move the other sections up by one? Or at least swap it
with 14.2?
SMH: I don't see that it makes much difference where it goes, so to
avoid renumbering the spec I'd prefer that it stayed where it is.
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
Mike Beckerle | OGF DFDL WG Co-Chair | Tresys Technologies
Tel: 781-330-0412