For discussion on next DFDL WG call.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 07/01/2013 17:32 -----

From:        Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:        Steve Hanson/UK/IBM@IBMGB,
Cc:        Tim Kimber/UK/IBM@IBMGB
Date:        11/12/2012 17:15
Subject:        Re: Editorial improvements for section 14.2





Some added discussion on top of Steve's on the 14.2 separator property.


From:        Tim Kimber/UK/IBM
To:        mbeckerle.dfdl@gmail.com, Steve Hanson/UK/IBM@IBMGB,
Date:        10/12/2012 15:14
Subject:        Editorial improvements for section 14.2




A couple of things that I noticed while looking through the specification today:


14.2        Title

Section title should really be 'Sequence groups with separators'.


SMH: Agree

14.2         Description of 'separator' property

"Specifies a whitespace separated list of alternative literal strings that are the possible separators between a sequence of elements or multiple occurrences of an element."
A separator applies to all members of a group, but this only talks about elements.
Suggestion: "Specifies a list of alternative separator values for the group. Each separator value is a DFDL string literal. If there is more than one separator in the list then the values are separated by white space."

I purposely omitted the point about multiple occurrences; I think it needs a separate description, unless we think that the tables make it clear enough.

SMH: The wording here is very like that for initiator and terminator. The property type already has said that the strings are DFDL string literals. So I would say:

"Specifies a whitespace separated list of alternative literal strings that are the possible separators for the sequence. Separators occur in the data either before, between or after all occurrences of the elements or groups that are the children of the sequence."

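To illustrate the "before, between or after" wording, here is a minimal Python sketch of where a separator would be written when unparsing. It assumes the position is chosen by the separatorPosition property (prefix, infix or postfix); the function name unparse_sequence and its parameters are hypothetical, and children are taken to be already-unparsed strings.

```python
def unparse_sequence(children, separator, position="infix"):
    """Join already-unparsed children using one separator value.

    Hypothetical helper for illustration; 'position' mirrors the
    separatorPosition values prefix / infix / postfix.
    """
    if position == "infix":
        # separator appears only between children
        return separator.join(children)
    if position == "prefix":
        # separator appears before every child
        return "".join(separator + child for child in children)
    if position == "postfix":
        # separator appears after every child
        return "".join(child + separator for child in children)
    raise ValueError("unknown separator position: " + position)


if __name__ == "__main__":
    for pos in ("prefix", "infix", "postfix"):
        print(pos, repr(unparse_sequence(["a", "b", "c"], ",", pos)))
```
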
14.2         Description of 'separator' property

"This property can be computed by way of an expression which returns a string of whitespace separated values.
It is a Schema Definition Error if the expression returns an empty string

The expression must not contain forward references to elements which have not yet been processed."

The later sentence about expressions that return an empty string could then be removed - I think it belongs in this paragraph.
Also, there is a change in the text style midway through the paragraph.
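
A rough Python sketch of the check described in that paragraph, for discussion only. The names SchemaDefinitionError and separator_values are hypothetical. Treating a whitespace-only result the same as an empty string is an assumption on my part; the quoted text only mentions the empty string.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL Schema Definition Error."""


def separator_values(expression_result):
    """Split an expression result into the list of alternative separators."""
    if expression_result == "":
        # case stated in the quoted spec text
        raise SchemaDefinitionError("separator expression returned an empty string")
    values = expression_result.split()  # whitespace separated list
    if not values:
        # ASSUMPTION: whitespace-only result is treated like an empty string
        raise SchemaDefinitionError("separator expression returned no values")
    return values


if __name__ == "__main__":
    print(separator_values(", ; ||"))
```
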


14.2         Description of 'separator' property

"When parsing, the list of values is processed in a greedy manner, meaning it takes all the separators, that is, each of the string literals in the white space separated list, and matches them each against the data. In each case the longest possible match is found. The separator with the longest match as the one that is selected as having been ‘found’, with length-ties being resolved so that the matching separator is selected that is first in the order written in the schema. Once a matching separator is found, no other shorter matches will be subsequently attempted (ie, there is no backtracking to try parsing based on shorter separator matches)."

I don't know what the correct wording is, but this is not it :-)
This is a very complex piece of logic to describe, but it is fairly central to the parsing algorithms. If we don't get it right then we will end up with divergent DFDL implementations. I honestly don't know where or how we should be describing the delimiter parsing logic - can we discuss on the next WG call?


SMH: This paragraph is solely describing how the matching works, not anything else. It is independent of lengthKind. This wording was agreed under errata 2.70 and is used for initiator and terminator as well. What specifically is the issue?

MB: It's really unfortunate that there's this ambiguity about length-ties. But those can come up due to the character class entities. I.e., I can write separator="%SP;|%SP; %WSP+;|%WSP+;" and both of those literals would match as the separator between a and b in "a | b".

MB: However, I'm not sure the above wording about length-ties (highlighted in purple in the original mail) is really needed. If a separator longest-matches, we're done. We don't really care if there are two separator patterns that are ambiguous and can match the same thing. If they both match the same 'longest' match, then the separator was found.

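
To make the longest-match rule and the length-tie case concrete, here is a minimal Python sketch. It is not how any DFDL implementation works: the entity table is a hypothetical, partial mapping to regular expressions (only %SP;, %WSP;, %WSP+; and %WSP*;), and match_separator is an invented name. Ties go to the literal written first in the list, as the quoted paragraph says. The second literal is written %WSP+;|%WSP+; with the entity's closing semicolon.

```python
import re

# ASSUMPTION: partial, illustrative mapping of DFDL entities to regex fragments.
_ENTITY_RE = {
    "%SP;": " ",
    "%WSP;": r"\s",
    "%WSP+;": r"\s+",
    "%WSP*;": r"\s*",
}
_ENTITY_TOKEN = re.compile(r"%[A-Z]+[+*]?;")


def literal_to_regex(literal):
    """Translate one DFDL string literal into a compiled regex (sketch only)."""
    parts = []
    pos = 0
    for m in _ENTITY_TOKEN.finditer(literal):
        parts.append(re.escape(literal[pos:m.start()]))  # plain text before entity
        parts.append(_ENTITY_RE[m.group()])  # KeyError for entities not in the table
        pos = m.end()
    parts.append(re.escape(literal[pos:]))
    return re.compile("".join(parts))


def match_separator(separator_property, data, offset):
    """Return (index, literal, length) of the longest match at offset, or None."""
    best = None
    for index, literal in enumerate(separator_property.split()):
        m = literal_to_regex(literal).match(data, offset)
        if m is None or m.end() == offset:
            continue  # no match, or a zero-length match
        length = m.end() - offset
        # strict '>' keeps the earlier literal when lengths tie
        if best is None or length > best[2]:
            best = (index, literal, length)
    return best


if __name__ == "__main__":
    prop = "%SP;|%SP; %WSP+;|%WSP+;"
    print(match_separator(prop, "a | b", 1))    # both literals match 3 characters
    print(match_separator(prop, "a  |  b", 1))  # only the second literal matches
```

Note that in the tie case both literals consume the same three characters, so which one is reported makes no difference to where parsing resumes.
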
14.2         Description of 'separator' property

"If a child element uses an escape scheme, then the escape scheme also applies to any separator."
What does this mean? Can we remove it?

SMH: It means that when unparsing a child element, any occurrence of the separator in the element's value will be escaped.
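
A minimal Python sketch of that unparsing behaviour, under stated assumptions: the escape scheme is the escape-character kind, the escape character is a backslash, it is written once in front of each separator occurrence, and the escape character itself is doubled when it appears in the value. The name escape_value is hypothetical, and escape-block schemes are not covered.

```python
def escape_value(value, separators, escape_char="\\"):
    """Escape separator occurrences inside an element value (sketch only)."""
    # Try longer separators first so "||" is not escaped as two "|".
    ordered = sorted((s for s in separators if s), key=len, reverse=True)
    out = []
    i = 0
    while i < len(value):
        if value[i] == escape_char:
            # ASSUMPTION: a literal escape character is doubled
            out.append(escape_char + escape_char)
            i += 1
            continue
        hit = next((s for s in ordered if value.startswith(s, i)), None)
        if hit is not None:
            # ASSUMPTION: one escape character precedes the whole separator
            out.append(escape_char + hit)
            i += len(hit)
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(escape_value("Smith, John", [","]))
```
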


14.1        Empty Sequences

Doesn't seem right to have this as the very first sub-section. Can we make it the last, and move the other sections up by one? Or at least swap it with 14.2?


SMH: I don't see it makes much difference where it goes. So on the grounds of spec renumbering I'd prefer if it stayed where it is.

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert@uk.ibm.com
Tel. 01962-816742  
Internal tel. 37246742



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU




--
Mike Beckerle | OGF DFDL WG Co-Chair | Tresys Technologies
Tel:  781-330-0412

