I'm not sure we discussed this on a WG call, so I'm adding it to today's agenda. A spec update may be needed.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 14/02/2012 13:30 -----

From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: Tim Kimber/UK/IBM@IBMGB
Date: 31/01/2012 13:58
Subject: Re: Issue 140 and empty string - question on escape schemes as empty-string qualifiers

Mike - some replies below

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB
Date: 30/01/2012 20:13
Subject: Issue 140 and empty string - question on escape schemes as empty-string qualifiers

I think we forgot about escape schemes and how they are used to quote empty strings, and possibly nil indicators. Either way, I'd like clarification.

E.g.,

<dfdl:defineEscapeScheme name="quotedStrings">
<dfdl:escapeScheme escapeBlockStart="'"
escapeBlockEnd="'" escapeKind="escapeBlock" />
</dfdl:defineEscapeScheme>

<element name="x" type="string" nillable="true" dfdl:nilValue="nil" dfdl:escapeSchemeRef="quotedStrings"/>

Now, if data is [nil, nil] I get two nils: <x xsi:nil="true"/><x xsi:nil="true"/>

What if data is ['nil','nil'] - either I still get two nils, or I get two non-nil strings with "nil" as their contents: <x>nil</x><x>nil</x>

Which is it?

SMH: According to the property precedence order in section 22 of the spec, the escape scheme is applied before nil value processing when parsing, and after nil value processing on unparsing. That is independent of the nilKind. So in your example you would get two nils in the infoset.
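
To make the ordering concrete, here is a minimal Python sketch of the parse-side reading described above: the escape block is removed first, and only then is the result compared with the nil value. It is illustrative only; `parse_x` and its parameters are hypothetical names, not part of the spec or of any DFDL processor, and the sketch ignores everything except the escape block and a literal nil value.

```python
def parse_x(raw, nil_value="nil", block_start="'", block_end="'"):
    """Return None for a nil element, otherwise the string value.

    Hypothetical helper: models only the ordering 'escape scheme first,
    nil value second' for an escapeKind="escapeBlock" scheme.
    """
    text = raw
    # Step 1: apply the escape scheme, i.e. strip an enclosing escape block.
    has_block = (
        len(text) >= len(block_start) + len(block_end)
        and text.startswith(block_start)
        and text.endswith(block_end)
    )
    if has_block:
        text = text[len(block_start):len(text) - len(block_end)]
    # Step 2: nil processing runs on the already-unescaped text.
    if text == nil_value:
        return None
    return text
```

Under this reading both `parse_x("nil")` and `parse_x("'nil'")` return `None`, which corresponds to two nils in the infoset for either form of the data.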

Similarly, please assume that empty string matches the syntax for empty per initiator/terminator and emptyValueDelimiterPolicy. Now if I have

<element name="myString" type="string" minOccurs="0" maxOccurs="2" dfdl:escapeSchemeRef="quotedStrings"/>

It's all optional, so if the data is ['',''] then I either get nothing in the infoset (because empty creates nothing for optionals), or I get two empty strings in the infoset.

Which is it?

SMH: I would look at this from the unparsing angle. If there is nothing in the infoset then I would expect to see nothing in the data; I would not expect to see escaped nothing. That's true whether generateEscapeBlock is 'always' or 'whenNeeded'. If I had an empty string in the infoset then I would expect it to be escaped in the data if I said 'always' but not if I said 'whenNeeded' (because %ES; is not allowed as a delimiter or as a value of extraEscapedCharacters, so escaping empty string can never be needed). From this, the only way I could get '' in the data would be if I had escaped an empty string. Therefore on parsing, I would treat '' as an escaped empty string and add empty string to the infoset. This sounds right to me. In our action 140 document, we have defined 'empty' to mean that the returned length (however obtained) is 0. If I encounter escape characters then I would claim that slot in the data is not 'empty'.
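
The unparse-then-parse argument above can be sketched in Python as follows. This is a rough model under stated assumptions, not a definitive implementation: `unparse_item` and `parse_item` are hypothetical names, `None` stands for "no element in the infoset", the delimiter set is an assumed example, and the sketch does not handle a block-end character embedded in the value.

```python
def unparse_item(value, generate_escape_block="whenNeeded", delimiters=(",",)):
    """Hypothetical unparser for one optional string occurrence."""
    if value is None:
        # Nothing in the infoset: nothing in the data, never an escaped nothing.
        return ""
    # Escaping is 'needed' only if the value contains a delimiter;
    # an empty string can never contain one.
    needed = any(d in value for d in delimiters)
    if generate_escape_block == "always" or needed:
        return "'" + value + "'"
    return value


def parse_item(raw):
    """Hypothetical parser for one slot; returns None when the slot is empty."""
    if len(raw) == 0:
        # 'Empty' means the returned length is 0: an optional occurrence
        # then contributes nothing to the infoset.
        return None
    if len(raw) >= 2 and raw.startswith("'") and raw.endswith("'"):
        # Escape characters are present, so the slot is not 'empty';
        # '' yields an empty string in the infoset.
        return raw[1:-1]
    return raw
```

In this model `unparse_item("", "always")` produces `''`, and `parse_item("''")` returns an empty string rather than `None`, matching the view that data of ['',''] yields two empty strings in the infoset.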

We should check that this is consistent with how emptyValueDelimiterPolicy is applied. For parsing, section 22 has this correct: emptyValueDelimiterPolicy is examined before the escape scheme is applied. But for unparsing, section 22 has it the wrong way round: the property should be applied after any escaping/padding has taken place.
