I'm not sure we have discussed this on
a WG call, so I'm adding it to today's agenda. A spec update may be
needed.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 14/02/2012 13:30 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: Tim Kimber/UK/IBM@IBMGB
Date: 31/01/2012 13:58
Subject: Re: Issue 140 and empty string - question on escape schemes as empty-string qualifiers
Mike - some replies below
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB
Date: 30/01/2012 20:13
Subject: Issue 140 and empty string - question on escape schemes as empty-string qualifiers
I think we forgot about escape schemes and how they are
used to quote empty strings, and possibly nil indicators; or at least I'd
like clarification.
E.g.,
<dfdl:defineEscapeScheme name="quotedStrings">
<dfdl:escapeScheme escapeBlockStart="'"
escapeBlockEnd="'" escapeKind="escapeBlock" />
</dfdl:defineEscapeScheme>
<element name="x" type="string" nillable="true"
dfdl:nilValue="nil" dfdl:escapeSchemeRef="quotedStrings"/>
Now, if the data is [nil, nil], I get two nils: <x xsi:nil="true"/><x
xsi:nil="true"/>
What if the data is ['nil','nil']? Either I still get two nils, or I get two
non-nil strings with "nil" as their contents: <x>nil</x><x>nil</x>.
Which is it?
SMH: According to the property precedence order
in section 22 of the spec, the escape scheme is applied before nil value
processing when parsing, and after nil value processing on unparsing. That
is independent of the nilKind. So in your example you would get two nils
in the infoset.
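A minimal Python sketch of that parse ordering, assuming a simplified escape scheme (no escapeEscapeCharacter handling) and a single literal nilValue. The function and constant names are hypothetical and not part of any DFDL implementation's API.

```python
from typing import Optional

# Hypothetical values taken from the quotedStrings example above.
BLOCK_START = "'"   # escapeBlockStart
BLOCK_END = "'"     # escapeBlockEnd
NIL_VALUE = "nil"   # dfdl:nilValue on element x


def remove_escape_block(raw: str) -> str:
    """Strip a surrounding escape block if present (simplified: no escape-escape chars)."""
    wrapped = (len(raw) >= len(BLOCK_START) + len(BLOCK_END)
               and raw.startswith(BLOCK_START) and raw.endswith(BLOCK_END))
    if wrapped:
        return raw[len(BLOCK_START):len(raw) - len(BLOCK_END)]
    return raw


def parse_x(raw: str) -> Optional[str]:
    """Return None for a nil element, otherwise the string value."""
    content = remove_escape_block(raw)  # step 1: escape scheme is applied first
    if content == NIL_VALUE:            # step 2: then nil value processing
        return None
    return content


if __name__ == "__main__":
    for data in (["nil", "nil"], ["'nil'", "'nil'"]):
        print(data, "->", [parse_x(field) for field in data])
```

Because escape removal runs first, the quoted and unquoted forms reach nil processing as the same text, so both inputs yield two nils.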
Similarly, please assume that the empty string matches the syntax for empty
per initiator/terminator and emptyValueDelimiterPolicy. Now suppose I have
<element name="myString" type="string" minOccurs="0"
maxOccurs="2" dfdl:escapeSchemeRef="quotedStrings"/>
It's all optional, so if the data is ['',''] then I either get nothing
in the infoset (because empty creates nothing for optionals), or I get
two empty strings in the infoset.
Which is it?
SMH: I would look at this from the unparsing
angle. If there is nothing in the infoset, then I would expect to see nothing
in the data; I would not expect to see escaped nothing. That's true whether
generateEscapeBlock is 'always' or 'whenNeeded'. If I had an empty string
in the infoset, then I would expect it to be escaped in the data if I said
'always', but not if I said 'whenNeeded' (because %ES; is not allowed as
a delimiter or as a value of extraEscapedCharacters, so escaping an empty
string can never be needed). From this, the only way I could get '' in the
data would be if I had escaped an empty string. Therefore, on parsing, I
would treat '' as an escaped empty string and add an empty string to the infoset.
This sounds right to me. In our action 140 document, we have defined
'empty' to mean that the returned length (however obtained) is 0. If I
encounter escape characters, then I would claim that slot in the data is
not 'empty'.
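A minimal Python sketch of this rule, assuming the quotedStrings block delimiters and using None to stand for "no infoset item created" for the optional element. The function name is hypothetical; "empty" follows the action 140 definition (representation length 0), so the escape-block characters themselves make the slot non-empty.

```python
from typing import Optional

# Hypothetical values taken from the quotedStrings example above.
BLOCK_START = "'"  # escapeBlockStart
BLOCK_END = "'"    # escapeBlockEnd


def parse_optional_string(raw: str) -> Optional[str]:
    """Return None if no infoset item is created, otherwise the string value."""
    if len(raw) == 0:
        # Empty representation (length 0): an optional element produces nothing.
        return None
    wrapped = (len(raw) >= len(BLOCK_START) + len(BLOCK_END)
               and raw.startswith(BLOCK_START) and raw.endswith(BLOCK_END))
    if wrapped:
        # Escaped content, possibly the empty string: the slot is not 'empty'.
        return raw[len(BLOCK_START):len(raw) - len(BLOCK_END)]
    return raw


if __name__ == "__main__":
    print([parse_optional_string(f) for f in ["''", "''"]])  # two empty strings
    print([parse_optional_string(f) for f in ["", ""]])      # nothing in the infoset
```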
We should check that this is consistent with
how emptyValueDelimiterPolicy is applied. For parsing, section 22 has this
correct: emptyValueDelimiterPolicy is examined before the escape scheme is
applied. But for unparsing, section 22 has it the wrong way round; the
property should be applied after any escaping/padding has taken place.
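A minimal Python sketch of the proposed unparse ordering: escaping first, then emptyValueDelimiterPolicy applied to the escaped representation. The initiator "(", terminator ")", and the "," in-scope delimiter used to decide 'whenNeeded' are hypothetical choices for illustration, as are all function names; padding is omitted.

```python
from typing import Optional

BLOCK_START = "'"      # escapeBlockStart (from quotedStrings)
BLOCK_END = "'"        # escapeBlockEnd (from quotedStrings)
INITIATOR = "("        # hypothetical dfdl:initiator
TERMINATOR = ")"       # hypothetical dfdl:terminator
IN_SCOPE_DELIMS = ","  # hypothetical delimiter that forces escaping


def apply_escape(value: str, generate_escape_block: str) -> str:
    """Escape step. With 'whenNeeded', '' is never escaped because it contains no delimiter."""
    if generate_escape_block == "always":
        return BLOCK_START + value + BLOCK_END
    if any(ch in value for ch in IN_SCOPE_DELIMS):  # 'whenNeeded'
        return BLOCK_START + value + BLOCK_END
    return value


def unparse_optional_string(value: Optional[str], generate_escape_block: str,
                            empty_value_delimiter_policy: str) -> str:
    if value is None:
        return ""  # nothing in the infoset: nothing in the data, not escaped nothing
    rep = apply_escape(value, generate_escape_block)  # step 1: escaping
    is_empty = len(rep) == 0                          # step 2: policy sees the escaped form
    write_init = not is_empty or empty_value_delimiter_policy in ("initiator", "both")
    write_term = not is_empty or empty_value_delimiter_policy in ("terminator", "both")
    return (INITIATOR if write_init else "") + rep + (TERMINATOR if write_term else "")


if __name__ == "__main__":
    print(unparse_optional_string("", "always", "none"))
    print(unparse_optional_string("", "whenNeeded", "none"))
```

With 'always', the escaped empty string is no longer empty, so the delimiters are written regardless of the policy; with 'whenNeeded', the policy decides.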
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU