First thing to note is that 'anyEmpty'
means the sequence is non-positional, and in such a sequence I would expect
initiators to be defined.
dfdl:emptyValueDelimiterPolicy is not relevant here,
as no initiator or terminator is defined.
"Since
the 'y' element decl does not specify a XSD default value, the concept
of 'empty' and defaulting doesn't apply here".
Not correct. The concept of empty applies; defaulting happens if the
occurrence is empty, is required, and has a default value set.
For your "foo;" example, the infoset should not contain <y/> because
y is optional and empty and does not have an initiator (spec 9.4.2.2):
Optional occurrence: If dfdl:emptyValueDelimiterPolicy
is not 'none' then an item is added to the Infoset using empty string (type
xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise
nothing is added to the Infoset.
I think that the sentence can be clarified to say:
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is applicable and
not 'none' then an item is added to the Infoset using empty string (type
xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise
nothing is added to the Infoset.
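To make the reworded rule concrete, here is a minimal Python sketch of the decision for an empty xs:string occurrence. It is illustrative only, not how any DFDL processor is implemented. The function name and parameters are hypothetical, and it assumes that "applicable" simply means an initiator or terminator is defined on the element.

```python
from typing import Optional


def empty_string_occurrence(required: bool,
                            default: Optional[str],
                            evdp: str,
                            has_initiator: bool,
                            has_terminator: bool) -> Optional[str]:
    """Return the infoset value for an empty xs:string occurrence,
    or None when nothing is added to the infoset."""
    if required:
        # Required occurrence: use the default if set, else empty string.
        return default if default is not None else ""
    # Assumption: the policy is 'applicable' only when the element
    # defines an initiator or a terminator.
    evdp_applicable = has_initiator or has_terminator
    if evdp_applicable and evdp != "none":
        return ""
    return None


# The "foo;" example: y is optional, has no default, the policy is
# 'both', and no initiator or terminator is defined.
y_value = empty_string_occurrence(required=False, default=None,
                                  evdp="both",
                                  has_initiator=False,
                                  has_terminator=False)
print(y_value)
```

Under these assumptions the call for element y returns None, i.e. nothing is added to the infoset for y.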
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM
DFDL
Co-Chair, OGF
DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 01/08/2018 19:42
Subject: Re: [DFDL-WG] clarification needed - ambiguity about empty string and optional element
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
I omitted that dfdl:emptyValueDelimiterPolicy is 'both'
here, though no dfdl:initiator nor dfdl:terminator are defined.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology
| www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy
On Wed, Jul 11, 2018 at 8:16 AM, Mike Beckerle <mbeckerle.dfdl@gmail.com>
wrote:
Consider this data of 4 characters:
foo;
Consider this schema where the default
format is the basic general set of text-oriented defaults.
<xs:element name="ex_infix" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence dfdl:separator=";"
                 dfdl:separatorSuppressionPolicy="anyEmpty"
                 dfdl:separatorPosition="infix">
      <xs:element name="x" type="xs:string"
                  dfdl:lengthKind="delimited"/>
      <xs:element name="y" type="xs:string" minOccurs="0"
                  dfdl:lengthKind="delimited"
                  dfdl:occursCountKind="implicit"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
This is in a current Daffodil unit test,
and produces this infoset:
<ex_infix><x>foo</x><y/></ex_infix>
That is, an empty string element is created
for element 'y'.
I'd like to know what IBM DFDL produces
as the infoset for this example.
I believe the DFDL spec is actually self-contradictory,
and therefore ambiguous, about what the right behavior is here.
- DFDL Spec 14.2.1 description of anyEmpty:
"...any occurrences that have zero length representation MAY be omitted
from the data, along with their associated separator."
- Note that it says "may", not
"must be". So anyEmpty is "lax" in insisting that the
zero-length elements aren't present.
- This doesn't clarify anything for us. But
it admits the possibility that the ";" separator appears even
if the 'y' element occurrence is determined to not exist.
- DFDL Spec 9.3.1.1 says an element is known
to exist if it has the nil, empty, or normal representation
- In the example, element 'y' is zero-length
which is either empty or normal representation since a string can have
"" (empty string) as a value.
- Since the 'y' element decl does not specify
a XSD default value, the concept of 'empty' and defaulting doesn't apply
here, so a zero-length string is a normal representation, and according
to this section, it is known-to-exist.
- This contradicts 9.4.2.2 below.
- DFDL Spec 9.3.1.3 says "Note: based
on the above, when processing a sequence for which a separator is defined,
the presence of a match in the data for the separator is not sufficient
to cause the parser to determine that an associated component is known-to-exist."
It then refers you to 14.2.1
- I don't think this changes anything. Again
it just admits that the separator ";" can appear even without
the following element. I.e., I think it just allows for lax processing
of excess separators.
- DFDL Spec 9.4.2 Element Defaults When Parsing
- Subsection 9.4.2.2 Simple element
(xs:string or xs:hexBinary) (Emphasis below is mine)
- Here's the excerpted text:
- "Required occurrence: If the element has a default value then an item
is added to the infoset using the default value, otherwise an item is
added to the Infoset using empty string (type xs:string) or empty
hexBinary (type xs:hexBinary) as the value.
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'
then an item is added to the Infoset using empty string (type xs:string)
or empty hexBinary (type xs:hexBinary) as the value, otherwise nothing
is added to the Infoset.
Note: To prevent unwanted empty strings or empty hexBinary values
from being added to the Infoset, use XSD minLength > '0' and a dfdl:assert
that uses the dfdl:checkConstraints() function, to raise a processing error."
- Note that the language states "if the element has a default value",
which indicates that the section deals with both defaultable AND
non-defaultable elements, and is not exclusively about defaultable
elements as the title of 9.4.2 would imply.
- The second statement is about optional occurrences, and it is not
qualified by whether the element is defaultable. Hence, I read "nothing
is added to the infoset" as applying whether or not the element is
defaultable. So a zero-length (ZL) string is never going to create an
empty-string value for an optional element.
- However, this contradicts the note about
preventing unwanted empty strings. That note is only sensible if optional
elements of zero-length will get added to the infoset and extra steps are
required to force a facet check to prevent them.
Unless I'm missing another place in the
DFDL spec that clarifies this, I think we need to revise this area to make
things clearer.
But first we have to pick which is the
intended semantics. In the example above, which infoset is the one we want:
<ex_infix><x>foo</x><y/></ex_infix>
(empty string as normal representation takes priority over optionality)
or
<ex_infix><x>foo</x></ex_infix>
(optionality takes priority over empty string as normal representation)
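To illustrate the two candidate semantics, here is a toy Python sketch. It is not a DFDL parser: it only splits the data on the ";" separator and models the infoset as a list of (name, value) pairs. The function name and the keep_empty_optional flag are hypothetical, introduced only for this illustration.

```python
def parse_ex_infix(data: str, keep_empty_optional: bool):
    """Toy parse of the ex_infix sequence: x is required, y is optional.

    keep_empty_optional=True models the first behavior (empty string as
    normal representation wins); False models the second (optionality wins).
    """
    pieces = data.split(";")  # infix separator
    infoset = [("x", pieces[0])]
    # A piece after the separator is the candidate representation of y;
    # for "foo;" it is the zero-length string.
    if len(pieces) > 1 and (pieces[1] or keep_empty_optional):
        infoset.append(("y", pieces[1]))
    return infoset


first = parse_ex_infix("foo;", keep_empty_optional=True)
second = parse_ex_infix("foo;", keep_empty_optional=False)
print(first)
print(second)
```

For the data "foo;" the first call yields x and an empty y, and the second yields x only, which corresponds to the two infosets above.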
Either way I think this change is needed:
- Section 9.4.2 - change section title to "Element
Defaults and Optionality When Parsing"
But a bunch of other clarifications are also needed.
Today Daffodil 2.1.0 implements the first behavior:
<ex_infix><x>foo</x><y/></ex_infix>
that is, the infoset with the empty 'y' element.
What does IBM DFDL do?