I omitted that dfdl:emptyValueDelimiterPolicy is 'both' here, though neither dfdl:initiator nor dfdl:terminator is defined.
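
For concreteness, here is a rough sketch of the format settings I mean. It assumes that "not defined" amounts to empty-string delimiters, and the format name ex:GeneralFormat is only a placeholder for whatever default format the test schema actually references:

```xml
<xs:annotation>
  <xs:appinfo source="http://www.ogf.org/dfdl/">
    <!-- ex:GeneralFormat is a hypothetical name for the text-oriented defaults -->
    <!-- Assumption: no initiator/terminator means both are the empty string -->
    <dfdl:format ref="ex:GeneralFormat"
                 emptyValueDelimiterPolicy="both"
                 initiator=""
                 terminator=""/>
  </xs:appinfo>
</xs:annotation>
```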

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy


On Wed, Jul 11, 2018 at 8:16 AM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:

Consider this data of 4 characters:


foo;


Consider this schema where the default format is the basic general set of text-oriented defaults.


<xs:element name="ex_infix" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence dfdl:separator=";" dfdl:separatorSuppressionPolicy="anyEmpty" dfdl:separatorPosition="infix">
      <xs:element name="x" type="xs:string" dfdl:lengthKind="delimited"/>
      <xs:element name="y" type="xs:string" minOccurs="0"
          dfdl:lengthKind="delimited"
          dfdl:occursCountKind="implicit"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

          

This is in a current Daffodil unit test, and produces this infoset:


<ex_infix><x>foo</x><y/></ex_infix>


That is, an empty string element is created for element 'y'.


I'd like to know what IBM DFDL produces as the infoset for this example.


I believe the DFDL spec is actually self-contradictory here, and therefore ambiguous about what the right behavior is.


  • DFDL Spec 14.2.1 description of anyEmpty: "...any occurrences that have zero length representation MAY be omitted from the data, along with their associated separator."
    • Note that it says "may", not "must be". So anyEmpty is "lax": it does not insist that zero-length occurrences be absent from the data.
    • This doesn't clarify anything for us. But it admits the possibility that the ";" separator appears even if the 'y' element occurrence is determined to not exist. 

  • DFDL Spec 9.3.1.1 says an element is known to exist if it has the nil, empty, or normal representation
    • In the example, element 'y' is zero-length, which is either the empty or the normal representation, since a string can have "" (empty string) as a value.
    • Since the 'y' element declaration does not specify an XSD default value, the concept of 'empty' and defaulting doesn't apply here, so a zero-length string is a normal representation, and according to this section, it is known-to-exist.
    • This contradicts 9.4.2.2 below.

  • DFDL Spec 9.3.1.3 says "Note: based on the above, when processing a sequence for which a separator is defined, the presence of a match in the data for the separator is not sufficient to cause the parser to determine that an associated component is known-to-exist." It then refers you to 14.2.1
    • I don't think this changes anything. Again it just admits that the separator ";" can appear even without the following element. I.e., I think it just allows for lax processing of excess separators.

  • DFDL Spec 9.4.2 Element Defaults When Parsing - Subsection 9.4.2.2 Simple element (xs:string or xs:hexBinary) (Emphasis below is mine)
    • Here's the excerpted text:
      • "Required occurrence: If the element has a default value then an item is added to the infoset using the default value, otherwise an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value. Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'[12] then an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is added to the Infoset.

      Note: To prevent unwanted empty strings or empty hexBinary values from being added to the Infoset, use XSD minLength > '0' and a dfdl:assert that uses the dfdl:checkConstraints() function, to raise a processing error."

    • Note that the language states "if the element has a default value", which indicates that the section deals with both defaultable AND non-defaultable elements, and is not exclusively discussing defaultable elements as the title of 9.4.2 would imply.
    • The second statement is about optional occurrences, and it does not qualify whether the element is defaultable. Hence, I read "nothing is added to the Infoset" as applying whether or not the element is defaultable. So a zero-length (ZL) string is never going to create an empty-string value for an optional element.
    • However, this contradicts the note about preventing unwanted empty strings. That note is only sensible if optional elements of zero-length will get added to the infoset and extra steps are required to force a facet check to prevent them.
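
For reference, the workaround that the spec's note describes could look roughly like this when applied to element 'y' from the example. This is only a sketch of my reading of the note: the message text is made up, and whether the enclosing sequence then parses the trailing ";" cleanly depends on the very behavior in question.

```xml
<xs:element name="y" minOccurs="0"
    dfdl:lengthKind="delimited"
    dfdl:occursCountKind="implicit">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Fails for an empty string, raising a processing error for this occurrence -->
      <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                   message="y must not be the empty string"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- XSD facet that dfdl:checkConstraints() evaluates -->
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

If this extra machinery is needed at all, that implies a zero-length optional string would otherwise be added to the infoset.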


Unless I'm missing another place in the DFDL spec that clarifies this, I think we need to revise this area to make things clearer.


But first we have to pick the intended semantics. In the example above, which infoset do we want:


    <ex_infix><x>foo</x><y/></ex_infix> (empty string as normal representation takes priority over optionality)

or

    <ex_infix><x>foo</x></ex_infix> (optionality takes priority over empty string as normal representation)


Either way I think this change is needed:

  • Section 9.4.2 - change section title to "Element Defaults and Optionality When Parsing"
But a bunch of other clarifications are also needed.

Today Daffodil 2.1.0 implements the first behavior: <ex_infix><x>foo</x><y/></ex_infix>, with the empty 'y' element.

What does IBM DFDL do?








