We have choices like this:

<choice>
  <sequence>
    <sequence dfdl:hiddenGroupRef="PI_true"/>

    <sequence>
       <element name="foo" minOccurs="0" ...../>
       <element name="bar" minOccurs="0" ..../>
    </sequence>
  </sequence>

  <sequence dfdl:hiddenGroupRef="PI_false"/>

</choice>

PI_true and PI_false are presence indicator flags.

When unparsing, if element "foo" or "bar" is in the infoset, then the first branch is selected, and the PI_true flag is unparsed.

The problem is when neither "foo" nor "bar" is in the infoset. Note that both are optional above.

In that case, we want the PI_false branch to be chosen.

So, point 1: the DFDL spec (section 15.1.3) is unclear about how a choice branch is selected when unparsing if the branch contains no visible elements.

Possible fix 1: One reasonable clarification might be that the first branch that admits no elements when unparsing would be chosen.

Possible fix 2: Another reasonable clarification would be that a branch with no possible elements is preferred to branches that have possible elements. If there is more than one such branch, the first would be chosen.
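
To make the difference between the two candidate rules concrete, here is a small Python sketch. It is only a toy model, not Daffodil code: the Branch class, its required/optional fields, and the function names are all made up for illustration, and it covers only the case where the infoset has no element for the choice. The fallback in choose_fix2 (use fix 1 when no branch is element-free) is my assumption; the text above does not say what should happen then.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical model of one choice branch (not a Daffodil type)."""
    label: str
    required: frozenset = frozenset()  # visible elements that must appear
    optional: frozenset = frozenset()  # visible elements that may appear

    def admits_empty(self) -> bool:
        # The branch can be unparsed with zero visible elements.
        return not self.required

    def has_no_elements(self) -> bool:
        # The branch has no visible elements at all (e.g. hidden group only).
        return not self.required and not self.optional


def choose_fix1(branches):
    # Fix 1: first branch, in schema order, that admits no elements.
    return next((b for b in branches if b.admits_empty()), None)


def choose_fix2(branches):
    # Fix 2: prefer the first branch with no possible elements.
    element_free = next((b for b in branches if b.has_no_elements()), None)
    # Assumption: otherwise fall back to the fix 1 rule.
    return element_free if element_free is not None else choose_fix1(branches)


# The choice from the schema above, in schema order.
PI_TRUE = Branch("PI_true", optional=frozenset({"foo", "bar"}))
PI_FALSE = Branch("PI_false")
schema_order = [PI_TRUE, PI_FALSE]

print(choose_fix1(schema_order).label)
print(choose_fix2(schema_order).label)
```

With the branches in the order shown in the schema, the fix 1 rule picks the PI_true branch (both of its elements are optional, so it admits no elements), while the fix 2 rule picks the PI_false branch, which is the outcome we want.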

Maybe there are other possible fixes?

Daffodil currently implements possible fix 2. That is not necessarily right, but it does maintain backward compatibility with older Daffodil-oriented DFDL schemas.

If we decide fix 1 is better, we would have to add a backward-compatibility flag, defaulting to true, that lets users continue to use schemas written as above. We would also have to revise schemas like the above to reverse the order of the branches, and eventually flip the state of the flag to require use of these new reordered schemas.
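
A rough Python sketch of how such a migration might behave, for the no-elements unparse case only. The "legacy" parameter stands in for the hypothetical compatibility flag (no such Daffodil setting is implied), and the model assumes every branch has only optional elements, as in the schema above.

```python
def choose_when_no_elements(branches, legacy=True):
    """Pick a branch label when the infoset has no element for the choice.

    branches: list of (label, optional_element_names) in schema order.
        Every branch is assumed to have no required elements.
    legacy: hypothetical compatibility flag; True models fix 2,
        False models fix 1.
    """
    if legacy:
        # Fix 2: prefer the first branch with no possible elements at all.
        for label, optional in branches:
            if not optional:
                return label
    # Fix 1 (and the fallback for fix 2): the first branch wins, because
    # every branch in this model can be unparsed with zero elements.
    return branches[0][0] if branches else None


as_written = [("PI_true", ["foo", "bar"]), ("PI_false", [])]
reordered = [("PI_false", []), ("PI_true", ["foo", "bar"])]

for legacy in (True, False):
    print(legacy,
          choose_when_no_elements(as_written, legacy),
          choose_when_no_elements(reordered, legacy))
```

In this toy model the reordered schema selects PI_false under either setting, whereas the schema as written selects PI_false only while the flag is on. That suggests schemas could be reordered first and the flag flipped afterwards. The sketch says nothing about whether reordering affects parsing.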

Thoughts?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy