The spec language around how choice branches are resolved when unparsing suggests that each branch must have an element in it somewhere.


On unparsing there is the question of how one identifies the appropriate schema choice branch
corresponding to the data in the infoset. This is complicated by the fact that the children may not
be elements. They may themselves be sequences or choices. The selection of the choice branch
is as follows: The element in the infoset is used to search the choice branches in the schema, in
schema definition order, but without looking inside any complex elements. If the element occurs
in a branch, then that branch is selected and if subsequently a processing error occurs, this
selection is not revisited (that is, there is no backtracking).
To avoid any unintended behavior, all the children of a choice can be modeled as elements.

However, that passage of the spec now seems incomplete to me, though at the time it was written and reviewed it seemed to be the solution to the issue. It does imply that each model group in a choice branch has to have an element somewhere, but I couldn't find that stated explicitly in the spec, and the spec does say that empty model groups are specifically allowed.

14.1     Empty Sequences

A sequence having no children is syntactically legal in DFDL. In the data stream, such a sequence can have non-zero length LeftFraming and RightFraming regions, but the SequenceContent region in between must be empty. It is a processing error if the SequenceContent region of an empty sequence has non-zero length when parsing.


This leaves open the issue of hidden groups. For example, is <sequence dfdl:hiddenGroupRef="mygroup"/> an empty sequence? Or does the hidden group count as if the sequence really does have children? I suspect the latter, but I need to see how others have interpreted this.


XML schema does not define an empty sequence that is the content model of a complex type definition as effective content so any DFDL annotations on such a construct would be ignored. It is a schema definition error if the empty sequence is the content model of a complex type, or if a complex type has nothing in its content model at all.


That makes clear that both of these are schema definition errors (SDE):


<complexType><sequence/></complexType>


<complexType></complexType>


But it still leaves many scenarios unclear.


Consider this schema fragment:

....
<choice>
  <sequence dfdl:terminator=";"/>
  <element name="foo"/>
</choice>
<element name="bar"/>
....

If we are unparsing and the infoset is just <bar>, then we can compute that finding <bar> in the infoset means the first branch is selected. So we would unparse that branch and output a ";".

However, if there is true ambiguity like:

<choice>
  <sequence dfdl:terminator=";"/>
  <sequence dfdl:terminator="/"/>
  <element name="foo"/>
</choice>
<element name="bar"/>

That's effectively saying that on parse either a ";" or a "/" may be found. On unparse, there is nothing to guide which choice branch to select, other than that finding <bar> tells us the <foo> element branch is NOT selected. However, it is OK to just always output the first (";") if the infoset has a <bar> element.
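
One way to state this interpretation precisely is the small Python sketch below. It is only an illustration under stated assumptions, not spec text and not the behavior of any particular DFDL implementation. The names Element, Sequence, element_names and select_branch are made up for the example. The first pass follows the rule quoted from the spec (search the branches in schema definition order without looking inside complex elements). The second pass is the assumed fallback discussed above: if no branch contains the infoset element, pick the first branch that contains no elements at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Element:
    # Hypothetical model of an element declaration. It has no children here,
    # so the search never looks inside complex elements.
    name: str


@dataclass
class Sequence:
    # Hypothetical model of a sequence group with an optional terminator.
    terminator: str = ""
    children: List[Union["Element", "Sequence"]] = field(default_factory=list)


Branch = Union[Element, Sequence]


def element_names(branch: Branch) -> List[str]:
    """Names of the elements reachable in a branch through nested groups."""
    if isinstance(branch, Element):
        return [branch.name]
    names: List[str] = []
    for child in branch.children:
        names.extend(element_names(child))
    return names


def select_branch(branches: List[Branch], infoset_name: str) -> Optional[int]:
    """Return the index of the selected choice branch, or None if none applies."""
    # Pass 1: the rule as quoted from the spec, in schema definition order.
    for i, branch in enumerate(branches):
        if infoset_name in element_names(branch):
            return i
    # Pass 2: ASSUMED interpretation (not in the spec text): fall back to the
    # first branch that contains no elements at all.
    for i, branch in enumerate(branches):
        if not element_names(branch):
            return i
    return None


# The ambiguous choice from the fragment above.
branches = [Sequence(terminator=";"), Sequence(terminator="/"), Element("foo")]
print(select_branch(branches, "bar"))  # 0, so ";" would be output
print(select_branch(branches, "foo"))  # 2, the <foo> element branch
```

Under this sketch the second empty sequence (terminator "/") can never be selected on unparse, which is exactly the consequence of always outputting the first.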

However, I wanted to check and see what others' interpretation of the spec is on this issue.



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy