Another alternative (d) would be to say that the first child element in a choice alternative must be required (minOccurs >= 1; I think UPA requires this already) and cannot have a default value or an outputValueCalc (don't forget about that one!). That is to say, it must have a value in the infoset. The UPA rules combined with that would make everything unambiguous.
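
As a rough illustration of rule (d), here is a minimal Python sketch of the check. It assumes a much simplified element model: the Elem class, its field names, and the rule_d_violation function are made up for this example and are not part of any DFDL implementation. It also assumes firstname is required and defaultable in the first branch, as in Tim's description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Elem:
    """Simplified element declaration (hypothetical, not a DFDL API)."""
    name: str
    min_occurs: int = 1
    has_default: bool = False
    has_output_value_calc: bool = False


def rule_d_violation(branch: List[Elem]) -> Optional[str]:
    """Return a reason if the branch's first element breaks rule (d), else None."""
    if not branch:
        return "branch has no elements"
    first = branch[0]
    if first.min_occurs < 1:
        return f"{first.name}: first element must have minOccurs >= 1"
    if first.has_default:
        return f"{first.name}: first element must not have a default value"
    if first.has_output_value_calc:
        return f"{first.name}: first element must not have an outputValueCalc"
    return None


# The two choice branches from the example, element names only.
branch1 = [Elem("firstname", has_default=True), Elem("lastname"), Elem("postcode")]
branch2 = [Elem("lastname"), Elem("telephoneNumber")]

print(rule_d_violation(branch1))
print(rule_d_violation(branch2))
```

Under these assumptions the first branch is reported as a violation because of the default on firstname, and the second branch passes.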

This restriction is actually about points of uncertainty generally, not just choices. E.g., if I have an optional element of complex type sequence, then its first child element and the next element following the optional element cannot be optional and cannot have a default value.

The easiest way to always satisfy this restriction is just to wrap anything at a point of uncertainty in another element tier. That always works and always fixes it. However, using the restriction described above, we can eliminate the need for this wrapping and still make it trivially decidable which alternative to take when serializing.
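
To show what "trivially decidable" could mean in practice, here is a small Python sketch. It assumes rule (d) already holds for every branch, so the first element of each branch is always present in the infoset, and it treats two branches with the same first element name as an ambiguity in the spirit of UPA. The function names build_dispatch and select_branch are hypothetical.

```python
from typing import Dict, List, Optional


def build_dispatch(branches: List[List[str]]) -> Dict[str, int]:
    """Map each branch's first element name to the branch index.

    Assumes every branch is non-empty and its first element is required
    with no default and no outputValueCalc (rule (d)).
    """
    table: Dict[str, int] = {}
    for index, branch in enumerate(branches):
        first = branch[0]
        if first in table:
            raise ValueError(f"ambiguous first element: {first}")
        table[first] = index
    return table


def select_branch(table: Dict[str, int], next_infoset_name: str) -> Optional[int]:
    """Pick the branch from the next infoset element alone; None means no match."""
    return table.get(next_infoset_name)


# The example model after making firstname non-defaultable (names only).
branches = [["firstname", "lastname", "postcode"], ["lastname", "telephoneNumber"]]
table = build_dispatch(branches)

print(select_branch(table, "lastname"))   # index of the second branch
print(select_branch(table, "firstname"))  # index of the first branch
```

The serializer needs only one lookup per point of uncertainty and never has to back out of a branch.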

The result of rule (d) above is that your model then has an SDE unless you change firstName to not be defaultable. Making that change fixes your model (into something much more rational in my opinion: an optional firstName is already achieved by the choice, so you don't also need it done by optionality), though you still have the missing phone number to deal with.

If you don't like my choice (d) above, then I'd say choice (a), i.e., first possible wins, is the right behavior. In this case, a suboptimal but correct algorithm is to try serializing each choice branch one by one in turn, and stop when one succeeds. That's what the semantics should be. It's much too hard to reason about anything else, and this is symmetric with parsing, which does not search for the alternative that best matches the data; it just takes the first one that succeeds.
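
A minimal Python sketch of that "try each branch in turn" algorithm follows. It is only an illustration under simplifying assumptions: every element is required, a branch is a flat list of elements, and the choice is the entire content of root, so leftover infoset elements count as a failure. The names try_branch and serialize_choice, the default value "UNKNOWN", and the infoset values are all invented for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified model: a branch is an ordered list of
# (element name, default value or None); every element is required.
Branch = List[Tuple[str, Optional[str]]]
Infoset = List[Tuple[str, str]]


def try_branch(branch: Branch, infoset: Infoset) -> Optional[List[str]]:
    """Serialize the infoset against one branch; return values or None on failure."""
    out: List[str] = []
    pos = 0
    for name, default in branch:
        if pos < len(infoset) and infoset[pos][0] == name:
            out.append(infoset[pos][1])
            pos += 1
        elif default is not None:
            out.append(default)  # element absent from the infoset: use its default
        else:
            return None  # required element absent and not defaultable
    if pos != len(infoset):
        return None  # infoset has elements this branch cannot hold
    return out


def serialize_choice(branches: List[Branch], infoset: Infoset) -> Optional[Tuple[int, List[str]]]:
    """Try branches in schema order and stop at the first one that succeeds."""
    for index, branch in enumerate(branches):
        result = try_branch(branch, infoset)
        if result is not None:
            return index, result
    return None


branches = [
    [("firstname", "UNKNOWN"), ("lastname", None), ("postcode", None)],
    [("lastname", None), ("telephoneNumber", None)],
]
infoset = [("lastname", "Smith"), ("telephoneNumber", "555-0100")]

print(serialize_choice(branches, infoset))
```

With this data the first branch fails at postcode and the second branch succeeds. The cost is that a failed branch's output has to be discarded, which is why the algorithm is suboptimal.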


...mikeb





On Fri, Apr 13, 2012 at 6:52 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Hi Tim

I've made some minor corrections to your summary of the problem.

If the user restructures his model to wrap the sequences in elements then the problem goes away.  So I think we should keep the solution to this as simple as we can while not being unnecessarily restrictive.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        Tim Kimber/UK/IBM@IBMGB
To:        dfdl-wg@ogf.org
Date:        13/04/2012 10:54
Subject:        [DFDL-WG] How to choose the correct choice branch when serializing
Sent by:        dfdl-wg-bounces@ogf.org




There is an interesting edge case which arises when the serializer encounters a choice group.
A DFDL xsd is structured as follows:


<root>
   <choice>
       <sequence>
           <firstname/>
           <lastname/>
           <postcode/>
       </sequence>
       <sequence>
           <lastname/>
           <telephoneNumber/>
       </sequence>
   </choice>
</root>


Note that both branches of the choice are sequences, not elements.


The infoset is


<root>
   <lastname/>
   <telephoneNumber/>
</root>


The likely action of the serializer is:

- pick the first branch of the choice (because it contains lastname)
- output the default value of firstname (assuming that firstname has minOccurs = 1 and has a default)
- output lastname
- issue a processing error because telephoneNumber is found in the infoset but is not in the first branch.

...but from the infoset the user clearly intended:
- select the second branch of the choice and successfully process the entire infoset
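
The likely serializer behaviour described above can be reproduced with a short Python sketch. It is an illustration only: it assumes a serializer that commits to the first branch containing the first infoset element and never backs out, and it models a branch as a flat list of required elements. The function trace_first_match and the message wording are hypothetical.

```python
from typing import List, Tuple

# Hypothetical simplified model: a branch is an ordered list of
# (element name, has_default) pairs; every element has minOccurs = 1.
Branch = List[Tuple[str, bool]]


def trace_first_match(branches: List[Branch], infoset: List[str]) -> List[str]:
    """Trace a serializer that commits to the first branch containing the
    first infoset element and never backs out of it."""
    if not infoset:
        return ["processing error: empty infoset"]
    chosen = next(
        (i for i, branch in enumerate(branches)
         if any(name == infoset[0] for name, _ in branch)),
        None,
    )
    if chosen is None:
        return [f"processing error: {infoset[0]} is not in any branch"]
    trace = [f"pick branch {chosen + 1}"]
    pos = 0
    for name, has_default in branches[chosen]:
        if pos < len(infoset) and infoset[pos] == name:
            trace.append(f"output {name}")
            pos += 1
        elif has_default:
            trace.append(f"output default for {name}")
        elif pos < len(infoset):
            trace.append(f"processing error: unexpected {infoset[pos]} in branch {chosen + 1}")
            return trace
        else:
            trace.append(f"processing error: required {name} is missing")
            return trace
    if pos < len(infoset):
        trace.append(f"processing error: unexpected {infoset[pos]} in branch {chosen + 1}")
    return trace


branches = [
    [("firstname", True), ("lastname", False), ("postcode", False)],
    [("lastname", False), ("telephoneNumber", False)],
]
for step in trace_first_match(branches, ["lastname", "telephoneNumber"]):
    print(step)
```

For the example infoset the trace picks branch 1, outputs the default for firstname, outputs lastname, and then stops with a processing error at telephoneNumber.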


The DFDL specification does not state what the behaviour should be. I think the options are:

a) state explicitly that the serializer will choose the first branch that contains a matching element, regardless of minOccurs

b) invent a new rule that causes the serializer to back out of a branch and try another branch if there is a minOccurs error while processing the branch

c) disallow sequences and choices as immediate children of a choice group


Currently I'm leaning toward a) by process of elimination, for the following reasons:

b) would make this scenario work, but I think it would impose a lot of work on implementers because it would require the serializer to do backtracking.

c) would simplify a lot of things, but I think it's too restrictive - I can imagine complex data formats where it might be useful to have a choice as the direct child of a choice because the discrimination rules might be easier to express in a two-level structure.


regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert@uk.ibm.com
Tel. 01962-816742  
Internal tel. 246742





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU





--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg










--
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412