Hi Tim

I've made some minor corrections to your summary of the problem.

If the user restructures his model to wrap the sequences in elements then the problem goes away.  So I think we should keep the solution to this as simple as we can while not being unnecessarily restrictive.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        Tim Kimber/UK/IBM@IBMGB
To:        dfdl-wg@ogf.org
Date:        13/04/2012 10:54
Subject:        [DFDL-WG] How to choose the correct choice branch when serializing
Sent by:        dfdl-wg-bounces@ogf.org




There is an interesting edge case which arises when the serializer encounters a choice group.
A DFDL xsd is structured as follows:


<root>
   <choice>
       <sequence>
           <firstname/>
           <lastname/>
           <postcode/>
       </sequence>
       <sequence>
           <lastname/>
           <telephoneNumber/>
       </sequence>
   </choice>
</root>


Note that both branches of the choice are sequences, not elements.


The infoset is


<root>
   <lastname/>
   <telephoneNumber/>
</root>


The likely action of the serializer is:

- pick the first branch of the choice (because it contains lastname)
- output the default value of firstname (assuming that firstname has minOccurs = 1 and has a default)
- output lastname
- issue a processing error because telephoneNumber is found in the infoset but is not in the first branch.

...but from the infoset the user clearly intended:
- select the second branch of the choice and successfully process the entire infoset
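
The steps above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not how any DFDL processor is actually implemented: the branch model (a list of name / has-default pairs), ProcessingError, serialize_branch and first_match are all hypothetical names invented for illustration, and every element is assumed to have minOccurs = 1.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


# Assumed model: each choice branch is a sequence of (element name, has default).
BRANCHES = [
    [("firstname", True), ("lastname", False), ("postcode", False)],
    [("lastname", False), ("telephoneNumber", False)],
]


def serialize_branch(branch, infoset):
    """Serialize the infoset items against one sequence branch, in order."""
    out, i = [], 0
    for item in infoset:
        remaining = [name for name, _ in branch[i:]]
        if item not in remaining:
            raise ProcessingError(f"{item!r} is in the infoset but not in this branch")
        # Elements absent from the infoset are defaulted if they have a default.
        while branch[i][0] != item:
            name, has_default = branch[i]
            if not has_default:
                raise ProcessingError(f"required element {name!r} is missing")
            out.append(f"{name}=<default>")
            i += 1
        out.append(item)
        i += 1
    # Any trailing required elements must also be defaulted or reported.
    for name, has_default in branch[i:]:
        if not has_default:
            raise ProcessingError(f"required element {name!r} is missing")
        out.append(f"{name}=<default>")
    return out


def first_match(branches, infoset):
    """Commit to the first branch that contains the first infoset element."""
    for branch in branches:
        if any(name == infoset[0] for name, _ in branch):
            return serialize_branch(branch, infoset)
    raise ProcessingError("no branch contains the first infoset element")


try:
    first_match(BRANCHES, ["lastname", "telephoneNumber"])
except ProcessingError as err:
    print("processing error:", err)
```

In this sketch the first branch is selected because it contains lastname, firstname is defaulted, lastname is output, and the error is raised on telephoneNumber. Calling serialize_branch directly on the second branch handles the same infoset without error.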


The DFDL specification does not state what the behaviour should be. I think the options are:

a) state explicitly that the serializer will choose the first branch that contains a matching element, regardless of minOccurs

b) invent a new rule that causes the serializer to back out of a branch and try another branch if there is a minOccurs error while processing the branch

c) disallow sequences and choices as immediate children of a choice group


Currently I'm leaning toward a) by process of elimination, for the following reasons:

b) would make this scenario work, but I think it would impose a lot of work on implementers because it would require the serializer to do backtracking.

c) would simplify a lot of things, but I think it's too restrictive - I can imagine complex data formats where it might be useful to have a choice as the direct child of a choice because the discrimination rules might be easier to express in a two-level structure.
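
To give a feel for what b) would ask of implementers, here is a rough sketch of a backtracking serializer. It is an illustration under assumptions, not a proposed algorithm: the branch model, ProcessingError, serialize_branch and with_backtracking are hypothetical names, every element is assumed to have minOccurs = 1, and the sketch backs out on any processing error rather than only on a minOccurs error.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


# Assumed model: each choice branch is a sequence of (element name, has default).
BRANCHES = [
    [("firstname", True), ("lastname", False), ("postcode", False)],
    [("lastname", False), ("telephoneNumber", False)],
]


def serialize_branch(branch, infoset):
    """Serialize the infoset items against one sequence branch, in order."""
    out, i = [], 0
    for item in infoset:
        if item not in [name for name, _ in branch[i:]]:
            raise ProcessingError(f"{item!r} is in the infoset but not in this branch")
        # Default any branch elements that the infoset skips over.
        while branch[i][0] != item:
            name, has_default = branch[i]
            if not has_default:
                raise ProcessingError(f"required element {name!r} is missing")
            out.append(f"{name}=<default>")
            i += 1
        out.append(item)
        i += 1
    for name, has_default in branch[i:]:
        if not has_default:
            raise ProcessingError(f"required element {name!r} is missing")
        out.append(f"{name}=<default>")
    return out


def with_backtracking(branches, infoset):
    """Try each branch in schema order; discard partial output on failure."""
    errors = []
    for branch in branches:
        try:
            return serialize_branch(branch, infoset)
        except ProcessingError as err:
            errors.append(str(err))  # back out and try the next branch
    raise ProcessingError("no branch accepts the infoset: " + "; ".join(errors))


print(with_backtracking(BRANCHES, ["lastname", "telephoneNumber"]))
```

Backing out is trivial here only because each branch builds its output in a fresh list. A serializer that writes directly to an output stream would presumably have to buffer or rewind whatever it had already written for the abandoned branch, which is where the extra implementation work would come from.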


regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert@uk.ibm.com
Tel. 01962-816742  
Internal tel. 246742





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU





--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 
https://www.ogf.org/mailman/listinfo/dfdl-wg