On Fri, Apr 13, 2012 at 6:52 AM, Steve Hanson <smh@uk.ibm.com> wrote:

Hi Tim

I've made some minor corrections to your summary of the problem.

If the user restructures his model to wrap the sequences in elements then the problem goes away. So I think we should keep the solution to this as simple as we can while not being unnecessarily restrictive.
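Here is a minimal Python sketch of why wrapping helps. The wrapper element names `person` and `contact` are hypothetical, and the schema is reduced to a dictionary for illustration. When each branch is an element, the name of the infoset child identifies the branch directly, so the serializer never has to guess from the branch contents.

```python
# Hypothetical restructured model: each choice branch is wrapped in its own
# element, so the wrapper name alone identifies the branch.
CHOICE = {
    "person": ["firstname", "lastname", "postcode"],
    "contact": ["lastname", "telephoneNumber"],
}


def choose_branch(choice, infoset_child):
    """Select the branch whose wrapper element name matches the infoset child."""
    name, _children = infoset_child
    if name not in choice:
        raise ValueError(f"no choice branch for element {name!r}")
    return name, choice[name]


# The infoset now carries the wrapper, e.g.
# <root><contact><lastname/><telephoneNumber/></contact></root>
infoset_child = ("contact", ["lastname", "telephoneNumber"])
print(choose_branch(CHOICE, infoset_child))
```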

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 13/04/2012 10:54
Subject: [DFDL-WG] How to choose the correct choice branch when serializing
Sent by: dfdl-wg-bounces@ogf.org

There is an interesting edge case that arises when the serializer encounters a choice group.
Consider a DFDL schema structured as follows:

<root>
  <choice>
    <sequence>
      <firstname/>
      <lastname/>
      <postcode/>
    </sequence>
    <sequence>
      <lastname/>
      <telephoneNumber/>
    </sequence>
  </choice>
</root>

Note that both branches of the choice are sequences, not elements.

The infoset is:

<root>
  <lastname/>
  <telephoneNumber/>
</root>

The likely action of the serializer is:
- pick the first branch of the choice (because it contains lastname)
- output the default value of firstname (assuming that firstname has minOccurs = 1 and has a default)
- output lastname
- issue a processing error because telephoneNumber is found in the infoset but is not in the first branch.

...but from the infoset the user clearly intended:
- select the second branch of the choice and successfully process the entire infoset
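The serializer behaviour described above can be sketched in Python using a hypothetical in-memory model of the example schema. Several details are assumptions for illustration, not DFDL-defined behaviour: the `Elem` class, the `UNKNOWN` default for firstname, treating postcode as optional (minOccurs = 0), and the selection rule "commit to the first branch that contains the first infoset element". The sketch shows how committing to a branch up front ends in a processing error even though the second branch would have fitted.

```python
from dataclasses import dataclass
from typing import List, Optional


class ProcessingError(Exception):
    """Stands in for a DFDL processing error raised during serialization."""


@dataclass
class Elem:
    name: str
    min_occurs: int = 1
    default: Optional[str] = None


# Hypothetical model of the example schema. The firstname default and
# minOccurs=0 on postcode are illustrative assumptions.
CHOICE = [
    [Elem("firstname", default="UNKNOWN"), Elem("lastname"), Elem("postcode", min_occurs=0)],
    [Elem("lastname"), Elem("telephoneNumber")],
]


def serialize_branch(branch: List[Elem], infoset: List[str]) -> List[str]:
    """Walk one sequence branch against the infoset children, in order."""
    out, pos = [], 0
    for elem in branch:
        if pos < len(infoset) and infoset[pos] == elem.name:
            out.append(elem.name)  # present in the infoset: output it
            pos += 1
        elif elem.min_occurs == 0:
            continue  # optional and absent: output nothing
        elif elem.default is not None:
            out.append(f"{elem.name}={elem.default}")  # required and absent: use default
        else:
            raise ProcessingError(f"required element {elem.name!r} is missing")
    if pos < len(infoset):
        raise ProcessingError(f"{infoset[pos]!r} is in the infoset but not in the chosen branch")
    return out


def serialize_choice_first_match(choice: List[List[Elem]], infoset: List[str]) -> List[str]:
    """Commit to the first branch containing the first infoset element (assumed rule)."""
    if not infoset:
        raise ProcessingError("empty infoset")
    for branch in choice:
        if any(elem.name == infoset[0] for elem in branch):
            return serialize_branch(branch, infoset)
    raise ProcessingError(f"no branch contains {infoset[0]!r}")


try:
    serialize_choice_first_match(CHOICE, ["lastname", "telephoneNumber"])
except ProcessingError as err:
    print("processing error:", err)
```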

The DFDL specification does not state what the behaviour should be. I think the options are:
a) state explicitly that the serializer will choose the first branch that contains a matching element, regardless of minOccurs
b) invent a new rule that causes the serializer to back out of a branch and try another branch if there is a minOccurs error while processing the branch
c) disallow sequences and choices as immediate children of a choice group

Currently I'm leaning toward a) by process of elimination, for the following reasons:
b) would make this scenario work, but I think it would impose a lot of work on implementers because it would require the serializer to do backtracking.
c) would simplify a lot of things, but I think it's too restrictive: I can imagine complex data formats where it might be useful to have a choice as the direct child of a choice because the discrimination rules might be easier to express in a two-level structure.
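Option b) can be sketched as a loop that tries each branch in turn and backs out of any branch that raises a processing error. This is a minimal Python sketch under a hypothetical in-memory model of the example schema. The `Elem` class, the `UNKNOWN` default for firstname and postcode being optional are assumptions. The sketch also backs out on *any* processing error, which is broader than the minOccurs-only trigger that option b) describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class ProcessingError(Exception):
    """Stands in for a DFDL processing error raised during serialization."""


@dataclass
class Elem:
    name: str
    min_occurs: int = 1
    default: Optional[str] = None


# Hypothetical model of the example schema (defaults/minOccurs are assumptions).
CHOICE = [
    [Elem("firstname", default="UNKNOWN"), Elem("lastname"), Elem("postcode", min_occurs=0)],
    [Elem("lastname"), Elem("telephoneNumber")],
]


def serialize_branch(branch: List[Elem], infoset: List[str]) -> List[str]:
    """Walk one sequence branch against the infoset children; raise on mismatch."""
    out, pos = [], 0
    for elem in branch:
        if pos < len(infoset) and infoset[pos] == elem.name:
            out.append(elem.name)
            pos += 1
        elif elem.min_occurs == 0:
            continue
        elif elem.default is not None:
            out.append(f"{elem.name}={elem.default}")
        else:
            raise ProcessingError(f"required element {elem.name!r} is missing")
    if pos < len(infoset):
        raise ProcessingError(f"{infoset[pos]!r} is not in this branch")
    return out


def serialize_choice_backtracking(choice: List[List[Elem]], infoset: List[str]) -> Tuple[int, List[str]]:
    """Option b): try each branch in order, backing out of any branch that fails."""
    errors = []
    for index, branch in enumerate(choice):
        try:
            # Output is built in a fresh list, so a failed attempt leaves nothing behind.
            return index, serialize_branch(branch, infoset)
        except ProcessingError as err:
            errors.append(f"branch {index}: {err}")
    raise ProcessingError("no branch could be serialized: " + "; ".join(errors))


branch_index, output = serialize_choice_backtracking(CHOICE, ["lastname", "telephoneNumber"])
print(branch_index, output)
```

Backing out is cheap here only because each attempt builds its output in a list that can be discarded. A serializer that streams bytes to its output as it goes would have to buffer or rewind instead, and that is where the implementation cost of option b) comes from.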

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

-- dfdl-wg mailing list dfdl-wg@ogf.org https://www.ogf.org/mailman/listinfo/dfdl-wg
