| 248 | Discriminators and potential points of uncertainty (Steve) 28/1: Steve to write up a proposal to prevent a discriminator from behaving in a non-obvious manner when used with a potential point of uncertainty that turns out not to be an actual point of uncertainty. 5/2: With Steve |
I started on this by reading section 9.3.3 on points of uncertainty (PoUs), which lists the potential PoUs. Here's the list, to save getting the spec out.
1. An xs:choice branch
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 'unordered')
3. An optional xs:element
4. An array xs:element
5. All xs:elements in an xs:sequence containing one or more floating xs:elements.
The section then looks at each in turn and gives the circumstances in which it is or is not an actual PoU. As currently written, 3 and 4 are the only cases where a potential PoU might not be an actual PoU. For 1, 2 and 5 it says they are always actual PoUs.
But I'm not sure that's correct. A deeper analysis of what is actually going on with 1, 2 and 5 says to me that there are times when there might not be an actual PoU.
1. Given that DFDL has no concept of optional choice branches, once the last branch is reached there is no longer a PoU. It must be that branch, else it is a processing error.
TK: I think of it slightly differently. It is a PoU, even if the branch is the only remaining branch. If we say that the final choice branch is not a PoU then diagnostics become confused - the parser reports the error code as 'error while parsing root/choice/lastBranch/field1' when the correct error code would be 'none of the branches of root/choice were found in the data'.
SMH: I see your point. My thinking was that choices have finite branches and a choice is (1,1). If I have got to the last branch then I am not one of the other branches so I must be this one. If there is any other possibility then the model is missing a branch, even if it is just one that contains an empty sequence with an assert {fn:false()}. In practice of course users forget to add that last branch (there's no XSDL equivalent to the 'default' branch of a switch/case statement), so yes they could end up with an unclear diagnostic.
2. There can come a point in an unordered sequence when all that can be encountered is one element, and if that is (1,1) then there is no longer a PoU.
TK: It's still a PoU. The specification says that occursCountKind is 'parsed' for all members of an unordered group, so min/maxOccurs do not come into play.
SMH: Interesting. The spec says that if a member is optional or an array then it must be 'parsed'. If it is (1,1) though it does not have an occursCountKind. The specific case I was thinking of is when all members are (1,1), so when you have one element to go there is no PoU. However, the rewrite into a repeating choice has the effect of making everything 'parsed', which is really the point you are making. So I agree with you, it is easier to say that everything is an actual PoU else it complicates the rewrite semantic.
5. If all floating elements are (1,1) and all are encountered, then from that point on there are no longer any PoUs due to floating elements.
TK: I suspect that floating elements are somewhat like unordered branches - most users will not want min/maxOccurs to affect the parsing of the group. Schema validation (or more complex validation applied in the receiving application) will deal with non-conformances.
SMH: Possibly yes. With something like X12 NTE segments, that is the case. But we don't express the floating semantic as a rewrite of the whole sequence like we do for unordered, it's more of a per element thing. And if that is done dynamically as we go through the sequence, having no PoU can result.
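Going back to item 1, the diagnostic difference TK describes can be sketched with a toy model. This is plain Python, not DFDL, and it is not how any real DFDL processor is implemented. The names `ParseError`, `literal_branch`, `parse_choice` and `diagnostic` are made up for illustration, and "a branch matches if the data starts with its tag" is an assumption chosen only to keep the example small. The error strings are the two quoted in TK's reply.

```python
class ParseError(Exception):
    """Stands in for a DFDL processing error raised while parsing."""


def literal_branch(name, tag):
    """Hypothetical choice branch: matches only if the data starts with `tag`."""
    def parse(data):
        if not data.startswith(tag):
            raise ParseError(f"error while parsing root/choice/{name}/field1")
        return name
    return parse


def parse_choice(branches, data, last_branch_is_pou):
    """Try each branch in schema order; return the name of the first match.

    A failure at a point of uncertainty is suppressed and the next branch is
    tried. If the last branch is NOT treated as a PoU, its failure escapes
    unchanged, which is the behaviour being debated in item 1.
    """
    for index, branch in enumerate(branches):
        is_last = index == len(branches) - 1
        try:
            return branch(data)
        except ParseError:
            if is_last and not last_branch_is_pou:
                raise  # no PoU: the branch-level error is what the user sees
            # PoU: suppress the error and move on to the next branch
    raise ParseError("none of the branches of root/choice were found in the data")


def diagnostic(branches, data, last_branch_is_pou):
    """Return the error message the user would see, or None on success."""
    try:
        parse_choice(branches, data, last_branch_is_pou)
        return None
    except ParseError as err:
        return str(err)


branches = [literal_branch("firstBranch", "A"), literal_branch("lastBranch", "B")]

# Data that matches neither branch: the reported error depends on whether
# the last branch is treated as an actual PoU.
print(diagnostic(branches, "Z", last_branch_is_pou=True))
print(diagnostic(branches, "Z", last_branch_is_pou=False))
```

In this model, data that does match one of the branches parses identically under both settings. Only the failure case differs, so the question for the choice case is about which error is reported, not about which data is accepted.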
I'd like us to get straight on this before I proceed with the action proper.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2014 10:12 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 27/01/2014 17:39
Subject: Fw: Thoughts on a discriminator scenario