A couple of comments below.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 05/02/2014 10:50
Subject: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Sent by: dfdl-wg-bounces@ogf.org
Action 248: Discriminators and potential points of uncertainty (Steve)
28/1: Steve to write up a proposal to prevent a discriminator from behaving
in a non-obvious manner when used with a potential point of uncertainty
that turns out not to be an actual point of uncertainty.
5/2: With Steve
I started on this by reading section
9.3.3 on points of uncertainty, which lists the potential PoUs. Here's
the list to save getting the spec out.
1. An xs:choice branch
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 'unordered')
3. An optional xs:element
4. An array xs:element
5. All xs:elements in an xs:sequence containing one or more floating xs:elements
The section then looks at each in turn
and gives the circumstances when it is an actual PoU or not. As currently
written, it is only 3 and 4 where a potential PoU might not be an actual
PoU. For 1, 2 and 5 it says they are always actual PoUs.
But I'm not sure that's correct. A deeper
analysis of what is actually going on with 1, 2 and 5 says to me that there
are times when there might not be an actual PoU.
1. Given that there is no concept in
DFDL of optional choice branches, then if the last branch is reached then
there is no longer a PoU. It must be that branch else it is a processing
error.
[Tim Kimber] I think of it slightly differently. It is a PoU, even if
the branch is the only remaining branch. If we say that the final choice
branch is not a PoU then diagnostics become confused: the parser reports
the error code as 'error while parsing root/choice/lastBranch/field1' when
the correct error code would be 'none of the branches of root/choice were
found in the data'.
2. There can come a point in an unordered
sequence when all that can be encountered is one element, and if that is
(1,1) then there is no longer a PoU.
[Tim Kimber] It's still a PoU. The specification says that occursCountKind
is 'parsed' for all members of an unordered group, so min/maxOccurs do
not come into play.
5. If all floating elements are (1,1)
and all are encountered, then from that point on there are no longer any
PoUs due to floating elements.
[Tim Kimber] I suspect that floating elements are somewhat like unordered
branches: most users will not want min/maxOccurs to affect the parsing of
the group. Schema validation (or more complex validation applied in the
receiving application) will deal with non-conformances.
I'd like us to get straight on this before
I proceed with the action proper.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2014 10:12 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 27/01/2014 17:39
Subject: Fw: Thoughts on a discriminator scenario
Been thinking some more on the discriminator scenario below that I mailed
out before xmas, and discussing it with the IBM DFDL team.
The 'confusing' aspect of the behaviour is that a discriminator within
a potential PoU will act on a higher level PoU if the potential PoU is
not an actual PoU. In the example, the array element 'Type1' is not an
actual PoU for occurrence 1, only for occurrences 2+. So when the discriminator
fires for occurrence 1 it will resolve a higher level unresolved PoU if
one exists.
Perhaps the spec should say that a discriminator can't 'leak' beyond the
potential PoU that encloses it? If so, then for occurrence 1 the discriminator
has no effect, and it only has an effect for occurrences 2+. This makes
for more predictable and robust schemas.
We'd need to go through spec section 9.3.3 carefully to check that this does
not break any of the potential PoUs that are listed.
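To make the scenario concrete, here is a minimal sketch of Type1 nested inside an enclosing point of uncertainty. The xs:choice, the element name Other and the named type Type1Content (standing in for the A, B, C sequence) are assumptions made purely for illustration; they are not part of the original schema. The comments restate the behaviour described above rather than add to it.

```xml
<!-- Hypothetical enclosing PoU: each branch of the choice is a potential PoU -->
<xs:choice>
  <xs:sequence>
    <!-- Occurrence 1 is required under 'implicit', so it is not an actual PoU.
         Occurrences 2 to 10 are optional, so each one is an actual PoU. -->
    <xs:element name="Type1" type="Type1Content" maxOccurs="10"
                dfdl:occursCountKind="implicit">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- Occurrence 1: resolves the enclosing choice (current behaviour).
               Occurrences 2+: resolves the Type1 occurrence itself. -->
          <dfdl:discriminator test="{fn:exists(A)}" />
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
  <!-- Hypothetical alternative branch -->
  <xs:element name="Other" type="xs:string" />
</xs:choice>
```

Under the 'no leak' idea, the discriminator on occurrence 1 would have no effect, so the enclosing choice in this sketch would stay unresolved by it.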
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 16/01/2014 09:55 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 20/12/2013 13:20
Subject: Thoughts on a discriminator scenario
Take the following schema (simplified) for element Type1 (1,10), which is a
loop over elements A, B, C. Type1 does not have an initiator, so I need
to use a discriminator to establish the existence of an occurrence of Type1
so that incorrect backtracking does not occur after an error. Because occursCountKind
is 'implicit', the 1st occurrence is not a point of uncertainty,
so the discriminator acts instead on any enclosing point of uncertainty,
but for the 2nd and subsequent occurrences it acts on Type1. That is
all working as designed, but I think users will find the 1st occurrence behaviour
a bit confusing. There are workarounds to avoid the problem, e.g. use occursCountKind
'parsed' or split Type1 into two as (1,1) and (0,9). I think this is worth
documenting in a tutorial as this is quite subtle stuff.
<xs:element name="Type1" maxOccurs="10" dfdl:occursCountKind="implicit">
  <dfdl:discriminator test="{fn:exists(A)}" />
  <xs:complexType>
    <xs:sequence>
      <xs:element name="A" dfdl:initiator="A:" ... />
      <xs:element name="B" dfdl:initiator="B:" ... />
      <xs:element name="C" dfdl:initiator="C:" ... />
    </xs:sequence>
  </xs:complexType>
</xs:element>
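The two workarounds mentioned above might look roughly like this. It is a sketch only: the named type Type1Content (standing in for the A, B, C sequence) is hypothetical, the annotation wrapper is written out in full rather than simplified, and reusing the name Type1 for both halves of the split is an assumption about what the split would look like.

```xml
<!-- Workaround 1: with 'parsed', every occurrence should be a point of
     uncertainty, so the discriminator always acts on Type1 -->
<xs:element name="Type1" type="Type1Content" maxOccurs="10"
            dfdl:occursCountKind="parsed">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator test="{fn:exists(A)}" />
    </xs:appinfo>
  </xs:annotation>
</xs:element>

<!-- Workaround 2: split Type1 into a required (1,1) element and an
     optional (0,9) array -->
<!-- No discriminator on the required element (assumption): it is not a PoU,
     so a discriminator here would still act on an enclosing PoU -->
<xs:element name="Type1" type="Type1Content" />
<!-- Every occurrence of the (0,9) array is optional, hence an actual PoU -->
<xs:element name="Type1" type="Type1Content" minOccurs="0" maxOccurs="9"
            dfdl:occursCountKind="implicit">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator test="{fn:exists(A)}" />
    </xs:appinfo>
  </xs:annotation>
</xs:element>
```

One trade-off with the first form: with 'parsed', minOccurs and maxOccurs no longer drive the parse, so the (1,10) bounds would only be checked by validation.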
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg