Action 248 closed with no change in behaviour.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 26/11/2014 17:43
Subject: Re: [DFDL-WG] Fw: Fw: Action 248 (was Thoughts on a discriminator scenario)
The EDIFACT schemas on GitHub and elsewhere
use a couple of discriminators that exploit the current behaviour.
In EDIFACT, an Interchange is a UNA,
a UNB, either one or more Functional Groups or one or more Messages, and
a UNZ.
A Functional Group is a UNG, one or
more Messages, and a UNE.
A Message is a UNH, a bunch of other
segments, and a UNT.
Here's an edited copy to illustrate.
The elements in blue (FunctionGroup and Message) are the 1..unbounded elements. The elements
in green (UNG, UNH) have a complex type that contains a discriminator fn:true(),
which is evaluated once the initiator for the element has been found.
Example parse: Let's say my Interchange
has two functional groups. The parser enters the choice in red (the xsd:choice within Interchange). It tries
to parse the FunctionGroup branch. It finds a UNG and its discriminator
is true. That resolves the choice branch (because FunctionGroup minOccurs
is '1') and so stops the parser from trying the other branch if a failure
occurs. The next time round the loop the UNG discriminator is again true.
That resolves the optional occurrence of the FunctionGroup. Same
deal for the Message branch of the choice with its UNH.
(Note that when parsing Message within
FunctionGroup, the first time round the Message loop the UNH discriminator
has no effect as there is no PoU in scope. Other times round it resolves
the optional occurrences of Message).
<xsd:element name="Interchange">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element dfdl:initiator="UNA" dfdl:length="6" dfdl:terminator="%WSP*;"
                   minOccurs="0" name="UNA" type="srv:UNA"/>
      <xsd:element dfdl:initiator="UNB" dfdl:ref="ibmEdiFmt:EDISegmentFormat"
                   name="UNB" type="srv:UNB-InterchangeHeader"/>
      <!-- Content is either Functional Groups or independent Messages, never a mixture -->
      <xsd:choice>
        <xsd:element maxOccurs="unbounded" name="FunctionGroup"
                     dfdl:occursCountKind="implicit">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element dfdl:initiator="UNG" dfdl:ref="ibmEdiFmt:EDISegmentFormat"
                           name="UNG" type="srv:UNG-GroupHeader"/>
              <xsd:element maxOccurs="unbounded" ref="D03B:Message"
                           dfdl:occursCountKind="implicit"/>
              <xsd:element dfdl:initiator="UNE" dfdl:ref="ibmEdiFmt:EDISegmentFormat"
                           name="UNE" type="srv:UNE-GroupTrailer"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element maxOccurs="unbounded" ref="D03B:Message"
                     dfdl:occursCountKind="implicit"/>
      </xsd:choice>
      <xsd:element dfdl:initiator="UNZ" dfdl:ref="ibmEdiFmt:EDISegmentFormat"
                   name="UNZ" type="srv:UNZ-InterchangeTrailer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="Message">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element dfdl:initiator="UNH" dfdl:ref="ibmEdiFmt:EDISegmentFormat"
                   name="UNH" type="srv:UNH-MessageHeader"/>
      <xsd:choice>
        ....
      </xsd:choice>
      <xsd:element dfdl:initiator="UNT" dfdl:ref="ibmEdiFmt:EDISegmentFormat"
                   name="UNT" type="srv:UNT-MessageTrailer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
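The complex types behind UNG and UNH are not shown in this mail. As a rough sketch only, a type carrying the fn:true() discriminator described above could be shaped as follows. The empty inner sequence, the placeholder field name and its type are assumptions made for illustration, not the content of the real srv:UNG-GroupHeader type.

```xml
<!-- Hypothetical sketch: the real srv:UNG-GroupHeader definition is not part of this thread. -->
<xsd:complexType name="UNG-GroupHeader">
  <xsd:sequence>
    <!-- Empty sequence whose only job is to carry the discriminator.
         The UNG initiator sits on the element, so it has already been
         matched by the time parsing reaches this point. -->
    <xsd:sequence>
      <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:discriminator>{ fn:true() }</dfdl:discriminator>
        </xsd:appinfo>
      </xsd:annotation>
    </xsd:sequence>
    <!-- Segment content follows; "field1" is a placeholder name. -->
    <xsd:element name="field1" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
```

With a shape like this, any error after the initiator has been found is reported against the segment itself and does not trigger backtracking at the point of uncertainty that the discriminator resolved.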
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 25/11/2014 17:37
Subject: Re: [DFDL-WG] Fw: Fw: Action 248 (was Thoughts on a discriminator scenario)
As mentioned on the call, this is one of the ways of dealing
with the situation where an array has 'implicit' OCK and minOccurs
>= 1, but also needs a discriminator for the optional elements.
I suggested on the call that this baggage be in a hidden
group, but as there are no elements involved, I think a hidden group is
not advisable here.
<xs:element name="a" dfdl:occursCountKind="implicit" minOccurs="1" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <!-- This choice is DFDL's way of expressing this logic: -->
      <!-- IF the occursIndex is for the optional part of the array -->
      <!-- THEN evaluate the array-element discriminator -->
      <!-- ELSE don't evaluate discriminator. -->
      <xs:choice>
        <xs:sequence>
          <xs:annotation><xs:appinfo ...>
            <!-- IF occursIndex gt 1.... -->
            <dfdl:discriminator>{ dfdl:occursIndex() gt 1 }</dfdl:discriminator>
            <!-- THEN discriminate the optional array elements -->
            <dfdl:discriminator>{ ....optional array element discriminator... }</dfdl:discriminator>
          </xs:appinfo></xs:annotation>
        </xs:sequence>
        <xs:sequence>
          <!-- ELSE this is the occursIndex eq 1 case, we have no discriminator -->
          <!-- for the array element, since it is required. -->
        </xs:sequence>
      </xs:choice>
      .... array content goes here...
    </xs:sequence>
  </xs:complexType>
</xs:element>
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
On Tue, Nov 25, 2014 at 10:50 AM, Steve Hanson <smh@uk.ibm.com>
wrote:
I think some of your wording changes
have changed my intent, which was that all arrays are potential PoUs.
The table now says that fixed, expression and stopValue are not potential
PoUs, which implies that the discriminator never acts on the array but
always on a higher PoU. I was trying to avoid this, because it means
that changing OCK can change the behaviour of the schema. But I guess
it's no different to changing the array to a scalar, which would have the
same effect.
Regarding the failure of the discriminator: the intent was that it should behave
just like any assert failure or processing error. But I think your point
is then right - it means that the phrase 'a discriminator only ever resolves
that point of uncertainty' should actually be 'a discriminator only ever
positively resolves that point of uncertainty' - which is an asymmetric
behaviour. Are we comfortable with that?
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 25/11/2014 15:35
Subject: Re: [DFDL-WG] Fw: Fw: Action 248 (was Thoughts on a discriminator scenario)
My suggested additional wording in Red
below. There is an issue with this where it was unclear to me whether we've
defined exactly what happens.
If you have say, an array with occursCountKind 'implicit', minOccurs '1',
and the discriminator on the element evaluates to false for that required
first element, what happens? Do we fail the whole array? This sounds contradictory
to the notion that the discriminator "only resolves that element".
But having the discriminator be ignored doesn't seem right either.
...mike
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Mon, Nov 17, 2014 at 8:07 AM, Steve Hanson <smh@uk.ibm.com>
wrote:
This action was raised because of concern with the behaviour of the discriminator
in the following example. Because OCK is 'implicit' the 1st occurrence
is not an actual PoU but the other 9 occurrences are. This means that for
the 1st occurrence, the discriminator actually acts on a higher PoU if one
exists.
<xs:element name="Type1" maxOccurs="10" dfdl:occursCountKind="implicit">
  <dfdl:discriminator test="{fn:exists(A)}" />
  <xs:complexType>
    <xs:sequence>
      <xs:element name="A" dfdl:initiator="A:" ... />
      <xs:element name="B" dfdl:initiator="B:" ... />
      <xs:element name="C" dfdl:initiator="C:" ... />
    </xs:sequence>
  </xs:complexType>
</xs:element>
This led to the suggestion that a discriminator should not 'leak' beyond
a potential PoU, regardless of whether it is an actual PoU. The argument
for this is contained in the thread below, and on re-reading I still think
it is the best solution to this, so that is what I propose.
There were also issues about the wording in section 9.3.3.
Sections 9.3.3 and 7.4 are reproduced below, and updated to address the
wording and leaking issues.
-------------------------------------------------
9.3.3 Points of Uncertainty
A point of uncertainty occurs
when parsing a schema component when an occurrence of that schema component
might not be the next item encountered in the data stream. Points
of uncertainty can be nested.
Any one of the following schema constructs is a potential point of uncertainty:
· A branch of xs:choice
· All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 'unordered')
· An optional xs:element
· An array xs:element.
· All xs:elements in an xs:sequence containing one or more floating xs:elements.
The parser resolves these points of uncertainty
by way of a set of construct-specific rules given below along with determining
whether schema components are known-to-exist or known-not-to-exist. For
some of these constructs, there are situations where while there is the
potential for uncertainty, the circumstances are such that there isn't
any actual uncertainty; hence, potential points of uncertainty are
distinguished from actual points of uncertainty below.
A branch of xs:choice is always an actual point of uncertainty. A choice
is resolved sequentially, or by direct
dispatch. Sequential choice resolution occurs by parsing each choice branch
in schema definition order until one is known-to-exist. It is a processing
error if none of the choice branches are known-to-exist. Direct-dispatch
choice resolution occurs by matching the value of the dfdl:choiceDispatchKey
property to the value of the dfdl:choiceBranchKey property of one
of the choice branches. It is a processing error if none of the choice
branches have a matching value in their dfdl:choiceBranchKey property.
An element in an unordered xs:sequence is
always an actual
point of uncertainty. It is resolved by parsing for the child components
of the sequence in schema definition order at each point in the data stream
where a component can exist until the required number of occurrences of
each child component is known-to-exist or the sequence is terminated by
delimiters or specified length.
An element in a sequence with one or more
floating elements is always an
actual point of uncertainty. It is resolved
by parsing for the expected element at that point in the data stream. If
the expected element is known-not-to-exist then an occurrence of each floating
element is parsed in schema definition order.
When parsing an array, points of uncertainty
only occur for certain values of occursCountKind, as follows:
occursCountKind | Details of Potential and Actual Points of Uncertainty
fixed | No potential point of uncertainty (maxOccurs occurrences expected).
implicit | All occurrences are potential points of uncertainty. An actual point of uncertainty exists after minOccurs occurrences found and until maxOccurs occurrences have been found.
parsed | All occurrences are actual points of uncertainty.
expression | No potential point of uncertainty (dfdl:occursCount occurrences expected).
stopValue | No potential point of uncertainty (the stopValue must always be present, even when minOccurs is 0).
Table 11: Points of Uncertainty and dfdl:occursCountKind
An optional element point of uncertainty is
resolved by parsing the element until it is either known-to-exist or known-not-to-exist.
Whether an optional element is an actual point of uncertainty depends on
property dfdl:occursCountKind as described above. (Property dfdl:occursCountKind
is defined in Section 16.1 dfdl:occursCountKind property.)
For an array element, the point of uncertainty
is resolved for each occurrence separately by parsing the occurrence until
it is either known-to-exist or known-not-to-exist.
Discriminators resolve potential
points of uncertainty. A discriminator
defined on, or contained by, a schema construct that is a potential point
of uncertainty, will only ever resolve that point of uncertainty. This
holds regardless of whether there is any actual uncertainty.
For example, if a discriminator is defined on an array element which is
contained within the branch of a choice, the discriminator will only resolve
the existence of occurrences of the array element, and never the existence
of the occurrence of the choice branch. As
another example, consider an array element with dfdl:occursCountKind 'implicit'
and minOccurs '1'. The first element of such an array must exist, so there
is no actual uncertainty. A discriminator on such an element is redundant,
but often must be expressed so as to discriminate the existence of the
second and any subsequent array elements.
If a discriminator evaluates to 'false' or causes a processing error on
a potential point of uncertainty where there is no actual uncertainty,
..... TBD
(I think this causes a processing error which will fail the whole array.....but
that sounds like it contradicts the statement above that says "it
only ever resolves that point of uncertainty" ?)
------------------------------------
7.4 The dfdl:discriminator Statement Annotation Element
DFDL discriminators are used during parsing
to resolve points of uncertainty that cannot be resolved by speculative
parsing. Discriminators are not used during unparsing. They can also
be used to force a resolution earlier during the parsing of a group so
that subsequent parsing errors are treated as processing errors of a known
component rather than a failure to find a component.
A discriminator determines the existence or
non-existence of a component. If the discriminator is successful then the
component is known to exist and any subsequent errors will not cause backtracking
at points of uncertainty. If a discriminator is unsuccessful then the component
is known not to exist and backtracking occurs immediately.
If the complex type of an element contains
a sequence group as its content model then if the sequence group is known
not to exist, then the element is known not to exist.
Examples of dfdl:discriminator annotation are below:
<dfdl:discriminator>
  { ../recType eq 0 }
</dfdl:discriminator>
<dfdl:discriminator test="{ ../recType eq 0 }" />
When the discriminator's expression evaluates
to "false", then it causes a processing error, and the discriminator
is said to fail.
A discriminator defined on, or
contained by, a schema construct that is a potential point of uncertainty,
will only ever resolve that point of uncertainty.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 17/11/2014 11:54 -----
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 15/05/2014 10:48
Subject: Re: [DFDL-WG] Fw: Action 248 (was Thoughts on a discriminator scenario)
Tim - I've responded to your specific comments below
in blue font.
All - You will see that I have some concerns over the words used in the
definition of a PoU, as we seem to be unclear as to whether a PoU is a
point in the data stream or a point in the model. I am wondering whether
the concepts of 'potential PoU' and 'actual PoU' can be better expressed
as 'PoU in the model' and 'PoU in the data'. I want to mull this over for
a while. I'm not changing the rules by this, just how we express
them.
So please let me run with this before replying.
Regards
Steve Hanson
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 14/05/2014 23:31
Subject: Re: [DFDL-WG] Fw: Action 248 (was Thoughts on a discriminator scenario)
Sent by: dfdl-wg-bounces@ogf.org
I agree that the wording is not easy to get right. However, I think the
current wording needs some adjustment so I'm going to make some suggestions
and see where it leads.
"A point of uncertainty occurs in the data stream when there is
more than one schema component
that might occur at that point."
I don't think this is precise enough.
SMH: Agree. I need to think about this sentence. There are several things
potentially wrong. It is defining a PoU as occurring in the data stream,
whereas elsewhere PoU is equated to a position in the model. It says 'more
than one schema component that might occur' - maybe it should say 'a schema
component may or may not occur'. And schema components don't occur
in the data stream anyway - occurrences of them do.
- if an optional element occurs at the end of the input data then there
is only *one* schema component that might occur at that point. The end
of the data stream might occur instead.
SMH: Yes, but I raised some similar arguments earlier in the thread, about
the last branch of a choice not being a PoU, or the last element in an
unordered sequence when all the others had been found not being a PoU.
We agreed that these are still all treated as PoUs for clarity. This is
another example.
- if an optional element occurs before the last required element in a sequence
AND the separatorSuppressionPolicy is not 'anyEmpty' then there is exactly
one schema component that can occur at that point in the data stream. But
it might be 'empty', in which case it will not be put into the info set.
This is not pedantry. The parser will never need to backtrack in either
of these cases and in the second case it is obvious in advance which schema
component the parser should select for parsing.
SMH: We have agreed in the past that the presence of a separator is not
enough to infer 'known-to-exist', so separators should not be brought into
this definition. You are right that in a positional sequence the parser
is looking for an occurrence of a component or its empty rep, and never
an occurrence of the next schema component, so the parser can certainly
optimise here. Let's take any discussion of separators out of this for
the moment, and raise a separate action if needed.
Points of uncertainty can be nested.
Any one of the following constructs is a potential point of uncertainty:
1. An xs:choice
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 'unordered')
3. An optional xs:element
4. An array xs:element.
5. All xs:elements in an xs:sequence containing one or more floating xs:elements.
1. should say 'A member of an xs:choice' because it is the member, not
the group itself, that is the point of uncertainty. I think the confusion
has arisen because only one member of a choice group can exist in the data.
So if any member exists, it automatically ends any speculation about the
content of the choice group. But I insist that the real point of uncertainty
is the member. A choice group is always 'known to exist' because according
to DFDL rules it must have minOccurs=maxOccurs=1. FWIW, I have no problem
with talking about 'resolving a choice', provided that we define that as
'Determining which member of a choice group ( if any ) is known to exist
in the data'.
SMH: I agree that it should say member.
2. Should say 'All members of an unordered xs:sequence' to keep the language
consistent with 1. The section on unordered groups clearly restricts members
to elements only.
SMH: No. Using 'xs:element' is consistent with optionals & arrays in
3 and 4, which are also always elements. so xs:element is more consistent.
3. See above - an optional element is not always a 'point of uncertainty'
according to the literal definition that we are currently using.
SMH: Right, but the bullets are defining potential PoUs, so it is
correct as it stands.
4. Should say 'An optional occurrence of an array element, unless the separator
properties make it a positional array and the occurrence is required in
the data'
SMH: No. All occurrences can be PoUs, it depends on OCK. And separators
do not resolve PoUs as noted. This definition 4 is the one that is key
for Action 248, which is ultimately what led to this discussion and what
needs to be resolved. The question is whether 4 should say a) all
arrays are potential PoUs as it does now, or b) just some arrays
are potential PoUs depending on OCK. Whatever we choose, a discriminator
within that array must not leak beyond the array as explained below
in bold red font.
I think a) is clearer and we can then make a general statement about discriminators
not leaking outside of any potential PoU. If we adopt b) then we need a
separate statement about discriminators and arrays, which seems more bitty.
5. Should say 'All members...' for consistency.
SMH: See 3.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Steve Hanson/UK/IBM@IBMGB
To:
Date: 13/05/2014 10:28
Subject: [DFDL-WG] Fw: Action 248 (was Thoughts on a discriminator scenario)
Sent by: dfdl-wg-bounces@ogf.org
This will be discussed on today's call. Please have a position on the paragraph
below that ends 'What do others think?'
Thanks
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 13/05/2014 10:19 -----
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 30/04/2014 12:25
Subject: Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Tim
Responses below.
Regards
Steve Hanson
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 11/04/2014 14:03
Subject: Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Sent by: dfdl-wg-bounces@ogf.org
"2. If a potential point of uncertainty is sometimes
an actual point of uncertainty (ock 'implicit') then a discriminator that
applies to it will only ever resolve, or have no effect on, that point of
uncertainty. It never has an effect on any enclosing point of uncertainty."
This could be misinterpreted. The discriminator could evaluate to 'false'
and thus cause the POU to be resolved negatively (the component would
be 'known not to exist').
SMH: Agree, and I can improve the words here.
1. and 3. will both apply if an element with ock='fixed' appears as a choice
branch. Is the POU always an actual POU or never?
SMH: No. There are two independent points of uncertainty, the choice branch
and the array.
The wording of 3. reads very strangely. 'If a potential point of uncertainty
is never an actual point of uncertainty' begs the question 'why
is it even a potential point of uncertainty?'. The current wording
follows from our definition of the term 'point of uncertainty':
"A point of uncertainty occurs in the data stream when there is
more than one schema component
that might occur at that point." Points of uncertainty can
be nested.
Any one of the following constructs is a potential point of uncertainty:
1. An xs:choice
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 'unordered')
3. An optional xs:element
4. An array xs:element.
5. All xs:elements in an xs:sequence containing one or more floating xs:elements.
I think this definition is too broad. It forces us to discuss potential
POUs that will never be actual POUs according to the first sentence.
SMH: Yes it does read a bit strangely, but there's a reason for this. If
we said that ock 'fixed', 'expression' or 'stopValue' are never POUs then
what does it mean if a discriminator is placed on such an element?
A discriminator gets evaluated for each occurrence of an array. For that
reason we can not let a discriminator within an array leak beyond the array
- regardless of whether it is a POU or not - otherwise what does that mean
to enclosing POUs? So even if we said that ock 'fixed', 'expression' or
'stopValue' are never POUs we would still need the spec to state that a
discriminator never leaks beyond them. I think it is clearer to say that
a discriminator never leaks beyond a potential POU and keep the existing
definition. What do others think?
regards,
Tim Kimber
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 11/04/2014 11:44
Subject: Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Sent by: dfdl-wg-bounces@ogf.org
248 | Discriminators and potential points of uncertainty (Steve)
28/1: Steve to write up a proposal to prevent a discriminator from behaving
in a non-obvious manner when used with a potential point of uncertainty
that turns out not to be an actual point of uncertainty.
5/2: Steve sent an email to check whether choice branches, unordered elements
and floating elements should always be actual points of uncertainty, as
there are times when there is no uncertainty, eg, last choice branch; all
floating elements found. It was decided that they are always actual points
of uncertainty. To do otherwise will complicate implementations and result
in fragile schemas. Steve will proceed with the proposal on that basis. |
Based on the above, which reflects the
email discussion below, here is what I propose to resolve this action.
1. If a potential point of uncertainty is always
an actual point of uncertainty (choice branch, element in unordered sequence,
floating element, ock 'parsed') then a discriminator that applies to it
will only ever resolve that point of uncertainty. It never has an effect
on any enclosing point of uncertainty.
2. If a potential point of uncertainty is sometimes
an actual point of uncertainty (ock 'implicit') then a discriminator that
applies to it will only ever resolve, or have no effect on, that point of
uncertainty. It never has an effect on any enclosing point of uncertainty.
3. If a potential point of uncertainty is never
an actual point of uncertainty (ock 'fixed', 'expression', 'stopValue')
then a discriminator that applies to it will never have an effect on that
point of uncertainty. Nor does it ever have an effect on any enclosing
point of uncertainty.
I think 1 and 2 are not controversial, but there is an alternative for
3:
3. If a potential point
of uncertainty is never an actual point of uncertainty (ock 'fixed',
'expression', 'stopValue') then a discriminator that applies to it will
never have an effect on that point of uncertainty. Instead the discriminator
is applied to any enclosing point of uncertainty.
The alternative means that changing an
element from (say) ock 'parsed' to ock 'expression' has the same effect
on a discriminator as changing the element to (1,1). The discriminator
that applied to it now applies to any enclosing pou.
SMH: Afternote: The alternative
does not work for the reason given in my reply to Tim above.
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 05/02/2014 12:04
Subject: Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Thanks Tim, all good points. Comments
to your comments.
Regards
Steve Hanson
From: Tim Kimber/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 05/02/2014 11:01
Subject: Re: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
A couple of comments below.
regards,
Tim Kimber
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 05/02/2014 10:50
Subject: [DFDL-WG] Action 248 (was Thoughts on a discriminator scenario)
Sent by: dfdl-wg-bounces@ogf.org
248 | Discriminators and potential points of uncertainty (Steve)
28/1: Steve to write up a proposal to prevent a discriminator from behaving
in a non-obvious manner when used with a potential point of uncertainty
that turns out not to be an actual point of uncertainty.
5/2: With Steve |
I started on this by reading section
9.3.3 on points of uncertainty, which lists the potential PoUs. Here's
the list to save getting the spec out.
1. An xs:choice branch
2. All xs:elements in an unordered xs:sequence (dfdl:sequenceKind is 'unordered')
3. An optional xs:element
4. An array xs:element
5. All xs:elements in an xs:sequence containing one or more floating xs:elements.
The section then looks at each in turn
and gives the circumstances when it is an actual PoU or not. As currently
written, it is only 3 and 4 where a potential PoU might not be an actual
PoU. For 1, 2 and 5 it says they are always actual PoUs.
But I'm not sure that's correct. A deeper
analysis of what is actually going on with 1, 2 and 5 says to me that there
are times when there might not be an actual PoU.
1. Given that there is no concept in
DFDL of optional choice branches, then if the last branch is reached then
there is no longer a PoU. It must be that branch else it is a processing
error.
TK: I think of it slightly differently. It is a PoU, even
if the branch is the only remaining branch. If we say that the final choice
branch is not a PoU then diagnostics become confused - the parser reports
the error code as 'error while parsing root/choice/lastBranch/field1' when
the correct error code would be 'none of the branches of root/choice were
found in the data'.
SMH: I see your point. My thinking was that
choices have finite branches and a choice is (1,1). If I have got to the
last branch then I am not one of the other branches so I must be
this one. If there is any other possibility then the model is missing a
branch, even if it is just one that contains an empty sequence with an
assert {fn:false()}. In practice of course users forget to add that last
branch (there's no XSDL equivalent to the 'default' branch of a switch/case
statement), so yes they could end up with an unclear diagnostic.
2. There can come a point in an unordered
sequence when all that can be encountered is one element, and if that is
(1,1) then there is no longer a PoU.
TK: It's still a PoU. The specification says that occursCountKind
is 'parsed' for all members of an unordered group, so min/maxOccurs do
not come into play.
SMH: Interesting. The spec says that if a member
is optional or an array then it must be 'parsed'. If it is (1,1) though
it does not have an occursCountKind. The specific case I was thinking of
is when all members are (1,1), so when you have one element to go
there is no PoU. However, the rewrite into a repeating choice has the effect
of making everything 'parsed', which is really the point you are
making. So I agree with you, it is easier to say that everything is an
actual PoU else it complicates the rewrite semantic.
5. If all floating elements are (1,1)
and all are encountered, then from that point on there are no longer any
PoUs due to floating elements.
TK: I suspect that floating elements are somewhat like
unordered branches - most users will not want min/maxOccurs to affect the
parsing of the group. Schema validation ( or more complex validation applied
in the receiving application ) will deal with non-conformances.
SMH: Possibly yes. With something like X12 NTE
segments, that is the case. But we don't express the floating semantic
as a rewrite of the whole sequence like we do for unordered, it's more
of a per element thing. And if that is done dynamically as we go through
the sequence, having no PoU can result.
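The catch-all last branch mentioned under point 1 (an empty sequence with an assert of fn:false()) could be sketched roughly as below. The branch names, types, initiators and message text are all assumptions for illustration, and whether an implementation surfaces this message in preference to the errors from the earlier branches is implementation-specific.

```xml
<!-- Hypothetical choice: branchA and branchB stand in for the real branches. -->
<xs:choice>
  <xs:element name="branchA" type="xs:string" dfdl:initiator="A:"/>
  <xs:element name="branchB" type="xs:string" dfdl:initiator="B:"/>
  <!-- Catch-all last branch: contains no data and always fails,
       playing the role of a 'default' case that raises an error. -->
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert message="None of the expected choice branches was found in the data">{ fn:false() }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
</xs:choice>
```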
I'd like us to get straight on this before
I proceed with the action proper.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2014 10:12 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 27/01/2014 17:39
Subject: Fw: Thoughts on a discriminator scenario
Been thinking some more on the discriminator scenario below that I mailed
out before xmas, and discussing it with the IBM DFDL team.
The 'confusing' aspect of the behaviour is that a discriminator within
a potential PoU will act on a higher level PoU if the potential PoU is
not an actual PoU. In the example, the array element 'Type1' is not an
actual PoU for occurrence 1, only for occurrences 2+. So when the discriminator
fires for occurrence 1 it will resolve a higher level unresolved PoU if
one exists.
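To make the 'higher level PoU' case concrete, here is a minimal sketch in which Type1 is a branch of a choice. It is an assumption-laden illustration, not the original schema: the names Outer and Fallback, the xs:string types and the omitted dfdl:format defaults are all hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <!-- Hypothetical wrapper; dfdl:format defaults omitted for brevity. -->
  <xs:element name="Outer">
    <xs:complexType>
      <!-- The choice is a point of uncertainty. -->
      <xs:choice>
        <!-- Branch 1: Type1 is (1,10) with occursCountKind 'implicit'. -->
        <xs:element name="Type1" maxOccurs="10"
                    dfdl:occursCountKind="implicit">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <!-- Occurrence 1 is required, so it is not a PoU and this
                   discriminator resolves the enclosing choice instead.
                   Occurrences 2 to 10 are PoUs, so there it resolves
                   the occurrence of Type1 itself. -->
              <dfdl:discriminator test="{fn:exists(A)}"/>
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:sequence>
              <xs:element name="A" type="xs:string" dfdl:initiator="A:"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- Branch 2: no longer tried once occurrence 1 has discriminated. -->
        <xs:element name="Fallback" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```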
Perhaps the spec should say that a discriminator can't 'leak' beyond the
potential PoU that encloses it? If so, then for occurrence 1 the discriminator
has no effect, and it only has an effect for occurrences 2+. This makes
for more predictable and robust schemas.
We'd need to go through spec section 9.3.3 carefully to check that this does
not break any of the potential PoUs that are listed.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 16/01/2014 09:55 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 20/12/2013 13:20
Subject: Thoughts on a discriminator scenario
Take the following schema (simplified) for element Type1 (1,10), which is a
loop for elements A, B, C. Type1 does not have an initiator, so I need
to use a discriminator to establish the existence of an occurrence of Type1
so that incorrect backtracking does not occur after an error. Because occursCountKind
is 'implicit', the 1st occurrence is not a point of uncertainty,
so the discriminator acts instead on any enclosing point of uncertainty,
but for the 2nd and subsequent occurrences it acts on Type1. That is
all working as designed, but I think users will find the 1st occurrence
behaviour a bit confusing. There are workarounds to avoid the problem,
e.g. use occursCountKind 'parsed' or split Type1 into two as (1,1) and (0,9).
I think this is worth documenting in a tutorial as this is quite subtle
stuff.
<xs:element name="Type1" maxOccurs="10"
            dfdl:occursCountKind="implicit">
  <dfdl:discriminator test="{fn:exists(A)}" />
  <xs:complexType>
    <xs:sequence>
      <xs:element name="A" dfdl:initiator="A:" ... />
      <xs:element name="B" dfdl:initiator="B:" ... />
      <xs:element name="C" dfdl:initiator="C:" ... />
    </xs:sequence>
  </xs:complexType>
</xs:element>
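As a hedged illustration of the first workaround, the same element with occursCountKind 'parsed' could be written along the following lines. The wrapper element Record, the xs:string types and the omission of dfdl:format defaults are assumptions made only so that the fragment is well-formed; the attributes elided with '...' in the original are not reconstructed here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <!-- Hypothetical wrapper; dfdl:format defaults omitted for brevity. -->
  <xs:element name="Record">
    <xs:complexType>
      <xs:sequence>
        <!-- With 'parsed', every occurrence of Type1, including the 1st,
             is a point of uncertainty, so the discriminator always acts
             on Type1. minOccurs/maxOccurs are then left to validation. -->
        <xs:element name="Type1" maxOccurs="10"
                    dfdl:occursCountKind="parsed">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:discriminator test="{fn:exists(A)}"/>
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:sequence>
              <xs:element name="A" type="xs:string" dfdl:initiator="A:"/>
              <xs:element name="B" type="xs:string" dfdl:initiator="B:"/>
              <xs:element name="C" type="xs:string" dfdl:initiator="C:"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The trade-off is that the (1,10) bounds no longer drive the parser, so a missing first occurrence or an 11th occurrence would only be reported if validation is enabled.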
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg