My thoughts on this...
The existing choice branch rule that
says minOccurs must not be 0 should remain, for consistency with not allowing
minOccurs 0 on the choice itself.
Choice branch with dfdl:occursCountKind
'expression' should be allowed. If the expression resolves to 0 then there
are no occurrences and the branch is missing, so the parser looks for the
next branch. This preserves the rule that a branch must exist.
Choice branch with dfdl:occursCountKind
'parsed' should be allowed. If the parser does not find any occurrences
then the branch is missing, so the parser looks for the next branch. This
preserves the rule that a branch must exist.
dfdl:inputValueCalc on a choice branch
should be allowed. If the parser reaches such a branch, it discriminates
the choice and no further branches are examined.
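To make the proposal concrete, here is a minimal sketch of a choice whose first branch uses dfdl:occursCountKind 'expression' and whose last branch carries dfdl:inputValueCalc. The element names (count, item, noItems) are hypothetical, the xs and dfdl prefixes are assumed to be bound as usual, and format properties are assumed to come from a dfdl:format in scope. The second branch is only legal if the inputValueCalc restriction is lifted as suggested above.

```xml
<xs:sequence>
  <xs:element name="count" type="xs:int"/>
  <xs:choice>
    <!-- Branch 1: array whose occurrence count is computed from count.
         minOccurs stays above 0, so the existing branch rule is respected. -->
    <xs:element name="item" type="xs:int"
                minOccurs="1" maxOccurs="3"
                dfdl:occursCountKind="expression"
                dfdl:occursCount="{ ../count }"/>
    <!-- Branch 2: computed element with no representation in the data.
         Placed last because reaching it resolves the choice. -->
    <xs:element name="noItems" type="xs:string"
                dfdl:inputValueCalc="{ 'none' }"/>
  </xs:choice>
</xs:sequence>
```

Under the reading above, when count is 0 the first branch has no occurrences and is treated as missing, so the parser moves on to noItems, which then resolves the choice.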
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 11/08/2015 15:58
Subject: Re: [DFDL-WG] Action 280 minOccurs='0' choice branch (was: Re: OCK expression and count of 0 for a choice member....)
I may have thought of the reason. If
I have a choice of A and B, then minOccurs=0 for B allows the choice to
be empty: A|B?. But this is the same as (A|B)?, which allows the
choice itself to be minOccurs=0, and that is not allowed.
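As an illustration of that equivalence, the two fragments below accept the same content (A, B, or nothing). Element names and types are hypothetical, and neither form is allowed under the DFDL rules discussed in this thread; they are shown only to compare the shapes.

```xml
<!-- Form 1: optional branch B inside the choice, i.e. A | B? -->
<xs:choice>
  <xs:element name="A" type="xs:string"/>
  <xs:element name="B" type="xs:string" minOccurs="0"/>
</xs:choice>

<!-- Form 2: the choice itself is optional, i.e. (A | B)? -->
<xs:choice minOccurs="0">
  <xs:element name="A" type="xs:string"/>
  <xs:element name="B" type="xs:string"/>
</xs:choice>
```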
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 18/06/2015 10:49
Subject: Re: [DFDL-WG] Action 280 minOccurs='0' choice branch (was: Re: OCK expression and count of 0 for a choice member....)
Hi Mike
I think the restriction of having minOccurs
>= 1 on xs:choice branch arose for two reasons, though I am unable to
find a definitive email trail:
a) If minOccurs = 0 you immediately
have two points of uncertainty, so potentially two discriminators are needed.
I'm not sure if this is really a problem though, because if minOccurs <
maxOccurs there are also two points of uncertainty and it still requires
some thought to get discrimination correct as it varies per occurrence.
b) Interaction with known-to-exist rules.
For example, one way to achieve known-to-exist is to successfully parse
an empty representation, which with minOccurs = 0 may mean that nothing
is added to the infoset. I'm not sure this is actually a problem
though. If the branch was successfully parsed then surely that should discriminate
in favour of the branch regardless of representation.
And even if a) and b) are problematic,
the fact remains that you can trivially negate the restriction by wrapping
in xs:sequence.
So I suspect we can drop the restriction
altogether, and the 'system' just works in a consistent manner.
You raised the issue of an element with
dfdl:inputValueCalc not being allowed as a choice branch. I suspect this
was added because as soon as you encounter such a branch you have by definition
discriminated in favour of that branch. But that's ok, you just make that
branch the last in the choice. No different to having a branch that exists
just to throw an error - it too must be last. If such branches are not
last, it's a schema design bug.
Back to Alex's original scenario at
the foot of this thread, where his xs:choice branch element had a dfdl:occursCount
expression that evaluated to 0. According to https://redmine.ogf.org/issues/244
no occurrences are looked for in the data. That means the occurrences are
missing, so known-not-to-exist and the parser should try the next branch.
Below I said that section 15.1.1 needed updating to correctly reflect
section 9. I also said we are perhaps missing a definition of what
'missing' means for an array element:
"(The) spec defines known-to-exist
and known-not-to-exist in terms of occurrences. In (Alex's) choice branch
example, it is the element as a whole we are looking at. That's fine for
scalar as element == occurrence but for an array it's not the same.
I think the spec is missing a definition of what 'missing' means for an
array element. I would say that an array element is missing if all occurrences
are missing. And an array element is not missing if any occurrence has
a representation (empty, nil, normal)."
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 02/06/2015 18:41
Subject: [DFDL-WG] Action 280 minOccurs='0' choice branch (was: Re: OCK expression and count of 0 for a choice member....)
Sent by: dfdl-wg-bounces@ogf.org
I believe this action item remains open still and I would
like to revive the discussion.
I was coding up this aspect of Daffodil and have hit this subject head
on.
In section 15 the spec clearly states that the root of
a choice branch cannot be optional, that is, it cannot have minOccurs="0".
That language is very specific, and it leaves open the
possibility of "effectively optional" things being the roots
of choice branches (e.g., using OCK 'parsed' or 'expression').
It also allows one to trivially wrap a sequence (having
no delimiters, alignment or skips) around an element (or element ref) carrying
minOccurs="0" so as to simply dodge the restriction.
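A minimal sketch of that dodge, with hypothetical element names and assuming a format in scope that gives the wrapping sequence no delimiters, alignment or skips:

```xml
<xs:choice>
  <!-- The branch root is the sequence, which is always required, so the
       section 15 rule is satisfied even though its only child is optional. -->
  <xs:sequence>
    <xs:element name="opt" type="xs:string" minOccurs="0"/>
  </xs:sequence>
  <xs:element name="other" type="xs:string"/>
</xs:choice>
```

Placing minOccurs="0" directly on a branch root element would be rejected, yet this wrapped form describes the same data, which is why the restriction seems to accomplish little.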
It was observed in the thread below that we cannot require
choice branches to be scalar elements, as there is a need for hidden groups
to be branches of choices, and for empty sequences carrying only asserts,
as another non-element example.
Related: the DFDL spec also specifies that an element that is the root
of a choice branch cannot carry dfdl:inputValueCalc. The spec does NOT
restrict use of dfdl:outputValueCalc on the root of a choice branch, but
the meaning of such is unclear to me.
The existing restriction of "no minOccurs='0'"
on the root of a choice branch seems not to accomplish anything. It seems
meaningful only for occursCountKind='implicit'.
Requiring the root of a choice branch to not be "variable occurrence"
if it is an element would accomplish something, but it is not clear this
is needed to eliminate ambiguity or if the ambiguity can be eliminated
without any restriction.
The stable design points I can think of are:
1) root of a choice branch must be scalar (so, only a
sequence, choice, or an element where minOccurs == maxOccurs == 1.)
2) root of a choice branch cannot be optional - for a
broad sense of the word optional - precludes arrays with OCK expression
and parsed, and implicit if minOccurs="0". Fixed length arrays
would be allowed.
3) a choice branch must have some syntax
I think we discarded (3) because choice branches that
really just reflect error checking - contain only dfdl:asserts for example
- are in use and serve a useful purpose.
Daffodil's test suite has much use of choice branches that look like this:
<choice>
  .....
  <sequence>
    <element name="foo" dfdl:inputValueCalc="{....}"/>
  </sequence>
</choice>
These have no syntax. This allows a kind of default element
to be computed. In most (could be all, I've not searched exhaustively)
of these cases the IVC expression is a constant. But note that the
sequence wrapped around the IVC element is just dodging the restriction
that a choice branch cannot be an IVC element (which is another restriction
that seems unnecessary.)
...mike
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology
| www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy
On Mon, Apr 27, 2015 at 9:30 AM, Steve Hanson <smh@uk.ibm.com>
wrote:
Mike
A couple of comments:
1) You said below:
    Optional here means "not required by the DFDL format", as in
    occursCountKind cannot be 'parsed' at all, because all occurrences are
    then not required, and the min/maxOccurs are only examined for validation
    purposes, also occursCountKind cannot be 'implicit' for the same reasons,
    and occursCountKind 'expression' also.
OccursCountKind 'implicit' is allowed, because minOccurs is used for parsing
and minOccurs cannot be 0.
2) You said below:
    Wrapping the array element in a sequence doesn't solve the problem unless
    the sequence has a required piece of syntax such as an initiator or terminator,
    or a hiddenGroupRef to a not-optional (recursively) thing.
A sequence has minOccurs '1' so it does satisfy the spec rule about the
child of a choice being required. Such a sequence could have no syntax
and could contain an element with minOccurs '0' or even be empty. I have
seen DFDL schemas that contain a choice with the last branch being an empty
sequence that contains an assert fn:false() in order to throw a processing
error.
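A sketch of that last pattern, assuming hypothetical branch names typeA and typeB, an illustrative message text, and the usual xs and dfdl namespace prefixes:

```xml
<xs:choice>
  <xs:element name="typeA" type="xs:string"/>
  <xs:element name="typeB" type="xs:string"/>
  <!-- Last branch: an empty sequence with no syntax. It is only reached
       when every earlier branch has failed, and its assert always fails. -->
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ fn:false() }"
                     message="No choice branch matched."/>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
</xs:choice>
```

The point of the extra branch is to report a chosen error message rather than the generic failure raised when the branches are exhausted.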
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Alex Wood1/UK/IBM@IBMGB
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 27/04/2015 13:35
Subject: Re: [DFDL-WG] OCK expression and count of 0 for a choice member....
Sent by: dfdl-wg-bounces@ogf.org
I believe any use of occursCountKind 'expression' on an element that is
the first element on a branch of a choice should be an SDE.
This is one of the cases where DFDL requires one to introduce an element
that would not be necessary in an ordinary XML schema, but is necessary
because DFDL does not have XML's easily parsed syntax to depend on.
This is my opinion. I think we need to look at whether this restriction
is either
(a) necessary
(b) necessary to avoid excessive complexity in implementations
(c) unnecessary - but is the intention of what is specified already (despite
shortcomings of the prose/description in the spec, which could be corrected.)
(d) an error in the specification
On Mon, Apr 27, 2015 at 5:49 AM, Alex Wood1 <WOODA@uk.ibm.com>
wrote:
Hi Mike,
Can you clarify if you are saying that OCK expression should be prohibited
completely on a choice member (as occurrences for OCK expression are potentially
optional regardless of minOccurs value)?
Or is your statement that it should cause an SDE specific to the count==0
case?
Kind Regards,
- Alex
Alex Wood -
Software Engineer -
WebSphere Message Broker Development
DFDL Development
MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
Tel: Internal 246272, External 01962 816272
Notes: Alex Wood1/UK/IBM@IBMGB
e-mail: wooda@uk.ibm.com
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Alex Wood1/UK/IBM@IBMGB
Date: 24/04/2015 15:10
Subject: Re: [DFDL-WG] OCK expression and count of 0 for a choice member....
I think this is an SDE.
Choice branches cannot be optional.
Optional here does not mean minOccurs == 0, because for many occursCountKinds,
that's never checked unless validation is on, and validation doesn't guide
parsing anyway.
Optional here means "not required by the DFDL format", as in
occursCountKind cannot be 'parsed' at all, because all occurrences are
then not required, and the min/maxOccurs are only examined for validation
purposes, also occursCountKind cannot be 'implicit' for the same reasons,
and occursCountKind 'expression' also.
Wrapping the array element in a sequence doesn't solve the problem unless
the sequence has a required piece of syntax such as an initiator or terminator,
or a hiddenGroupRef to a not-optional (recursively) thing.
Even initiator and terminator are tricky, because in a non-delimited format,
those can be %WSP*; which can match nothing at all; hence, they do not
"require" any syntax.
On Fri, Apr 24, 2015 at 9:07 AM, Alex Wood1 <WOODA@uk.ibm.com>
wrote:
Hi All,
Please see below for a history of the issue.
This arose from fuzz testing of the IBM DFDL parser, which produced a test
with a count of 0 for an OCK expression array that was a choice
member, and from subsequent reference to the specification.
It was not clear what the correct outcome should be in a choice where the
first member is an array using OCK expression where the count resolves
to 0.
a.) resolve the choice to the zero length array
b.) move to the next choice branch
c.) throw an error
Kind Regards,
- Alex
From: Steve Hanson/UK/IBM
To: Alex Wood1/UK/IBM@IBMGB
Cc: Andrew Edwards/UK/IBM@IBMGB, Mark Frost/UK/IBM
Date: 24/04/2015 09:19
Subject: Re: OCK expression and count of 0 for a choice member....
When I wrote the paragraph below, the one thing that troubled me was that
the spec defines known-to-exist and known-not-to-exist in terms of occurrences.
In the choice branch example, it is the element as a whole we are looking
at. That's fine for scalar as element == occurrence but for an array it's
not the same. I think the spec is missing a definition of what 'missing'
means for an array element. I would say that an array element is missing
if all occurrences are missing. And an array element is not missing if
any occurrence has a representation (empty, nil, normal). With that
in place, my paragraph makes sense, I think.
I believe we have the same issue with 'parsed' and 'stopValue'.
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Alex Wood1/UK/IBM@IBMGB
Cc: Andrew Edwards/UK/IBM@IBMGB, Mark Frost/UK/IBM@IBMGB
Date: 23/04/2015 18:52
Subject: Re: OCK expression and count of 0 for a choice member....
Here is one interpretation...
A choice is resolved by parsing the branches until one is known-to-exist
as described in section 9.3.3. Section 9.3.1.2 defines known-to-exist
(in the absence of a discriminator, initiator or direct dispatch) as an
occurrence having empty, nil or normal representation. Section 9.3.1.3
defines known-not-to-exist (again in the absence of a discriminator, initiator
or direct dispatch, or an assert) as an occurrence being missing or causing
a processing error. If occursCount is zero no occurrences are looked for
in the data (erratum 5.9) so the element has no representation and must
be missing. Therefore a choice branch containing such an element is known-not-to-exist.
So in your example, the first choice branch containing myInt is known-not-to-exist
and the parser tries the next branch.
This appears to contradict section 15.1.1 though. I suspect that 15.1.1
was not updated to match section 9.3 when the latter was added.
If you want to make the first choice branch known-to-exist when the count
is zero then I think wrapping myInt in a sequence would work. Or wrapping
myInt in a complex element.
Definitely one to take to the WG though, if only to correct section 15.1.1
to match section 9.
Regards
Steve Hanson
From: Alex Wood1/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: Andrew Edwards/UK/IBM@IBMGB, Mark Frost/UK/IBM@IBMGB
Date: 23/04/2015 16:33
Subject: OCK expression and count of 0 for a choice member....
Hi Steve
Just been discussing this with Andy and Mark.
I think the spec
<xs:element name="Choice_Expression" dfdl:ref="config" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence dfdl:ref="config">
      <xs:element ref="myCount"></xs:element>
      <xs:choice dfdl:choiceLengthKind="implicit" dfdl:ref="config">
        <xs:element ref="myInt" minOccurs="1" maxOccurs="3"></xs:element>
        <xs:element ref="myTxt"></xs:element>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Where myInt has occursCountKind="expression" occursCount="{../myCount}"
A given instance of this message could have myCount==0
Is this valid?
Should it resolve to 0 occurrences of myInt or move on to myTxt ?
Section 15 of the spec says:
The Root of the Branch MUST NOT be optional. That is XSDL minOccurs MUST
BE greater than 0.
But in this case minOccurs is >0.
Assuming this is not an error then in terms of resolving the choice section
15.1.1 says..
15.1.1 Resolving Choices via Speculation
Speculative resolution works as follows:
1) Attempt to parse the first branch of the choice.
2) If this fails with a processing error
   a) If a dfdl:discriminator evaluated to true earlier on this branch then
      the parser is 'bound' to this branch and parsing of the entire choice
      construct fails with a processing error.
   b) If the branch has a dfdl:initiator and the choice has dfdl:initiatedContent
      'yes' then the parser is 'bound' to this branch and parsing of the entire
      choice construct fails with a processing error.
   c) Otherwise we repeat from step 1 for the next branch of the choice.
3) It is a processing error if the branches of the choice are exhausted.
4) If a branch is successfully parsed without error, then that branch's
   infoset becomes the infoset for the parse of the choice construct.
So it seems like this is case 4): we did not fail to parse myInt...
However, talking with Mark about real scenarios that this might apply to,
consider a choice of two repeating fields with counts earlier in the data,
only one of which must appear. You'd expect 0 of the first means >0 of the
second and vice versa... So you'd probably want 0 myInt to allow the choice
to resolve to myTxt.
Thoughts ?
If you agree we need more clarity in the spec I will forward to the WG.
Kind Regards,
- Alex
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg