For today's call.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 16/10/2012 13:12 -----
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Cc: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date: 01/10/2012 15:31
Subject: Re: [DFDL-WG] assert and discriminator - no more before/after
My comments:
i) "Implementations are free to optimize evaluation of discriminators so as to improve performance and diagnostic capability by analyzing the actual discriminator test expressions and evaluating them earlier." This should state that early evaluation is allowed only if it will always give the same result as evaluation after parsing.
ii) Need to expand to include discriminators on sequences, choices and group refs.
iii) In the case of a processing error, is the discriminator evaluated at the point the processing error occurs, or after rollback has occurred? I think it must be before rollback, else the discriminator may fail when it would otherwise have passed.
iv) "(TBD: ??? is this right ??)": the issue here is which processing error 'wins'. I think the discriminator wins here.
v) Need to expand to include asserts?
Regards
Steve Hanson
From: Tim Kimber/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: Steve Hanson/UK/IBM@IBMGB
Date: 28/09/2012 15:44
Subject: Re: [DFDL-WG] assert and discriminator - no more before/after
One comment: the term 'this component' could be understood to refer to either
a) the component on which the discriminator is positioned, or
b) the component that is the nearest enclosing point of uncertainty.
I think b) is the intended meaning.
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB
Date: 28/09/2012 15:26
Subject: Re: [DFDL-WG] assert and discriminator - no more before/after
Revised to use standard spec terminology about speculative parse behavior, that is, "known to exist" or "known not to exist".
--------------------------------------------------------------
Proposal: Evaluation Time for Discriminators
A discriminator annotation on an element declaration or element reference is evaluated after the element is parsed. This evaluation occurs regardless of whether parsing of the corresponding element instance ends with or without a processing error.
In the case where parsing ends without error, the discriminator is then evaluated, with 2 possible outcomes:
1. The discriminator evaluates to true. The nearest enclosing point of uncertainty is resolved, and this component is known to exist.
   - (NOTE: this element might be one of several in a sequence making up the alternative; it is not necessarily the single root of the alternative. So even though this discriminator is evaluated after its element, there may still be more parsing to do within that alternative. The point of uncertainty is resolved, but the parse could still encounter a processing error later.)
2. The discriminator evaluates to false, or a processing error occurs during evaluation of the discriminator. The nearest enclosing point of uncertainty is resolved, and this component is known not to exist.
   - Any discriminator-caused processing error is the cause of error in this case. Diagnostics would refer to it as the cause of the error if there are no remaining alternatives in the enclosing points of uncertainty.
In the case of a processing error, evaluation of the discriminator controls how that processing error is handled. There are 3 possible outcomes:
1. The discriminator evaluates to true. The nearest enclosing point of uncertainty is resolved, and this component is known to exist. As a result, the processing error applies to the next outward enclosing point of uncertainty (if any).
2. The discriminator evaluates to false. The nearest enclosing point of uncertainty is resolved, and this component is known NOT to exist.
3. Evaluation of the discriminator causes a processing error. That processing error (the one from the discriminator evaluation) causes the nearest enclosing point of uncertainty to be resolved, and the component is known NOT to exist.
   - The discriminator-caused processing error is the cause of error in this case. Diagnostics would refer to it as the cause of the error if there are no remaining alternatives in the enclosing points of uncertainty.
   - (TBD: ??? is this right ??)
Implementations are free to optimize evaluation of discriminators so as to improve performance and diagnostic capability by analyzing the actual discriminator test expressions and evaluating them earlier.
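The outcome rules in the proposal above can be summarized as a small decision function. This is a minimal sketch, not spec text or any implementation's code: `resolve_discriminator`, `Resolution`, `Outcome` and `DiscriminatorError` are hypothetical names, the discriminator is modelled as a zero-argument callable, and a processing error raised while evaluating it is modelled as a `DiscriminatorError` exception.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    KNOWN_TO_EXIST = "known to exist"
    KNOWN_NOT_TO_EXIST = "known not to exist"


class DiscriminatorError(Exception):
    """Hypothetical: a processing error raised while evaluating the discriminator."""


@dataclass
class Resolution:
    # In every case the nearest enclosing point of uncertainty is resolved.
    outcome: Outcome
    # Set when the discriminator itself failed; reported as the cause of error.
    discriminator_error: Optional[str] = None
    # Set when the component exists but its parse failed; this error then
    # applies to the next outward enclosing point of uncertainty (if any).
    outward_error: Optional[str] = None


def resolve_discriminator(parse_error: Optional[str],
                          discriminator: Callable[[], bool]) -> Resolution:
    """Apply the proposal's outcome rules after the annotated element is parsed.

    parse_error is None when parsing of the element ended without error.
    """
    try:
        result = discriminator()
    except DiscriminatorError as err:
        # Discriminator-caused error: the component is known NOT to exist,
        # and this error (not parse_error) is reported as the cause.
        return Resolution(Outcome.KNOWN_NOT_TO_EXIST, discriminator_error=str(err))
    if not result:
        return Resolution(Outcome.KNOWN_NOT_TO_EXIST)
    # Discriminator is true: known to exist; any parse error is passed outward.
    return Resolution(Outcome.KNOWN_TO_EXIST, outward_error=parse_error)
```

In this sketch the false branch does not depend on whether parsing failed, since the proposal says the component is known not to exist either way; whether the discriminator-caused error should "win" over the parse error is the open TBD.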
On Mon, Sep 17, 2012 at 9:58 AM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
Thanks for this.
So, a discriminator's expression needs to be placed somewhere syntactically such that it clearly applies to a specific point of uncertainty. Yet it must then reference forward into structure that would normally not yet have been created/processed at that point, for example to look at a tag field.
I see the issue now. The problem is that all these forward references must have their points of uncertainty resolved; only then can we evaluate the expression without being fooled about whether something temporarily doesn't exist.
...mikeb
On Mon, Sep 17, 2012 at 9:15 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Some thoughts so far... added to the agenda for the next WG call.
I think the IBM implementation was done that way to handle descendent references and their implications, and to compensate for the dropping of the 'timing' attribute. It uses a notify mechanism to bring forward the point at which a discriminator can be evaluated, and hence to backtrack earlier. I believe it could have been implemented by evaluating at fixed points: evaluate straight away if there is no reference to self or descendents; evaluate when the component is finished if there is a reference to self or descendents; evaluate if a processing error occurs within the component.
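The fixed-point alternative can be sketched as a rule that decides, at each parse event for the annotated component, whether the discriminator should be evaluated now. A minimal sketch under stated assumptions: the event names `"start"`, `"child"`, `"end"` and `"error"` and both functions are hypothetical, and whether the expression refers to self or descendents is assumed to be known from static analysis of the expression.

```python
from typing import Iterable, Optional


def evaluate_now(event: str, refs_self_or_descendent: bool) -> bool:
    """Hypothetical fixed-point rule for when to evaluate a discriminator."""
    if event == "start":
        # No reference to self or descendents: evaluate straight away.
        return not refs_self_or_descendent
    if event in ("end", "error"):
        # Otherwise wait until the component is finished, or until a
        # processing error occurs within it.
        return refs_self_or_descendent
    return False  # e.g. "child": nothing to do yet


def evaluation_event(events: Iterable[str],
                     refs_self_or_descendent: bool) -> Optional[str]:
    """Return the first parse event at which the discriminator is evaluated."""
    for event in events:
        if evaluate_now(event, refs_self_or_descendent):
            return event
    return None
```

Unlike a notify mechanism, this needs no hook into infoset access, at the cost of sometimes evaluating (and hence backtracking) later than strictly necessary.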
I have been back through spec revisions, draft documents on speculative
parsing, emails and WG minutes, to see what we have discussed previously:
- The 'timing' attribute on asserts and discriminators existed in spec
038. This was dropped in spec 039 when the following was added: "The
expression is evaluated when the referenced elements are known to exist
or known not to exist.". This
was Tim's suggestion.
- Later, in spec 042, the 'timing' attribute was dropped from asserts too, and the wording for both changed to "Any element referred to by the expression must have already been processed or is a descendent of this element. The expression must have been evaluated by the time this element and its descendents have been processed." plus, additionally for discriminators, "or when a processing error occurs when processing this element or its descendents".
- The motivating use case for supporting forward references to descendents
is as follows: "For the most common use case (choice resolution)
the most natural place for a DFDL author to place the discriminators is
on the root of the branch. This may well involve a forward reference into
the branch content. If we disallow this, you are forcing the author
to place the discriminator in the branch content, which might be in a separate
global element, perhaps in another schema". And the problem
with doing the latter is the rule that says a discriminator resolves the
nearest active point of uncertainty. If you place a discriminator inside
a global element then it will get evaluated at all points of use - with
potentially undesirable results. So you place your discriminator as close
to the point of uncertainty as possible.
- Your dead-simple suggestion leaves the above choice scenario open to
the problem stated.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 15/09/2012 20:09
Subject: Re: [DFDL-WG] assert and discriminator - no more before/after
That is a remarkably complex implementation.
I am a bit concerned: is it correct? Here's the scenario I'm worried about. When an expression is evaluated, some internal sub-expressions might produce empty node sets. But there's no positive way to tell that this happened because some part of the infoset had 'not yet' been filled in, because we cannot predict the future. So an expression could evaluate successfully, but produce a different value than it would if evaluated later.
It seems to me some of this is inherent in XPath and the node-set-as-result model. An example is a discriminator whose expression evaluates to true if some subtree does NOT exist. At the start of the associated element, that subtree doesn't exist, so you get an empty node set back when you examine it, and the discriminator expression successfully evaluates to true. Later, when recursing into the children of the element in question, you add that subtree. After the element is complete, the answer to the discriminator expression would have been false.
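The scenario above can be reproduced with a toy infoset. In this minimal sketch the infoset is a plain dict mapping child names to lists of child nodes, `select` stands in for an XPath path step that returns a (possibly empty) node set, and `no_trailer` stands in for a discriminator along the lines of `empty(trailer)`; all of these names are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy infoset node: child element name -> list of child nodes.
Node = Dict[str, List["Node"]]


def select(node: Node, name: str) -> List[Node]:
    """Stand-in for an XPath step; returns an empty list when nothing matches."""
    return node.get(name, [])


def no_trailer(node: Node) -> bool:
    """Stand-in discriminator: true when the 'trailer' node set is empty."""
    return len(select(node, "trailer")) == 0


record: Node = {}
early = no_trailer(record)   # evaluated at the start of the element
record["trailer"] = [{}]     # later, parsing the children adds the subtree
late = no_trailer(record)    # evaluated after the element is complete
print(early, late)           # True False
```

Nothing in the early result signals that the empty node set was only temporary, which is exactly the correctness concern.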
Anyway, in the Daffodil project we're trying to reuse an XPath implementation (Saxon-B), and to create our infoset trees in the JDOM object model. So the above strategy, even if it works, isn't available to us, because the expression evaluator we're using does call us back for variable access, but expects us to hand it a JDOM tree as the infoset, and it accesses that tree at will while evaluating the expression. We have no way to intercept its inquiries on the infoset/JDOM tree.
Other implementations might also prefer simpler implementation techniques, even at the expense of efficiency.
So the question becomes: are there simpler strategies for implementing DFDL expressions on asserts/discriminators?
I have a dead-simple suggestion that might work.
1. An assert/discrim that annotates an element behaves as if evaluated after the element's value is computed, whether the element is of simple or complex type.
2. An assert/discrim that annotates a sequence or choice behaves as if evaluated before any child of the sequence/choice.
Seems very simple. (Almost too simple?) I'm not sure it is correct, but so far it seems workable to me, in that you can control before/after by syntactic placement. This might have some impact on the model structure, but there are other things in DFDL that do as well.
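The two rules above can be sketched as a recursive walk over a schema model that records when each annotation is checked. This is a minimal sketch, not Daffodil's design: `Component`, `walk` and the trace strings are hypothetical; annotations are just labels; and a choice would follow the same before-children rule as a sequence, but alternative selection and backtracking are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    kind: str                 # "element" or "sequence" (hypothetical model)
    name: str
    annotations: List[str] = field(default_factory=list)  # assert/discrim labels
    children: List["Component"] = field(default_factory=list)


def walk(c: Component, trace: List[str]) -> None:
    """Record the order in which elements complete and annotations are checked."""
    if c.kind == "sequence":
        # Rule 2: sequence (or choice) annotations run before any child.
        trace.extend(f"check {a}" for a in c.annotations)
        for child in c.children:
            walk(child, trace)
    elif c.kind == "element":
        # Rule 1: element annotations run after the element's value is
        # computed, i.e. after all of its content has been parsed.
        for child in c.children:
            walk(child, trace)
        trace.append(f"end {c.name}")
        trace.extend(f"check {a}" for a in c.annotations)
    else:
        raise ValueError(f"unknown kind: {c.kind}")
```

In this model, an author who wants a check to run before a complex element's content would place it on the sequence inside that element rather than on the element itself, which is the control by syntactic placement the suggestion relies on.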
Is there some obvious flaw I'm missing?
...mike
On Fri, Sep 14, 2012 at 5:14 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Hi Mike
I believe the rule is that an assert/discriminator expression is evaluated "as soon as it can be". In IBM DFDL, we try to evaluate an expression when it is encountered. If it can't be evaluated at that time because it references element(s) that do not yet appear in the infoset, the parser continues, but the expression manager registers an interest in the element(s). The expression manager is notified when the element(s) appear in the infoset, and the expression is re-evaluated at that time. If the parser gets to the end of the scope of the assert/discriminator and the expression has still not been evaluated, it is an error. (Tim, please correct any misinformation.)
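The register-interest-and-notify approach described above can be sketched as follows. This is a hypothetical illustration, not IBM DFDL's code: `ExpressionManager` and its methods are invented names, the infoset is a flat dict of element name to value, an expression is a Python callable plus the set of element names it references, and only a single scope is modelled.

```python
from typing import Callable, Dict, List, Set, Tuple

Expr = Callable[[Dict[str, object]], bool]


class ExpressionManager:
    """Hypothetical deferred evaluator for assert/discriminator expressions."""

    def __init__(self) -> None:
        self.infoset: Dict[str, object] = {}
        self.pending: List[Tuple[str, Set[str], Expr]] = []
        self.results: Dict[str, bool] = {}

    def submit(self, name: str, refs: Set[str], expr: Expr) -> None:
        # Try to evaluate as soon as the expression is encountered.
        self.pending.append((name, set(refs), expr))
        self._retry()

    def element_added(self, element: str, value: object) -> None:
        # Notification that an element has appeared in the infoset.
        self.infoset[element] = value
        self._retry()

    def end_of_scope(self) -> None:
        # Anything still unevaluated at the end of its scope is an error.
        if self.pending:
            names = ", ".join(name for name, _, _ in self.pending)
            raise RuntimeError(f"not evaluated by end of scope: {names}")

    def _retry(self) -> None:
        still_waiting = []
        for name, refs, expr in self.pending:
            if refs.issubset(self.infoset):
                self.results[name] = expr(self.infoset)
            else:
                still_waiting.append((name, refs, expr))  # keep registered interest
        self.pending = still_waiting
```

As Mike's reply in this thread points out, a scheme like this waits only for referenced elements to appear; an expression that tests for absence can still be evaluated too early.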
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 13/09/2012 16:32
Subject: [DFDL-WG] assert and discriminator - no more before/after
Sent by: dfdl-wg-bounces@ogf.org
I am looking in the spec for guidance about the evaluation order of assert statements.
We used to have before/after control properties, but eliminated them.
If I annotate a simpleType'd element with an assert that says { . eq 'x' }, that of necessity references the current value, so it must execute after the value has been computed.
If, on the other hand, I annotate a complexType element with a discriminator that says { ../flag eq 'C1' }, then this of necessity must execute before I go after the contents, because the whole point is to evaluate the discriminator first.
Did we ever articulate exactly what the rules are here about order of evaluation?
Thanks for reminders
..mikeb
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU