Fw: assert and discriminator - no more before/after - dfdl-wg

16 Oct 2012

For today's call.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 16/10/2012 13:12 -----

From:   Steve Hanson/UK/IBM
To:     Tim Kimber/UK/IBM@IBMGB
Cc:     Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date:   01/10/2012 15:31
Subject:        Re: [DFDL-WG] assert and discriminator - no more 
before/after

My comments:

i) "Implementations are free to optimize evaluation of discriminators so as 
to improve performance and diagnostic capability by analyzing the actual 
discriminator test expressions and evaluating them earlier." The proposal 
should state that this is allowed only if early evaluation always gives 
the same result as evaluation after parsing.

ii) Need to expand to include discriminators on sequences, choices and 
group refs.

iii) In the case of a processing error, is the discriminator evaluated at 
the point the processing error occurs, or after rollback has occurred? I 
think it must be before rollback, else the discriminator may fail when it 
would have otherwise passed. 

iv) "(TBD: ??? is this right ??)". The issue here is which processing 
error 'wins'. I think the discriminator wins here. 

v) Need to expand to include asserts? 

Regards

Steve Hanson

From:   Tim Kimber/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc:     Steve Hanson/UK/IBM@IBMGB
Date:   28/09/2012 15:44
Subject:        Re: [DFDL-WG] assert and discriminator - no more 
before/after

One comment : the term 'this component' could be understood to refer to 
either
a) the component on which the discriminator is positioned or
b) the component that is the nearest enclosing point of uncertainty

I think b) is the intended meaning. 

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert@uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:   Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:     Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB, 
Date:   28/09/2012 15:26
Subject:        Re: [DFDL-WG] assert and discriminator - no more 
before/after

Revised to use standard spec terminology about speculative parse behavior. 
That is, known to exist or known not to exist. 

--------------------------------------------------------------

Proposal: Evaluation Time for Discriminators

A discriminator annotation on an element declaration or element reference 
is evaluated after the element is parsed. 

This evaluation occurs regardless of whether the parsing of the 
corresponding element instance ends with or without a processing error. 

In the case where parsing ends without error, evaluation of the 
discriminator then occurs and there are 2 possible outcomes:

1. The discriminator evaluates to true - the nearest enclosing point of 
   uncertainty is resolved, and this component is known to exist.
   (NOTE: keep in mind that this element might be one of several in a 
   sequence making up the alternative (it is not necessarily a single 
   root of the alternative), so even though this discriminator is 
   evaluated after its element, there may still be more parsing to do 
   within that alternative. So this point of uncertainty can be resolved, 
   but the parse could still encounter a processing error later.)
2. The discriminator evaluates to false, or a processing error occurs 
   during evaluation of the discriminator - the nearest enclosing point of 
   uncertainty is resolved and this component is known not to exist.
   Any discriminator-caused processing error is the cause of error in this 
   case. Diagnostics would refer to this as the cause of the error if there 
   are no remaining alternatives in the enclosing points of uncertainty.

In the case where parsing ends with a processing error, evaluation of the 
discriminator controls the way that processing error is handled. There are 
3 possible outcomes:

1. The discriminator evaluates to true - the nearest enclosing point of 
   uncertainty is resolved, and this component is determined to be known to 
   exist. As a result, the processing error applies to the next outward 
   enclosing point of uncertainty (if any).
2. The discriminator evaluates to false - the nearest enclosing point of 
   uncertainty is resolved, and this component is determined to be known 
   NOT to exist. 
3. The discriminator evaluation causes a processing error - the 
   processing error (the one from the discriminator evaluation) causes the 
   nearest enclosing point of uncertainty to be resolved. The component is 
   known NOT to exist.
   The discriminator-caused processing error is the cause of error in this 
   case. Diagnostics would refer to this as the cause of the error if there 
   are no remaining alternatives in the enclosing points of uncertainty.
   (TBD: ??? is this right ??)

Implementations are free to optimize evaluation of discriminators so as to 
improve performance and diagnostic capability by analyzing the actual 
discriminator test expressions and evaluating them earlier.
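
The five outcomes in the proposal can be read as a small decision table. 
The following is a minimal Python sketch of that table, not part of the 
proposal itself: the names Discrim, Outcome and resolve are hypothetical, 
and the sketch assumes that a parse error is absorbed whenever the 
component is known not to exist, which the proposal implies but does not 
state outright.

```python
from dataclasses import dataclass
from enum import Enum


class Discrim(Enum):
    """Result of evaluating the discriminator expression."""
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"  # evaluation itself raised a processing error


@dataclass(frozen=True)
class Outcome:
    known_to_exist: bool             # how the point of uncertainty is resolved
    parse_error_moves_outward: bool  # parse error applies to next outward PoU


def resolve(parse_failed: bool, discrim: Discrim) -> Outcome:
    """Resolve the nearest enclosing point of uncertainty (hypothetical helper)."""
    if discrim is Discrim.TRUE:
        # Component is known to exist; a parse error, if there was one, is
        # handed on to the next outward enclosing point of uncertainty.
        return Outcome(known_to_exist=True,
                       parse_error_moves_outward=parse_failed)
    # Discriminator false or in error: component is known not to exist.
    # Assumption: the element's own parse error is not reported outward.
    return Outcome(known_to_exist=False, parse_error_moves_outward=False)


# Print every combination of parse result and discriminator result.
for failed in (False, True):
    for d in Discrim:
        print(failed, d.name, resolve(failed, d))
```

In every case the point of uncertainty is resolved; only the direction of 
the resolution and the fate of the parse error differ.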

On Mon, Sep 17, 2012 at 9:58 AM, Mike Beckerle <mbeckerle.dfdl@gmail.com> 
wrote:
Thanks for this. 

So, an expression on a discriminator wants to be able to be placed 
somewhere syntactically such that it is clear that it will apply to a 
specific point of uncertainty. Yet it then must reference forward into the 
structure that would normally not have yet been created/processed at that 
point, for example to look at a tag field.

I see the issue now. The problem is that all these forward references must 
have their points of uncertainty resolved, and then we can evaluate the 
expression without being fooled about whether something temporarily 
doesn't exist or not. 

...mikeb

On Mon, Sep 17, 2012 at 9:15 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Some thoughts so far...added to agenda for next WG call. 

I think the IBM implementation was done that way to handle descendent 
references and their implications, and to compensate for the dropping of 
the 'timing' attribute. It uses a notify mechanism to speed up the point 
where a discriminator can be evaluated, and hence backtrack earlier. I 
believe it could have been implemented by evaluating at fixed points: 
evaluate straight away if there is no reference to self or a descendent; 
evaluate when the component is finished if there is such a reference; 
evaluate if a processing error occurs within the component. 
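
The fixed evaluation points described above can be sketched as a single 
selection function. This is only an illustration in Python; the function 
name, its parameters and the returned labels are hypothetical and are not 
taken from the IBM implementation.

```python
def evaluation_point(refers_to_self_or_descendant: bool,
                     error_in_component: bool) -> str:
    """Pick the fixed point at which the expression would be evaluated."""
    if not refers_to_self_or_descendant:
        # Nothing inside the component is needed: evaluate straight away.
        return "on encounter"
    if error_in_component:
        # Parsing of the component failed before it could finish.
        return "on processing error"
    # The component (and its descendants) has been fully parsed.
    return "on component end"


print(evaluation_point(False, False))
print(evaluation_point(True, False))
print(evaluation_point(True, True))
```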

I have been back through spec revisions, draft documents on speculative 
parsing, emails and WG minutes, to see what we have discussed previously: 

- The 'timing' attribute on asserts and discriminators existed in spec 
038. It was dropped from discriminators in spec 039 when the following 
was added: "The expression is evaluated when the referenced elements are 
known to exist or known not to exist.". This was Tim's suggestion. 

- Later on in spec 042 the 'timing' attribute was dropped from asserts 
too, and the wording for both changed to "Any element referred to by the 
expression must have already been processed or is a descendent of this 
element. The expression must have been evaluated by the time this element 
and its descendents have been processed." plus additionally for 
discriminators "or when a processing error occurs when processing this 
element or its descendents". 

- The motivating use case for supporting forward references to descendents 
is as follows:  "For the most common use case (choice resolution) the most 
natural place for a DFDL author to place the discriminators is on the root 
of the branch. This may well involve a forward reference into the branch 
content.  If we disallow this, you are forcing the author to place the 
discriminator in the branch content, which might be in a separate global 
element, perhaps in another schema".  And the problem with doing the 
latter is the rule that says a discriminator resolves the nearest active 
point of uncertainty. If you place a discriminator inside a global element 
then it will get evaluated at all points of use - with potentially 
undesirable results. So you place your discriminator as close to the point 
of uncertainty as possible. 

- Your dead-simple suggestion leaves the above choice scenario open to the 
problem stated. 

Regards

Steve Hanson

From:        Mike Beckerle <mbeckerle.dfdl@gmail.com> 
To:        Steve Hanson/UK/IBM@IBMGB 
Cc:        dfdl-wg@ogf.org 
Date:        15/09/2012 20:09 
Subject:        Re: [DFDL-WG] assert and discriminator - no more 
before/after 

That is a remarkably complex implementation.

I am a bit concerned.... is it correct? Here's the scenario I'm worried 
about: When an expression is evaluated, the node sets produced by some 
internal sub-expressions might be empty. But there's no positive way to 
tell whether this happened because some part of the infoset had 'not yet' 
been filled in, because we cannot predict the future. So an expression 
could successfully evaluate, yet produce a different value than it would 
if evaluated later.

It seems to me some of this is inherent in XPath and the 
node-set-as-result model. An example of this might be a discriminator with 
an expression that evaluates to true if some subtree does NOT exist. At 
the start of the associated element, that tree doesn't exist, so you get 
an empty node set back when you examine it, and the discriminator 
expression successfully evaluates to true. Later, when recursing into the 
children of the element in question, you add that subtree. After the 
element is complete, the answer to the discriminator expression would 
have been false.
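
The scenario above can be reproduced in a few lines. This is a minimal 
Python sketch using the standard xml.etree.ElementTree module as a 
stand-in for the infoset; the element names "record" and "opt" and the 
helper subtree_absent are hypothetical, and the helper only imitates an 
XPath test for a missing subtree.

```python
import xml.etree.ElementTree as ET


def subtree_absent(element: ET.Element) -> bool:
    """Stand-in for a discriminator that is true when child 'opt' does NOT exist."""
    return element.find("opt") is None


record = ET.Element("record")

# Evaluated at the start of the element: the subtree is not there yet.
early = subtree_absent(record)

# Parsing the children later adds the subtree to the infoset.
ET.SubElement(record, "opt")

# Evaluated after the element is complete.
late = subtree_absent(record)

print(early, late)  # True False
```

Both evaluations succeed, which is the concern: nothing signals that the 
early answer was computed against an incomplete infoset.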

Anyway, in the Daffodil project we're trying to reuse an XPath 
implementation (Saxon-B), and to create our infoset as trees in the JDOM 
object model. 

So the above strategy, even if it works, isn't available to us anyway 
because we're trying to use an expression evaluator that does call us back 
for variable access, but expects us to hand it a JDOM tree as the infoset, 
and it accesses that tree at will while evaluating the expression. We 
have no way to intercept its queries on the infoset/JDOM tree. 

Other implementations might also prefer simpler implementation techniques, 
even at the expense of efficiency. 

So the question becomes: are there simpler strategies for implementing the 
DFDL expressions on asserts/discriminators? 

I have a dead-simple suggestion that might work. 
1. An assert/discrim which annotates an element behaves as if evaluated 
   after the element's value is computed, whether the element is of simple 
   or complex type. 
2. An assert/discrim which annotates a sequence or choice behaves as if 
   evaluated before any child of the sequence/choice. 

Seems very simple. (Almost too simple?) I'm not sure it is correct, but it 
seems workable so far to me, in that you can control before/after by 
syntactic placement. This might have some impact on the model structure, 
but there are other things in DFDL that do as well. 
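
To make the ordering concrete, here is a minimal Python sketch of the two 
rules above applied to a small model. The Component class, the walk 
function and the names "rec", "body", "tag" and "data" are all 
hypothetical; the sketch only records the order of events and does no 
real parsing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    kind: str                 # "element", "sequence" or "choice"
    name: str
    annotated: bool = False   # carries an assert or discriminator
    children: List["Component"] = field(default_factory=list)


def walk(c: Component, log: List[str]) -> List[str]:
    """Record when each leaf is parsed and when each annotation is evaluated."""
    if c.annotated and c.kind != "element":
        log.append("eval " + c.name)    # rule 2: before any child
    if not c.children:
        log.append("parse " + c.name)   # leaf: its value is computed here
    for child in c.children:
        walk(child, log)
    if c.annotated and c.kind == "element":
        log.append("eval " + c.name)    # rule 1: after the element
    return log


model = Component("element", "rec", True, [
    Component("sequence", "body", True, [
        Component("element", "tag"),
        Component("element", "data"),
    ]),
])

print(walk(model, []))
# ['eval body', 'parse tag', 'parse data', 'eval rec']
```

Moving an annotation from the element to its sequence, or the reverse, is 
what switches it between "before" and "after" in this scheme.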

Is there some obvious flaw I'm missing?

...mike

On Fri, Sep 14, 2012 at 5:14 AM, Steve Hanson <smh@uk.ibm.com> wrote: 
Hi Mike 

I believe the rule is that an assert/discriminator expression is evaluated 
"as soon as it can be". In IBM DFDL, we try to evaluate an expression when 
it is encountered, and if it can't be evaluated at that time because it 
references element(s) that do not appear in the infoset, the parser 
continues but the expression manager registers an interest in the 
element(s). The expression manager gets notified when the element(s) 
appear in the infoset and the expression is re-evaluated at that time. If 
the parser gets to the end of the scope for the assert/discriminator and 
it is still not evaluated, it is an error. (Tim please correct any 
mis-information). 
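
As a rough illustration of the "register an interest, get notified" 
behaviour described above, here is a minimal Python sketch. It is not the 
IBM DFDL code: the Expression and ExpressionManager classes, their method 
names, and the flat dictionary used as the infoset are all assumptions 
made for the example.

```python
from typing import Callable, Dict, Iterable, List, Optional


class Expression:
    """An assert/discriminator test plus the element names it refers to."""

    def __init__(self, name: str, needs: Iterable[str],
                 test: Callable[[Dict[str, str]], bool]) -> None:
        self.name = name
        self.needs = set(needs)
        self.test = test
        self.result: Optional[bool] = None  # None means not yet evaluated


class ExpressionManager:
    def __init__(self) -> None:
        self.infoset: Dict[str, str] = {}
        self.waiting: List[Expression] = []

    def _ready(self, expr: Expression) -> bool:
        # Ready once every referenced element appears in the infoset.
        return expr.needs.issubset(self.infoset)

    def encounter(self, expr: Expression) -> None:
        """Evaluate on encounter if possible, otherwise register interest."""
        if self._ready(expr):
            expr.result = expr.test(self.infoset)
        else:
            self.waiting.append(expr)

    def element_added(self, name: str, value: str) -> None:
        """Notification from the parser: re-check the waiting expressions."""
        self.infoset[name] = value
        ready = [e for e in self.waiting if self._ready(e)]
        self.waiting = [e for e in self.waiting if not self._ready(e)]
        for expr in ready:
            expr.result = expr.test(self.infoset)

    def end_of_scope(self) -> None:
        """An expression still unevaluated at the end of its scope is an error."""
        if self.waiting:
            names = ", ".join(e.name for e in self.waiting)
            raise RuntimeError("unevaluated at end of scope: " + names)


manager = ExpressionManager()
discrim = Expression("d1", ["flag"], lambda infoset: infoset["flag"] == "C1")
manager.encounter(discrim)           # 'flag' not parsed yet, so it waits
manager.element_added("flag", "C1")  # notification triggers evaluation
manager.end_of_scope()               # nothing left waiting, no error
print(discrim.result)                # True
```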

Regards

Steve Hanson

From:        Mike Beckerle <mbeckerle.dfdl@gmail.com> 
To:        dfdl-wg@ogf.org 
Date:        13/09/2012 16:32 
Subject:        [DFDL-WG] assert and discriminator - no more before/after 
Sent by:        dfdl-wg-bounces@ogf.org 

I am looking in the spec for guidance about the evaluation order of assert 
statements.

We used to have before/after control properties, but eliminated them.

If I annotate a simpleType'd element with an assert that says { . eq 'x' 
}, that of necessity references the current value, so must execute after 
the value has been computed.

If on the other hand I annotate a complexType element with a discriminator 
that says { ../flag eq 'C1' } then this of necessity must execute before I 
go after the contents because the whole point is to evaluate the 
discriminator first.

Did we ever articulate exactly what the rules are here about order of 
evaluation?

Thanks for reminders

..mikeb

-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412 
--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

