Re: [DFDL-WG] Action 188 - Path expressions, empty node sequences, and errors

22 Oct 2012

      Comments summarized from the WG call on 2012-10-22

IBM commented that its implementation checks that each path expression
returns exactly one node, not zero nodes or multiple nodes.

It is proposed that a DFDL implementation could use an existing XPath
implementation, but not without some effort to:

(a) analyze expressions statically to detect malformed paths, and paths
that are known to return zero or multiple nodes (i.e., not exactly one),
and report them as SDEs.
(b) impose the semantics of fn:exactly-one on all other paths at
processing time.

Issue: is a failure detected under (b) an SDE or a PE?

Further question (not from the call, but for discussion): do DFDL
expressions automatically take on a type? E.g.,
<dfdl:discriminator>true</dfdl:discriminator> versus
<dfdl:discriminator>xs:boolean("true")</dfdl:discriminator>

...mike

On Wed, Oct 3, 2012 at 6:51 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
...
Issue: what are the semantics of a path expression that returns an empty
node sequence?
Current spec language says it behaves as if it returned nil.
This isn't well-formed: nil is not an empty node sequence; it is a special
reserved value. This definition is consistent neither with XPath (which
lets each function decide how to handle an empty node sequence) nor with
the use of nil elsewhere in DFDL.
*Discussion:*
Possible changes:
1) Any path expression that evaluates to an empty node sequence causes an
SDE.
2) Ditto, except a PE.
3) XPath-consistent: let the functions decide. So for string functions,
an empty node sequence could be treated as "" as in XPath. An empty node
sequence returned as the value of a DFDL Infoset item would depend on the
type of the infoset item. For a string it could be "", for a boolean it
could be false, etc.
4) Anything else?
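To make option 3 concrete, here is a minimal Python sketch, assuming a node
sequence is modeled as a list and the infoset item's type is named by its
XSD type. The name value_for_infoset_item is hypothetical, and only the
string and boolean cases mentioned above are covered; they match what
XPath's fn:string and fn:boolean return for the empty sequence.

```python
# Hypothetical sketch of option 3: the value an infoset item receives when
# its defining path expression yields a node sequence (modeled as a list).
EMPTY_SEQUENCE_DEFAULTS = {
    "xs:string": "",      # like XPath fn:string(()), which returns ""
    "xs:boolean": False,  # like XPath fn:boolean(()), which returns false
}


def value_for_infoset_item(seq: list, xsd_type: str):
    """Return the value for an infoset item of type xsd_type."""
    if len(seq) > 1:
        # Multiple nodes are outside the scope of this sketch.
        raise ValueError(f"expected at most one node, got {len(seq)}")
    if seq:
        return seq[0]
    if xsd_type not in EMPTY_SEQUENCE_DEFAULTS:
        raise ValueError(f"no empty-sequence rule sketched for {xsd_type}")
    return EMPTY_SEQUENCE_DEFAULTS[xsd_type]


print(repr(value_for_infoset_item([], "xs:string")))  # ''
print(value_for_infoset_item([], "xs:boolean"))       # False
print(value_for_infoset_item(["abc"], "xs:string"))   # abc
```

The "etc." in option 3 is where the work lies: every other type would need
its own entry in a table like this one.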
It is very desirable that these be schema definition errors, because
the most likely usage pattern is a relative path reaching to a part of
the structure that is supposed to exist unconditionally. Since DFDL
path expressions are a first-order language (meaning you can't construct a
path from a string), the DFDL compiler can find the vast majority of path
mistakes (a misspelled path step name, for example, or the wrong number of
"../.." steps in a relative path) at compile time and issue SDEs for
them. Cases where a path might or might not exist will be far rarer.
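The compile-time check described above could look roughly like the
following Python sketch. It assumes the schema's element structure is
available as nested dicts keyed by element name, and it handles only name
steps and ".." steps; check_path, SchemaDefinitionError and the toy schema
are all hypothetical.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for an SDE raised at schema compile time."""


def check_path(schema: dict, context: list[str], path: str) -> list[str]:
    """Resolve a relative path such as '../../hdr/len' against the schema's
    element tree, starting from the element located at `context`.
    Returns the target element's location, or raises an SDE."""
    location = list(context)
    for step in path.split("/"):
        if step == "..":
            if not location:
                raise SchemaDefinitionError(f"too many '..' steps in {path!r}")
            location.pop()
        else:
            # Walk down to the element we are currently positioned at.
            node = schema
            for name in location:
                node = node[name]
            if step not in node:
                raise SchemaDefinitionError(
                    f"no element {step!r} at /{'/'.join(location)} in {path!r}")
            location.append(step)
    return location


# Toy schema: a record with a header holding a length, and a body.
SCHEMA = {"rec": {"hdr": {"len": {}}, "body": {"data": {}}}}
print(check_path(SCHEMA, ["rec", "body", "data"], "../../hdr/len"))
# ['rec', 'hdr', 'len']
```

A misspelled step such as '../../hdr/lenn', or one ".." too many, raises
the SDE before any data is parsed. What a check like this cannot settle is
a path through an optional element, which is the rarer case above.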
However, there is the issue of paths embedded deep inside an
expression. If we want a DFDL processor to be (roughly) XPath compatible,
and to be implementable by reusing an XPath implementation, then the
problem is that the DFDL implementation uses the XPath implementation as a
black box: it never sees the path expressions that return empty node
sequences unless their results are returned to it by the XPath evaluator.
An XPath implementation embedded inside a DFDL implementation would
happily evaluate concat(path1, path2), and if path1 turned out to be an
empty node sequence, it would use "" for it. The DFDL implementation might
have no way to intercept this to implement the more rigorous semantics
that issue an SDE (or even a PE).
Adopting XPath semantics entirely makes things like
concat(../a/complete/nonsense/path, "foobar") valid code. The path may
be meaningless, but it will simply be treated as "".
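The black-box problem can be seen in a small Python model, assuming the
XPath 2.0 rule for fn:concat that an empty-sequence argument is treated as
the zero-length string. The helpers evaluate_path and xpath_concat and the
toy infoset are hypothetical, and the nonsense path is written without the
leading "../" to keep the evaluator short.

```python
def evaluate_path(infoset: dict, path: str) -> list:
    """Toy path evaluator: returns a one-item list for an existing node and
    an empty list (the empty node sequence) for a path that selects nothing."""
    node = infoset
    for step in path.split("/"):
        if not isinstance(node, dict) or step not in node:
            return []
        node = node[step]
    return [node]


def xpath_concat(*args: list) -> str:
    """Model of fn:concat: an empty-sequence argument contributes ""."""
    parts = []
    for seq in args:
        if len(seq) > 1:
            raise TypeError("fn:concat arguments must have at most one item")
        parts.append(str(seq[0]) if seq else "")
    return "".join(parts)


infoset = {"a": {"x": "hello"}}
result = xpath_concat(evaluate_path(infoset, "a/complete/nonsense/path"),
                      ["foobar"])
print(result)  # foobar
```

The caller receives only "foobar"; nothing in that string shows that the
first path selected no node, so the DFDL layer has nothing to intercept.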
*Suggested Solution:*
We can, however, have our cake and eat it too.
Assume we embed ordinary XPath semantics inside DFDL (choice 3 above),
and implementors embed XPath implementations as black boxes.
In this case I believe we badly need the fn:exactly-one(arg) function in
the DFDL library, so that one can wrap it around almost every path
expression to get a processing error if it does not return exactly one
node. We also need to add a dfdl:nodePath(arg) function (the name
'nodePath' meaning 'expected to be a path to exactly one node'; I'm open
to a different name), which is the same but issues an SDE, and signals
to the implementation that the check should be made before runtime.
This would let a cautious DFDL schema author wrap path expressions with
fn:exactly-one or dfdl:nodePath to get the strong checking and behavior
they want.
This is tedious, but it gives us XPath compatibility and ease of
implementation.
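A rough Python sketch of the two wrappers follows. fn:exactly-one is a real
XPath 2.0 function (it raises dynamic error err:FORG0005 unless its
argument holds exactly one item); dfdl:nodePath is only the proposal above,
and the exception classes and function names here are hypothetical
stand-ins. The before-runtime checking that dfdl:nodePath is meant to
invite is not modeled; both checks here run at evaluation time.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error (PE)."""


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


def exactly_one(seq: list):
    """Model of fn:exactly-one: return the single item or raise a PE."""
    if len(seq) != 1:
        raise ProcessingError(f"fn:exactly-one: expected 1 node, got {len(seq)}")
    return seq[0]


def node_path(seq: list):
    """Model of the proposed dfdl:nodePath: same test, reported as an SDE.
    (An implementation would ideally prove this at compile time instead.)"""
    if len(seq) != 1:
        raise SchemaDefinitionError(
            f"dfdl:nodePath: expected 1 node, got {len(seq)}")
    return seq[0]


print(exactly_one(["hello"]))  # hello
try:
    exactly_one([])  # a path that selected nothing
except ProcessingError as err:
    print(err)  # fn:exactly-one: expected 1 node, got 0
```

Because the wrapper is evaluated inside the XPath engine, writing
concat(fn:exactly-one(path1), "foobar") turns the silent "" into an error
that reaches the DFDL layer.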
*Details:*
This has the following implication for the spec, among others:
In the spec, a '?' after a parameter or return type in an expression
language function signature means the value can be either a single value
or the empty sequence.
If we decide these paths cannot be empty node sequences, then all of these
'?' markers must be removed. If we decide they can be empty node sequences,
then we must specify the behavior of each function when the empty sequence
is the argument.
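The two alternatives can be illustrated with a Python sketch of the '?'
occurrence indicator. The helper check_occurrence is hypothetical; the
example upper_case models XPath 2.0 fn:upper-case, whose signature is
fn:upper-case($arg as xs:string?) as xs:string and which returns the
zero-length string for an empty-sequence argument, i.e. exactly the kind of
per-function rule we would have to write down.

```python
def check_occurrence(seq: list, indicator: str) -> None:
    """Check a sequence against an occurrence indicator:
    '' means exactly one item, '?' means zero or one item."""
    low, high = {"": (1, 1), "?": (0, 1)}[indicator]
    if not low <= len(seq) <= high:
        raise TypeError(
            f"{len(seq)} item(s) do not match occurrence {indicator!r}")


def upper_case(arg: list) -> str:
    """Model of fn:upper-case($arg as xs:string?) as xs:string."""
    check_occurrence(arg, "?")
    return arg[0].upper() if arg else ""  # empty sequence -> ""


def upper_case_strict(arg: list) -> str:
    """The same function if the '?' were removed: ($arg as xs:string)."""
    check_occurrence(arg, "")
    return arg[0].upper()


print(upper_case(["abc"]))   # ABC
print(repr(upper_case([])))  # ''
```

With the '?' kept, each function needs a documented empty-sequence result;
with it removed, the strict variant simply rejects the empty sequence.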
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412