Ideally we would want to say that path
locations that can never result in a node set of exactly one node are schema
definition errors, and any remaining runtime errors are processing errors.
But the former is not easy to establish, and there are grey areas. For example,
an element uses a path that goes outside of its global containing object.
Or an element uses a path that refers to an element that is declared multiple
times, where each declaration is optional.
XPath 2.0 defines a set of error codes.
Let's look at these and see whether it is possible to carve them up between
schema definition errors and processing errors.
Mike will see whether XPath 2.0 autocasts
its result to match the expected context, where possible to do so.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 22/10/2012 17:23
Subject: Re: [DFDL-WG] Action 188 - Path expressions, empty node sequences, and errors
Sent by: dfdl-wg-bounces@ogf.org
Comments summarized from the WG call on 2012-10-22
IBM commented that its implementation checks that path expressions
return exactly one node, rather than no nodes or multiple nodes.
It is proposed that an existing XPath implementation could be used by a
DFDL implementation, but not without some effort to:
(a) analyze expressions so as to statically detect malformed paths, or paths
that are known to return zero or multiple nodes (anything other than one),
and report them as SDEs.
(b) impose the semantics of fn:exactly-one on other paths at processing
time.
Issue: is (b) an SDE or a PE?
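To make (b) concrete, here is a minimal sketch of exactly-one checking applied to the result of a path evaluation. It uses Python's standard xml.etree.ElementTree purely as a stand-in for whatever XPath engine an implementation would embed. The names exactly_one and NotExactlyOneError are hypothetical, and the sketch deliberately does not decide whether the failure is an SDE or a PE.

```python
import xml.etree.ElementTree as ET


class NotExactlyOneError(Exception):
    """Hypothetical error; whether DFDL reports it as an SDE or a PE is the open issue."""


def exactly_one(context, path):
    """Evaluate path relative to context and insist on exactly one node."""
    nodes = context.findall(path)
    if len(nodes) != 1:
        raise NotExactlyOneError(f"{path!r} selected {len(nodes)} nodes, expected 1")
    return nodes[0]


doc = ET.fromstring("<rec><len>3</len><item>a</item><item>b</item></rec>")

print(exactly_one(doc, "len").text)   # a single node is accepted
for bad in ("missing", "item"):       # zero nodes, then two nodes
    try:
        exactly_one(doc, bad)
    except NotExactlyOneError as err:
        print(err)
```

The check only sees the final result of a path, so it has to be applied to each path individually and not to the enclosing expression.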
Further question (not from the call, but for discussion): do DFDL expressions
automatically take on the expected type? E.g.,
<dfdl:discriminator>true</dfdl:discriminator>
versus
<dfdl:discriminator>xs:boolean("true")</dfdl:discriminator>
...mike
On Wed, Oct 3, 2012 at 6:51 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com>
wrote:
Issue: what are the semantics of a path expression that returns
an empty node sequence?
The current spec language says it behaves as if it returned nil.
This isn't well formed. nil is not an empty node sequence; it is a
special reserved value. This definition is neither consistent with XPath
(which lets each function decide how it behaves for an empty node
sequence), nor consistent with the use of nil elsewhere
in DFDL.
Discussion:
Possible changes
1) Any path expression that evaluates to an empty node sequence causes an
SDE
2) Same as 1), except that it causes a PE
3) XPath consistent - let the functions decide. So for string functions,
an empty node sequence could be treated as "" as in XPath. An
empty node sequence returned as the value of a DFDL Infoset item would
depend on the type of the infoset item. For a string it could be "",
for a boolean it could be false, etc.
4) Anything else?
It is very desirable that these be schema definition errors, because
the most likely usage pattern is to create a relative path reaching to
a part of the structure that is supposed to exist unconditionally. Since
DFDL path expressions are a first-order language (meaning you can't construct
a path from a string), the DFDL compiler can find the vast majority of
path mistakes (misspelling a path step name, for example, or the wrong number
of "../.." steps in a relative path) at compile time and
issue SDEs for them. The cases where a path might or might not exist will
be far rarer.
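As an illustration of the compile-time checking described above, here is a rough sketch that walks a relative path over element declarations instead of data. The Decl class, the resolve function and the msg/header/len/body schema are all hypothetical, and the sketch ignores optional and repeating elements, which are exactly the cases that cannot be settled statically.

```python
class Decl:
    """Hypothetical stand-in for an element declaration in a compiled schema."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = {}
        for child in children:
            child.parent = self
            self.children[child.name] = child


class SchemaDefinitionError(Exception):
    """Raised for a path that can be shown to be wrong before any data is seen."""


def resolve(start, path):
    """Walk a relative path step by step over declarations, not data."""
    here = start
    for step in path.split("/"):
        if step == "..":
            if here.parent is None:
                raise SchemaDefinitionError(f"{path!r}: too many '..' steps")
            here = here.parent
        elif step in here.children:
            here = here.children[step]
        else:
            raise SchemaDefinitionError(
                f"{path!r}: no child {step!r} under {here.name!r}")
    return here


body = Decl("body")
schema = Decl("msg", [Decl("header", [Decl("len")]), body])

print(resolve(body, "../header/len").name)
# A misspelled step name, then one '..' step too many.
for bad in ("../header/lenn", "../../header/len"):
    try:
        resolve(body, bad)
    except SchemaDefinitionError as err:
        print(err)
```

Both mistakes are caught without parsing any data, which is why they fit naturally as SDEs.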
However, there is the issue of deep embedding of a path inside an expression.
If we want a DFDL processor to be XPath compatible (roughly), and to be
able to be implemented by reusing an XPath implementation, then there is
the problem that the DFDL implementation reuses the XPath implementation
as a black box, and it does not get to see the path expressions that return
empty node sequences unless they are returned to it from the XPath evaluator.
An XPath implementation embedded inside a DFDL implementation would happily
evaluate concat(path1, path2), and if path1 turned out to be an empty node
sequence, it would get "" for that, and the DFDL implementation
might not have any way to intercept this to implement the more rigorous
semantics that issues an SDE (or even a PE).
Adopting XPath semantics entirely makes things like concat(../a/complete/nonsense/path,
"foobar") into valid code. The path may be meaningless, but that
means it will just be treated as "".
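The effect can be seen in a small emulation. The sketch below does not call a real XPath engine: xpath_string and concat are hypothetical helpers that mimic the conversion of an empty node sequence to "", with Python's standard xml.etree.ElementTree used only to select nodes, and with a relative nonsense path standing in for the one above.

```python
import xml.etree.ElementTree as ET


def xpath_string(nodes):
    """Mimic XPath string conversion: "" for an empty sequence, otherwise
    the string value (all descendant text) of the first node."""
    return "".join(nodes[0].itertext()) if nodes else ""


def concat(*args):
    """Mimic concat over a mix of node lists and string literals."""
    return "".join(a if isinstance(a, str) else xpath_string(a) for a in args)


doc = ET.fromstring("<rec><a>x</a></rec>")

good = concat(doc.findall("a"), "foobar")
bad = concat(doc.findall("a/complete/nonsense/path"), "foobar")
print(good)
print(bad)
```

The second call returns a perfectly ordinary string, so a caller that only sees the result of concat has nothing to tell it that one of the paths selected no nodes.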
Suggested Solution:
We can, however, have our cake and eat it too.
Assume we embed an ordinary XPath semantics inside DFDL (choice 3 above).
Implementors embed XPath implementations black-box.
In this case I believe we badly need the fn:exactly-one(arg) function in
the DFDL library so that one can wrap it around almost every path expression
to get a processing error if it is not one node, and we need to add a dfdl:nodePath(arg)
function (the name 'nodePath' meaning 'is expected to be a path to just
one node' - entertain a different name if you prefer) which is the same,
but issues an SDE and suggests to the implementation that it should be
checked before runtime.
This would let a cautious DFDL schema author wrap path expressions with
fn:exactly-one or dfdl:nodePath to get the strong checking and behaviour
they want.
This is tedious, but gives us XPath compatibility and ease of implementation.
Details:
There is the following implication for the spec, among others:
In the spec, a '?' after a parameter or return type in the signature of an
expression language function means that the value can be either a single
value or the empty sequence.
If we decide these paths cannot be empty node sequences, then all of these
'?' must be removed. If we decide they can be empty node sequences, then we
must specify the behavior of each function when the empty sequence is the argument.
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412