
Comments summarized from the WG call on 2012-10-22:

IBM commented that its implementation checks that path expressions return exactly one node, rather than no nodes or multiple nodes.

It is proposed that an existing XPath implementation could be used by a DFDL implementation, but not without some effort to:
(a) analyze expressions so as to statically detect malformed paths, or paths that are known to return no nodes or multiple nodes (not exactly one), and report them as SDEs;
(b) impose the semantics of fn:exactly-one on all other paths at processing time.

Issue: is a failure under (b) an SDE or a PE?

Further question (not from the call, but for discussion): do DFDL expressions automatically take on a type? E.g.,
<dfdl:discriminator>true</dfdl:discriminator>
versus
<dfdl:discriminator>xs:boolean("true")</dfdl:discriminator>

...mike

On Wed, Oct 3, 2012 at 6:51 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:

Issue: what are the semantics of a path expression returning an empty node sequence?
Current spec language says it behaves as if it returned nil.
This isn't well formed: nil is not an empty node sequence; it is a special reserved value. This definition is neither consistent with XPath (which lets each function decide what its behavior is for an empty node sequence), nor consistent with the use of nil elsewhere in DFDL.
*Discussion:*
Possible changes:
1) Any path expression that evaluates to an empty node sequence causes an SDE.
2) Ditto, except it causes a PE.
3) XPath consistent: let the functions decide. So for string functions, an empty node sequence could be treated as "" as in XPath. An empty node sequence returned as the value of a DFDL Infoset item would depend on the type of the infoset item. For a string it could be "", for a boolean it could be false, etc.
4) Anything else?
It is very desirable that empty path results be schema definition errors, because the most likely usage pattern is to create a relative path reaching to a part of the structure that is supposed to exist unconditionally. Since DFDL path expressions are a first-order language (meaning you can't construct a path from a string), the DFDL compiler can find the vast majority of path mistakes (misspelling a path step name, for example, or the wrong number of "../.." steps in a relative path) at compile time and issue SDEs for them. The cases where a path might or might not exist will be far rarer.
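To make the compile-time claim concrete, here is a minimal sketch of a static path check. Everything in it is an assumption for illustration: the schema is a hypothetical nested dictionary, and `check_path`, `SchemaDefinitionError`, and the element names are invented here, not taken from the spec or any implementation. It only handles child steps and "..".

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


# Hypothetical schema tree: each element name maps to its child elements.
SCHEMA = {
    "record": {
        "header": {"length": {}, "kind": {}},
        "body": {"payload": {}},
    }
}


def check_path(schema, context, path):
    """Resolve a relative path against the schema without any data.

    `context` is the list of element names from the root down to the
    element carrying the expression. Returns the resolved list of names,
    or raises SchemaDefinitionError if a step can never exist.
    """
    location = list(context)
    for step in path.split("/"):
        if step == "..":
            if not location:
                raise SchemaDefinitionError(f"too many '..' steps in {path!r}")
            location.pop()
            continue
        # Walk down from the root to find the children declared here.
        children = schema
        for name in location:
            children = children[name]
        if step not in children:
            raise SchemaDefinitionError(
                f"no element {step!r} under /{'/'.join(location)} in {path!r}"
            )
        location.append(step)
    return location


context = ["record", "body", "payload"]
resolved = check_path(SCHEMA, context, "../../header/length")
```

A misspelled step such as "../../header/lenght", or one "../" too many, raises the error before any data is parsed, which is the behaviour argued for above.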
However, there is the issue of deep embedding of a path inside an expression. If we want a DFDL processor to be XPath compatible (roughly), and to be able to be implemented by reusing an XPath implementation, then there is the problem that the DFDL implementation reuses the XPath implementation as a black box, and it does not get to see the path expressions that return empty node sequences unless they are returned to it from the XPath evaluator.
An XPath implementation embedded inside a DFDL implementation would happily evaluate concat(path1, path2), and if path1 turned out to be an empty node sequence, it would get "" for that. The DFDL implementation might not have any way to intercept this to implement the more rigorous semantics that issues an SDE (or even a PE).
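The masking effect can be sketched as follows. This is a simulation, not a real XPath engine: `xpath_concat` is a hypothetical helper that mimics the fn:concat rule of treating an empty sequence as the zero-length string, and the document and paths are made up. Python's standard xml.etree.ElementTree is used only to produce node lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed data: only <a> exists under the root.
DOC = ET.fromstring("<record><a>hello</a></record>")


def xpath_concat(*args):
    """Mimic fn:concat: an empty node sequence contributes ""."""
    parts = []
    for arg in args:
        if isinstance(arg, list):
            # A node sequence from findall(); this sketch assumes it
            # holds at most one node and uses that node's text.
            parts.append((arg[0].text or "") if arg else "")
        else:
            # A string literal argument.
            parts.append(arg)
    return "".join(parts)


good = xpath_concat(DOC.findall("a"), "foobar")
bad = xpath_concat(DOC.findall("complete/nonsense/path"), "foobar")
```

Both calls return an ordinary string. The caller sees only the result of the second call and cannot tell that a path inside it matched nothing.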
Adopting XPath semantics entirely makes things like concat(../a/complete/nonsense/path, "foobar") into valid code. The path may be meaningless, but that means it will just be treated as "".

*Suggested Solution:*
We can, however, have our cake and eat it too.
Assume we embed an ordinary XPath semantics inside DFDL (choice 3 above). Implementors embed XPath implementations black-box.
In this case I believe we badly need the fn:exactly-one(arg) function in the DFDL library, so that one can wrap it around almost every path expression to get a processing error if the result is not exactly one node. We also need to add a dfdl:nodePath(arg) function (the name 'nodePath' meaning 'is expected to be a path to just one node'; I will entertain a different name if you prefer), which is the same but issues an SDE and suggests to the implementation that it should be checked before runtime.
This would let a cautious DFDL schema author wrap path expressions with fn:exactly-one or dfdl:nodePath to get the strong checking and behaviour they want.
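A minimal sketch of the two wrappers, under stated assumptions: `exactly_one` and `node_path` are hypothetical Python stand-ins for fn:exactly-one and the proposed dfdl:nodePath, the two exception classes are invented for illustration, and the only difference modelled is which kind of error is raised. The "check before runtime" hint of dfdl:nodePath is not modelled.

```python
import xml.etree.ElementTree as ET


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error (PE)."""


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


def _require_one(nodes, error_type):
    # Shared check: anything other than exactly one node is an error.
    if len(nodes) != 1:
        raise error_type(f"expected exactly one node, got {len(nodes)}")
    return nodes[0]


def exactly_one(nodes):
    """Stand-in for fn:exactly-one: fails with a processing error."""
    return _require_one(nodes, ProcessingError)


def node_path(nodes):
    """Stand-in for the proposed dfdl:nodePath: fails with an SDE."""
    return _require_one(nodes, SchemaDefinitionError)


# Hypothetical data: one <a>, two <b>, no <c>.
DOC = ET.fromstring("<record><a>hello</a><b>1</b><b>2</b></record>")

value = exactly_one(DOC.findall("a")).text
```

With these wrappers, `exactly_one(DOC.findall("c"))` and `exactly_one(DOC.findall("b"))` both raise instead of silently yielding "" or the first node, which is the strong checking a cautious schema author would be opting into.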
This is tedious, but gives us XPath compatibility and ease of implementation.
*Details:*
There is the below implication for the spec, among others:
In the spec, a '?' after a parameter or return type in the signature of an expression language function means the value can be either a single value or the empty sequence.
If we decide these paths cannot be empty node sequences, then all of these '?' markers must be removed. If we decide they can be empty node sequences, then we must specify the behavior of each function when the empty sequence is the argument.
-- Mike Beckerle | OGF DFDL WG Co-Chair Tel: 781-330-0412