There are 4 functions that operate on the infoset whose behavior is unclear, because it depends on when they are evaluated during parse/unparse.

The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.

The behavior when unparsing is less problematic, because one could simply require the infoset nodes being referenced to be fully-constructed before these functions are allowed to evaluate.

However, the behavior when parsing is more subtle, and we may want unparsing to be consistent with whatever we decide for parsing.

Our call minutes about this action item suggest reviewing the known-to-exist and known-not-to-exist definitions to see whether these functions should be defined in terms of them. I have reviewed those sections, and so far I'm not sure they will help.

The general problem is this, in terms of fn:count(path). The path is to an infoset node or an array of occurrences that is currently being parsed. It is possible that the status of known-to-exist or not is simply not well known at the point the expression is being evaluated.

Ideally, fn:count(path) would always return the same answer as if the infoset were fully constructed at the time the expression is evaluated. But since evaluation may occur during parsing, the answer is simply undefined when the evaluation of the expression itself determines whether the item is known to exist.

Ex:

<xs:element name="outerArray" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="innerArray" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="count" type="xs:int" dfdl:inputValueCalc='{ fn:count(../..) }'/>
            ....
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

In the above, we see that fn:count has as argument a relative path to the array element named "outerArray".

There are a few observations here.

1) If we define fn:count in this case to actually have anything to do with the current number of array elements in outerArray, then we will have tightly constrained implementations to a very sequential notion of parsing. The notion of "current" state of the array implies an algorithm where the number of current occurrences is changing. E.g., we would preclude an implementation that knows the length of all outerArray elements from parsing all the children simultaneously in parallel, or at minimum make this quite hard to achieve because each parallel computation would have to somehow simulate the right "current number" of occurrences.

2) The question arises of fn:count(../..) vs. fn:count(../../../outerArray) vs. fn:count(../../../outerArray[i]), where i is the index of the enclosing outerArray occurrence that contains this calculation. Arguably, fn:count(../..) could be considered equivalent to fn:count(../../../outerArray[i]), and both seem like they should always return 1, since the number of instances at a single index, i.e. a single node, is 1.

3) Arguably, fn:count(path) could be illegal whenever the path is to an enclosing element. We could simply define this usage to be illegal; I cannot come up with any reason to actually need this functionality. When parsing, we could require the path argument to refer to a pre-existing part of the infoset; when unparsing, it would have to refer to either pre-existing or later parts of the infoset, but specifically not the current infoset elements. Making this an SDE seems to be the conservative design point, since it preserves our ability to assign a meaning to this usage in the future, should a need arise.

My recommendation: Expressions evaluated as part of an element parsing or unparsing cannot refer to the count or existence of the current element occurrence being parsed, nor any enclosing element occurrence, nor any enclosing array element.
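
To make the recommendation concrete, here is a sketch of usages it would rule out. The element names (rec, hasRec, recCount) are hypothetical, namespace prefixes in the paths are omitted as in the example above, and other DFDL properties are left out; this is a fragment of some enclosing sequence, not a complete schema.

```xml
<!-- Hypothetical names; illustrative fragment only. -->
<xs:element name="rec" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <!-- Path ends on the enclosing occurrence of rec, which is
           still being parsed: an SDE under the recommendation. -->
      <xs:element name="hasRec" type="xs:boolean"
                  dfdl:inputValueCalc='{ fn:exists(..) }'/>
      <!-- Path ends on the enclosing array rec as a whole:
           also an SDE under the recommendation. -->
      <xs:element name="recCount" type="xs:int"
                  dfdl:inputValueCalc='{ fn:count(../../rec) }'/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

In both cases the result of the function would depend on whether the occurrence containing the expression turns out to exist, which is the circularity the recommendation is meant to exclude.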

This would seem to rule out any use of absolute paths in arguments to fn:count, because the root element is not (necessarily) known-to-exist until the entire parse completes successfully. Yet clearly we want to be able to refer to the fn:count of a prior sibling array, and that reference should be able to use either a relative or absolute path.

So the issue is not that the argument path "passes through" a node that may or may not exist, but that it must end on a node whose existence does not depend on the existence of the current node.
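
By contrast, a sketch of the prior-sibling case that should remain legal, written with both a relative and an absolute path. The names (message, item, itemCount, itemCountAbs) are hypothetical, namespace prefixes in the paths are omitted, and properties such as dfdl:occursCountKind are left out.

```xml
<!-- Hypothetical names; illustrative fragment only. -->
<xs:element name="message">
  <xs:complexType>
    <xs:sequence>
      <!-- Prior sibling array: its occurrences are settled before
           the calculated elements below are evaluated. -->
      <xs:element name="item" type="xs:int" maxOccurs="unbounded"/>
      <!-- Relative path that ends on the prior sibling array. -->
      <xs:element name="itemCount" type="xs:int"
                  dfdl:inputValueCalc='{ fn:count(../item) }'/>
      <!-- Absolute path to the same array. It passes through the
           root element message, but does not end on it. -->
      <xs:element name="itemCountAbs" type="xs:int"
                  dfdl:inputValueCalc='{ fn:count(/message/item) }'/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Both paths end on item, whose existence does not depend on itemCount or itemCountAbs, even though the absolute form passes through a root element that is not yet known to exist.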

I'm a bit uncertain of good language to express this constraint on what the path argument is allowed to refer to, but the notion is one of a sort of circular definition; hence, it's a schema definition error.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy