
Hi Mike,

I would go along with the restrictions in Section 35. They were deliberate. It's an SDE if fn:count() target is not optional or array. I can make the call today, so we can discuss further then if necessary.

Regards
Steve

On Tue, Jan 9, 2024 at 8:58 PM Mike Beckerle <mbeckerle@apache.org> wrote:
The description of fn:count in section 18.5.2.5 says fn:count() can be called on a node-sequence. It does not seem to require an array/optional path. This seems to suggest that if you have, say:

<element name="b" type="xs:int"/>
<element name="a" type="xs:int"/>
<element name="b" type="xs:int" minOccurs="0"/>

then one could write fn:count(b) and it should return 1 or 2. The Daffodil project calls this a "query-style" expression, as opposed to just a basic path expression.
Furthermore, in the first paragraph of section 18.5.2.5 it says "(Note that DFDL v1.0 does not support sequences of length > 1 as the final results of expressions.)" which suggests that some expressions can constructively return node sequences of length > 1 as intermediate results. Presumably fn:count(b) above is such a situation, as the expression 'b' returns 2 nodes.
The above all suggests to me that fn:count(a) would be legal where 'a' is a scalar. This is just always 1 of course.
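To make the node-sequence reading concrete, here is a minimal Python sketch. It assumes a hypothetical infoset represented as a flat list of (name, value) pairs in document order; the helper name count_named is invented for illustration and is not part of Daffodil or the DFDL specification.

```python
# Hypothetical infoset for the three declarations above: a flat list of
# (element name, value) pairs in document order. Not Daffodil's data model.
def count_named(children, name):
    """Query-style count: every child with this name, whichever declaration produced it."""
    return sum(1 for child_name, _ in children if child_name == name)


# The optional second 'b' is present in one infoset and absent in the other.
with_optional_b = [("b", 1), ("a", 2), ("b", 3)]
without_optional_b = [("b", 1), ("a", 2)]

print(count_named(with_optional_b, "b"))     # 2
print(count_named(without_optional_b, "b"))  # 1
print(count_named(with_optional_b, "a"))     # 1, always, since 'a' is scalar
```

Under this reading the count over a scalar such as 'a' carries no information, which is the case Section 35 rules out.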
But Section 35 says:
  o Expression value is not single node
      - Most DFDL expression contexts require an expression to identify a single node, not an array (aka sequence of nodes). There are a few exceptions such as the fn:count(…) function, where the path expression must be to an array or optional element.
  o Expression value is not array element or optional element.
      - Some DFDL expression contexts require an array or an optional element.
      - Example: The fn:count(...) function argument must be to an array or optional element. It is a Schema Definition Error if the argument expression is otherwise.
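The Section 35 rule quoted above could be sketched as a static check along these lines. This is an illustration only: ElementDecl, SchemaDefinitionError, is_array_or_optional and check_fn_count_target are hypothetical names, and the sketch assumes a simplified reading in which "array" means maxOccurs greater than 1 or unbounded and "optional" means minOccurs of 0.

```python
from dataclasses import dataclass
from typing import Optional


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL Schema Definition Error (SDE)."""


@dataclass
class ElementDecl:
    """Hypothetical, simplified element declaration."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for maxOccurs="unbounded"


def is_array_or_optional(decl: ElementDecl) -> bool:
    # Assumed reading: array if maxOccurs > 1 or unbounded, optional if minOccurs == 0.
    if decl.max_occurs is None or decl.max_occurs > 1:
        return True
    return decl.min_occurs == 0


def check_fn_count_target(decl: ElementDecl) -> None:
    """Raise an SDE when the fn:count argument resolves to a scalar element."""
    if not is_array_or_optional(decl):
        raise SchemaDefinitionError(
            f"fn:count argument '{decl.name}' is not an array or optional element"
        )


check_fn_count_target(ElementDecl("b", min_occurs=0))        # optional: accepted
check_fn_count_target(ElementDecl("rec", max_occurs=None))   # array: accepted
try:
    check_fn_count_target(ElementDecl("a"))                  # scalar: rejected
except SchemaDefinitionError as err:
    print(err)
```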
Experience at the Daffodil project is that allowing the fn:count argument expression to be a non-array, non-optional element, where fn:count would always return 1, just hides errors that are very hard to find. This situation comes up often as a schema is written. Usually the fn:count expression is initially correct, with an array/optional element as the argument. Then the element nesting evolves and the paths need updating, but a path ends up referring not to the array/optional element but to a scalar element of the same name that now encloses the array. The fn:count is then always 1, and the schema is incorrect because the expression is not doing what was intended, yet no error is detected. This is quite hard to isolate and fix.
A concrete example of this experience is that you start with a schema like:

<element name="record" maxOccurs="unbounded">
  <complexType>
    <sequence>
      .... elements of the record
But then you need the valueLength of the whole array of all the records, to store the length for unparsing, so you revise this to:
<element name="record">
  <complexType>
    <sequence>
      <element name="item" maxOccurs="unbounded">
        <complexType>
          <sequence>
            .... elements of each record 'item'.
And now, paths you had like fn:count(foo/bar/record) no longer point to an array; they point to a scalar, so they always return 1. This is decidedly unhelpful in a large schema. It is far better if fn:count(foo/bar/record) becomes an SDE because record is now scalar.
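The before/after situation can be sketched in Python as follows. The nested-dict schema model and the helper names (elem, resolve, check_fn_count, fn_count_is_sde) are hypothetical and exist only for this illustration; the enclosing foo and bar elements are assumed to be scalar, and the elided record contents are left out.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL Schema Definition Error (SDE)."""


def elem(min_occurs=1, max_occurs=1, **children):
    # max_occurs=None stands for maxOccurs="unbounded".
    return {"min": min_occurs, "max": max_occurs, "children": children}


def resolve(schema, path):
    """Follow a slash-separated path of element names to its declaration."""
    children, node = schema, None
    for step in path.split("/"):
        node = children[step]
        children = node["children"]
    return node


def check_fn_count(schema, path):
    decl = resolve(schema, path)
    is_array = decl["max"] is None or decl["max"] > 1
    is_optional = decl["min"] == 0
    if not (is_array or is_optional):
        raise SchemaDefinitionError(f"fn:count({path}): target is a scalar element")


def fn_count_is_sde(schema, path):
    try:
        check_fn_count(schema, path)
        return False
    except SchemaDefinitionError:
        return True


# Original schema: 'record' is itself the array.
before = {"foo": elem(bar=elem(record=elem(max_occurs=None)))}
# Revised schema: 'record' is now a scalar wrapper around the 'item' array.
after = {"foo": elem(bar=elem(record=elem(item=elem(max_occurs=None))))}

print(fn_count_is_sde(before, "foo/bar/record"))      # False
print(fn_count_is_sde(after, "foo/bar/record"))       # True: stale path is caught
print(fn_count_is_sde(after, "foo/bar/record/item"))  # False
```

With the Section 35 restriction the stale path is reported at schema compile time instead of silently evaluating to 1.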
So the clarification I'm seeking is whether section 35 was just missed when updates were made about this node-sequence stuff, or if it is reasonable to implement the restrictions in Section 35.
I am biased. I want the restrictions in Section 35, but this was muddy enough that I thought we should get a clarification first.
Daffodil already doesn't implement any query-style expressions so the fn:count(b) example above would be an SDE in Daffodil.
Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com