Action 315 ... Neither IBM DFDL nor Daffodil has any significant tests that use self/parent for the set of affected functions. The proposal is to make such usage an SDE. Say now if this is a problem.

Regards

Steve Hanson

IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Lawrence <slawrence@tresys.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 06/11/2020 21:23
Subject: [EXTERNAL] Re: [DFDL-WG] Action 315: fn:count(.), fn:exists(.)
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>

Yes I think that would be disallowed.

I think dfdl:occursIndex() is the function to call to decide if you are at index 1 or not.

However, we only have dfdl:occursIndex() defined for the innermost array. There's no way to ask for the current index of an enclosing array of the nest.
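
As a rough sketch of that suggestion, the running-sum schema quoted below could test and index with dfdl:occursIndex() instead of calling fn:count on the enclosing array. This assumes that dfdl:occursIndex() evaluated on "sum" reports the index of the enclosing "array" occurrence (the innermost array), that the xs, dfdl, and fn prefixes are bound as usual, and that representation properties come from a default format. The "+ ../val" term is an assumption about what a running sum needs. It has not been run against IBM DFDL or Daffodil.

```xml
<xs:element name="array" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <!-- representation properties for "val" are left out, as in the thread -->
      <xs:element name="val" type="xs:int" />
      <!-- running sum: the first occurrence copies val,
           later occurrences add val to the previous sum -->
      <xs:element name="sum" type="xs:int"
        dfdl:inputValueCalc="{
          if (dfdl:occursIndex() eq 1)
          then ../val
          else ../../array[dfdl:occursIndex() - 1]/sum + ../val
        }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
<!-- "array" is a prior sibling of "total", so this fn:count call
     does not refer to an enclosing array -->
<xs:element name="total" type="xs:int"
  dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
```

Only the expression on "sum" changes. "total" still counts a prior sibling array, which the proposal does not appear to restrict.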

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy

On Thu, Nov 5, 2020 at 4:07 PM Steve Lawrence <slawrence@tresys.com> wrote:
I know of uses where fn:count has been used as a way to keep a running
sum via inputValueCalc. For example:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="array" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="val" type="xs:int" ... />
            <xs:element name="sum" type="xs:int"
              dfdl:inputValueCalc="{
                if (fn:count(../../array) eq 1)
                then ../val
                else ../../array[fn:count(../../array) - 1]/sum + ../val
              }" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="total" type="xs:int"
        dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Would something like this no longer be allowed under this proposal?

On 11/5/20 3:42 PM, Mike Beckerle wrote:
> There are 4 functions which operate on the infoset, and their behavior is
> unclear because it depends on when they are evaluated during parse/unparse.
>
> The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.
>
> The behavior when unparsing is less problematic, because one could simply
> require the infoset nodes being referenced to be fully-constructed before these
> functions are allowed to evaluate.
>
> However, when parsing the behavior is more subtle, and unparsing may want to be
> made consistent with decisions about behavior for parsing.
>
> Our call minutes about this action item suggest reviewing the known-to-exist and
> known-not-to-exist definitions to see whether these function definitions should
> be defined in terms of that. I have reviewed those sections, and so far I'm not
> sure they will contribute.
>
> The general problem is this, in terms of fn:count(path). The path is to an
> infoset node or an array of occurrences that is currently being parsed. It is
> possible that the status of known-to-exist or not is simply not well known at
> the point the expression is being evaluated.
>
> The answer to fn:count(path) wants to always be the same as if the infoset were
> fully constructed at the time the expression is evaluated. As evaluation may
> occur during parsing, it is just not defined if the evaluation of the expression
> itself determines whether the item itself is known to exist or not.
>
> Ex:
>
> <xs:element name="outerArray" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="innerArray" maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element name="count" type="xs:int"
>               dfdl:inputValueCalc='{ fn:count(../..) }'/>
>             ....
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> In the above, we see that fn:count has as argument a relative path to the array
> element named "outerArray".
>
> There are a few observations here.
>
> 1) If we define fn:count in this case to actually have anything to do with the
> current number of array elements in outerArray, then we will have tightly
> constrained implementations to a very sequential notion of parsing. The notion
> of "current" state of the array implies an algorithm where the number of current
> occurrences is changing. E.g., we would preclude an implementation that knows
> the length of all outerArray elements from parsing all the children
> simultaneously in parallel, or at minimum make this quite hard to achieve
> because each parallel computation would have to somehow simulate the right
> "current number" of occurrences.
>
> 2) The question arises of how fn:count(../..) relates to
> fn:count(../../../outerArray) and to fn:count(../../../outerArray[i]), where i
> is the index of the enclosing outerArray occurrence that contains this
> calculation. Arguably, fn:count(../..) could be considered equivalent to
> fn:count(../../../outerArray[i]). Both seem like they should always return 1,
> since a path to a single occurrence selects a single node, and the count of a
> single node is 1.
>
> 3) Arguably, fn:count(path) could be illegal whenever the path is to an
> enclosing element. We could simply define this usage to be illegal. I cannot
> come up with any reason to actually need this functionality. When parsing we
> could require the path argument to refer to a pre-existing part of the infoset,
> and when unparsing it would have to refer to either a pre-existing or a later
> part of the infoset, but specifically not the current infoset elements. If we
> make this an SDE, that would seem to be the conservative design point,
> preserving our ability to assign a future meaning to this usage should a need
> arise.
>
> My recommendation: Expressions evaluated as part of an element parsing or
> unparsing cannot refer to the count or existence of the current element
> occurrence being parsed, nor any enclosing element occurrence, nor any enclosing
> array element.
>
> This would seem to rule out any use of absolute paths in arguments to fn:count,
> because the root element is not (necessarily) known-to-exist until the entire
> parse completes successfully. Yet clearly we want to be able to refer to the
> fn:count of a prior sibling array, and that reference should be able to use
> either a relative or absolute path.
>
> So the constraint is not that the argument path must avoid "passing through" a
> node that may or may not exist, but that it must end on a node whose existence
> does not depend on the existence of the current node.
>
> I'm a bit uncertain of good language to express this constraint on what the path
> argument is allowed to refer to, but the notion is one of a sort of circular
> definition; hence, it's a schema definition error.
>
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
> www.owlcyberdefense.com <http://www.owlcyberdefense.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are subject
> to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
>
>
> --
> dfdl-wg mailing list
> dfdl-wg@ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
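
To make the recommendation quoted above concrete, here is a small sketch of one usage that would presumably stay legal and one that would become an SDE under the proposal. The element names item, n, rec, and seen are hypothetical, the xs, dfdl, and fn prefixes are assumed to be bound as usual, and occurs-count and representation properties are omitted.

```xml
<xs:sequence>
  <!-- a prior sibling array: fully parsed before "n" is evaluated -->
  <xs:element name="item" type="xs:int" maxOccurs="unbounded" />

  <!-- would presumably remain allowed: the path ends on a prior sibling
       array whose existence does not depend on "n" -->
  <xs:element name="n" type="xs:int"
    dfdl:inputValueCalc="{ fn:count(../item) }" />

  <xs:element name="rec" maxOccurs="unbounded">
    <xs:complexType>
      <xs:sequence>
        <!-- would become an SDE under the proposal: the path ends on the
             enclosing array "rec", which is still being parsed -->
        <xs:element name="seen" type="xs:int"
          dfdl:inputValueCalc="{ fn:count(../../rec) }" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:sequence>
```

The distinction is where the path ends, not which nodes it passes through: ../item ends on an array that is complete when the expression runs, while ../../rec ends on the array that contains the expression.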


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU