Action 315 ... Neither IBM DFDL nor Daffodil has any significant tests that
use self/parent with the set of affected functions. The proposal is to make
such usage an SDE. Say now if this is a problem.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Lawrence <slawrence@tresys.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 06/11/2020 21:23
Subject: [EXTERNAL] Re: [DFDL-WG] Action 315: fn:count(.), fn:exists(.)
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
Yes, I think that would be disallowed.
I think dfdl:occursIndex() is the function to call to decide whether you are
at index 1 or not.
However, dfdl:occursIndex() is only defined for the innermost array. There is
no way to ask for the current index of an enclosing array in the nest.
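
A minimal sketch of that idea, reworking the "array" element from the example quoted below so that "sum" tests dfdl:occursIndex() instead of fn:count(../../array). It assumes that dfdl:occursIndex(), evaluated inside "sum", reports the 1-based index of the enclosing "array" occurrence, and that the intent is a running sum (hence the added "+ ../val"). The representation properties of "val" are omitted, so this is a fragment rather than a complete schema:

```xml
<xs:element name="array" maxOccurs="unbounded"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:complexType>
    <xs:sequence>
      <!-- dfdl representation properties for "val" are omitted here -->
      <xs:element name="val" type="xs:int" />
      <!-- Assumption: dfdl:occursIndex() is the index of the enclosing
           "array" occurrence. Only earlier occurrences of "array" are
           referenced, never the count of the array being parsed. -->
      <xs:element name="sum" type="xs:int"
        dfdl:inputValueCalc="{
          if (dfdl:occursIndex() eq 1)
          then ../val
          else ../../array[dfdl:occursIndex() - 1]/sum + ../val
        }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

The "total" element in that example would presumably be unaffected, since its fn:count(../array) refers to a prior sibling array rather than to an enclosing one.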
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Thu, Nov 5, 2020 at 4:07 PM Steve Lawrence <slawrence@tresys.com> wrote:
I know of uses where fn:count has been used as a way to keep a running
sum via inputValueCalc. For example:

<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="array" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="val" type="xs:int" ... />
            <!-- first occurrence: sum is val; later occurrences:
                 previous occurrence's sum plus this val -->
            <xs:element name="sum" type="xs:int"
              dfdl:inputValueCalc="{
                if (fn:count(../../array) eq 1)
                then ../val
                else ../../array[fn:count(../../array) - 1]/sum + ../val
              }" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="total" type="xs:int"
        dfdl:inputValueCalc="{ ../array[fn:count(../array)]/sum }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Would something like this no longer be allowed under this proposal?
On 11/5/20 3:42 PM, Mike Beckerle wrote:
> There are 4 functions that operate on the infoset, and their behavior is
> unclear depending on when they are evaluated during parse/unparse.
>
> The 4 functions are fn:count, fn:exists, fn:exactly-one, and fn:empty.
>
> The behavior when unparsing is less problematic, because one could simply
> require the infoset nodes being referenced to be fully constructed before
> these functions are allowed to evaluate.
>
> However, when parsing, the behavior is more subtle, and unparsing may need to
> be made consistent with the decisions about behavior for parsing.
>
> Our call minutes about this action item suggest reviewing the known-to-exist
> and known-not-to-exist definitions to see whether these functions should be
> defined in terms of them. I have reviewed those sections, and so far I'm not
> sure they will contribute.
>
> The general problem is this, in terms of fn:count(path). The path is to an
> infoset node or an array of occurrences that is currently being parsed. It is
> possible that the status of known-to-exist or not is simply not yet determined
> at the point the expression is being evaluated.
>
> The answer to fn:count(path) wants to always be the same as if the infoset
> were fully constructed at the time the expression is evaluated. As evaluation
> may occur during parsing, the result is simply not defined if the evaluation
> of the expression itself determines whether the item is known to exist or not.
>
> Ex:
>
> <xs:element name="outerArray" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="innerArray" maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element name="count" type="xs:int"
>               dfdl:inputValueCalc='{ fn:count(../..) }'/>
>             ....
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> In the above, we see that fn:count has as its argument a relative path to the
> array element named "outerArray".
>
> There are a few observations here.
>
> 1) If we define fn:count in this case to actually have anything to do with the
> current number of array elements in outerArray, then we will have tightly
> constrained implementations to a very sequential notion of parsing. The notion
> of the "current" state of the array implies an algorithm where the number of
> current occurrences is changing. E.g., we would preclude an implementation
> that knows the length of all outerArray elements from parsing all the children
> simultaneously in parallel, or at minimum make this quite hard to achieve,
> because each parallel computation would have to somehow simulate the right
> "current number" of occurrences.
>
> 2) The question arises of fn:count(../..) vs. fn:count(../../../outerArray)
> vs. fn:count(../../../outerArray[i]), where i is the index of the enclosing
> parent outerArray instance that contains this calculation. Arguably,
> fn:count(../..) could be considered equivalent to
> fn:count(../../../outerArray[i]), both of which seem like they should always
> return '1', since the count of instances at a single index, i.e., a single
> node, is 1.
>
> 3) Arguably, fn:count(path) could be illegal whenever the path is to an
> enclosing element. We could simply define this usage to be illegal. I cannot
> come up with any reason to actually need this functionality. When parsing we
> could require the path argument to be to a pre-existing part of the infoset,
> and when unparsing it would have to be to either pre-existing or later parts
> of the infoset, but specifically not the current infoset elements. If we make
> this an SDE, then this would seem to be the conservative design point which
> preserves our ability to assign a future meaning to this usage, should a need
> arise.
>
> My recommendation: Expressions evaluated as part of an element parsing or
> unparsing cannot refer to the count or existence of the current element
> occurrence being parsed, nor any enclosing element occurrence, nor any
> enclosing array element.
>
> This would seem to rule out any use of absolute paths in arguments to
> fn:count, because the root element is not (necessarily) known-to-exist until
> the entire parse completes successfully. Yet clearly we want to be able to
> refer to the fn:count of a prior sibling array, and that reference should be
> able to use either a relative or absolute path.
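>
> A minimal sketch of the prior-sibling case, which would presumably remain
> legal. The element names "record", "item" and "itemCount" are hypothetical,
> and representation properties are omitted, so this is a fragment rather than
> a complete schema:
>
> ```xml
> <xs:element name="record"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>   <xs:complexType>
>     <xs:sequence>
>       <!-- Prior sibling array: all of its occurrences precede "itemCount" -->
>       <xs:element name="item" type="xs:int" maxOccurs="unbounded" />
>       <!-- The path ends on a node whose existence does not depend on the
>            element being computed, so no circularity arises -->
>       <xs:element name="itemCount" type="xs:int"
>         dfdl:inputValueCalc="{ fn:count(../item) }" />
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> ```
>
> By contrast, fn:count(..) inside "itemCount" would refer to the enclosing
> "record" occurrence, which is the usage the recommendation rules out.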
>
> So the constraint is not that the argument path must avoid "passing through"
> a node that may or may not exist, but that it must end on a node whose
> existence does not depend on the existence of the current node.
>
> I'm a bit uncertain of good language to express this constraint on what the
> path argument is allowed to refer to, but the notion is one of a sort of
> circular definition; hence, it's a schema definition error.
>
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
> www.owlcyberdefense.com <http://www.owlcyberdefense.com>
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the OGF Intellectual Property Policy
> <http://www.ogf.org/About/abt_policies.php>
>
>
> --
> dfdl-wg mailing list
> dfdl-wg@ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU