Mike to run the below past the Daffodil
team.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 19/01/2015 17:58 -----
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 19/12/2014 11:51
Subject: Re: [DFDL-WG] when is the separator expression evaluated?
Section 16.3 of the spec says:
16.3 Arrays with DFDL Expressions
If the value of a DFDL property of an array element (other than
dfdl:occursCount) is given by a DFDL Expression, then the expression must
be re-evaluated for each occurrence of the element in case the value changes.
Relating this to Mike's original question,
I would say that the separator for the sequence within 'data' is re-evaluated
for each occurrence of 'data', but it is not re-evaluated for each
occurrence of 'num'.
The order in which properties are referenced is given by section 22 of the
spec. (I am sure this does not cover every nuance, but let's assume it does.)
It should make no difference whether a property is a fixed value or an
expression, so the expression is evaluated at the point where the property
is referenced. I am happy for implementations to defer the evaluation
of the expression BUT only as long as deferral does not change the result
that would have been obtained if the expression had been evaluated at the
original time of reference.
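
The deferral condition above can be illustrated with a small Python sketch. It is not taken from any DFDL implementation: `DeferredProperty`, `separator_expr` and the dictionary used as evaluation context are hypothetical names invented for this illustration. It assumes the state an expression reads (here, the occurs index) can change between the time the property is referenced and the time its value is actually needed.

```python
import copy


class DeferredProperty:
    """Hypothetical holder that evaluates a property expression on first use."""

    def __init__(self, expr, context, snapshot):
        self._expr = expr
        # With snapshot=True the inputs are frozen at the time of reference.
        self._context = copy.deepcopy(context) if snapshot else context
        self._evaluated = False
        self._value = None

    def value(self):
        # Evaluate at most once, then reuse the cached result.
        if not self._evaluated:
            self._value = self._expr(self._context)
            self._evaluated = True
        return self._value


def separator_expr(context):
    """Stand-in for { /e2/seps[dfdl:occursIndex()] } (1-based index)."""
    return context["seps"][context["occurs_index"] - 1]


context = {"seps": [";", "-", "#"], "occurs_index": 1}

eager = separator_expr(context)  # evaluated at the time of reference
unsafe = DeferredProperty(separator_expr, context, snapshot=False)
safe = DeferredProperty(separator_expr, context, snapshot=True)

context["occurs_index"] = 2  # processing moves on before the value is needed

print(eager)           # ;
print(safe.value())    # ;
print(unsafe.value())  # -
```

In this toy model only the snapshotting variant gives the same answer as eager evaluation, which is the kind of deferral the paragraph above permits.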
From: Tim Kimber/UK/IBM@IBMGB
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 18/12/2014 16:24
Subject: Re: [DFDL-WG] when is the separator expression evaluated?
Sent by: dfdl-wg-bounces@ogf.org
Good questions. I think the questions
apply equally to separator and terminator, which can both be defined on
the sequence group.
Parsing: The first member may have lengthKind='explicit' and will therefore
not need the separator until the parsing of the first member is complete.
The terminator will be required as soon as the parser has to look for delimiters
in the 'trailing optional' area of the sequence group.
So we need to decide whether
a) DFDL expressions for separators/terminators are evaluated upon entering
the sequence group, or
b) DFDL expressions for separators/terminators can be evaluated lazily, or
c) DFDL expressions for separators/terminators must be evaluated lazily.
Serializing: The separator will be required after the first member, regardless.
The terminator may be required before the end of the group if one or more
group members have an escape scheme.
I'm inclined to suggest that implementations should be free to evaluate
eagerly or lazily, as long as the behaviour conforms to the DFDL spec.
But there may be scope for conforming implementations to exhibit material
differences in behaviour if we allow that much latitude. I just can't think
what those differences would be.
regards,
Tim Kimber,
Technical Lead for IBM Integration Bus Healthcare Pack
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 17/12/2014 18:19
Subject: Re: [DFDL-WG] when is the separator expression evaluated?
Great. I concur. Anybody have the opposite perspective?
Where should a clarification go?
In general, suppose
<xs:sequence dfdl:terminator="....some expression...">
....
does the expression get evaluated when the xs:sequence is first "entered"
by the parser (whatever "entered" means - when the parser conceptually
walks into this construct of the schema), or as late as possible, when
the terminator is actually needed for something?
Consider: when parsing, we may need the terminator quite soon, as the
terminator may play a role in delimiting the very first thing one finds
inside the sequence.
When unparsing, if you happen to know from the Infoset that there are 5
things in the sequence, you don't really need the terminator at all until
after you have unparsed the 5th thing, i.e., much later.
This asymmetry is of concern.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF
Intellectual Property Policy
On Wed, Dec 17, 2014 at 10:28 AM, Tim Kimber <KIMBERT@uk.ibm.com> wrote:
A separator is something that applies to the entire group, so I'm
uncomfortable with the idea of (potentially) changing it for every member
of the group.
So I would vote for:
1) The separator is evaluated once per 'data' element; occursIndex
evaluates to index in the 'data' array.
If 2) was desired it could be achieved by setting the terminator on num:
<element name="e2">
  <sequence separator="|" separatorPosition="infix">
    <element name="seps" minOccurs="3" maxOccurs="3"/>
    <element name="data" maxOccurs='10'>
      <sequence>
        <element name="num" maxOccurs='10'
                 terminator="{ /e2/seps[dfdl:occursIndex()] }" />
      </sequence>
    </element>
  </sequence>
</element>
...and the infix-ness could be emulated by setting the terminator to ""
when dfdl:occursIndex() eq count(/e2/seps).
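
As a rough illustration of the terminator rule just described, here is a Python sketch of how the 'num' occurrences of one 'data' element would be written out. It is a toy model, not DFDL: `num_terminator` and `unparse_nums` are hypothetical helpers, the seps values are the ones from the example infoset later in the thread, and it assumes the num array has no more occurrences than seps.

```python
def num_terminator(occurs_index, seps):
    """Terminator for the 'num' at 1-based occurs_index under the rule above."""
    # Emulate infix behaviour: no terminator when the index equals count(seps).
    if occurs_index == len(seps):
        return ""
    return seps[occurs_index - 1]


def unparse_nums(nums, seps):
    """Write each 'num' followed by its own terminator."""
    return "".join(
        num + num_terminator(index, seps)
        for index, num in enumerate(nums, start=1)
    )


seps = [";", "-", "#"]
print(unparse_nums(["a", "b", "c"], seps))  # a;b-c
```

Read literally, the rule suppresses the last terminator only when the num array has exactly count(/e2/seps) occurrences; in this sketch a shorter array such as ["a", "b"] comes out as "a;b-", with a trailing terminator.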
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Cc: Norm Patrick <npatrick@tresys.com>, Jessie Chab <jchab@tresys.com>
Date: 16/12/2014 22:40
Subject: [DFDL-WG] when is the separator expression evaluated?
Sent by: dfdl-wg-bounces@ogf.org
Jessie Chab came up with this interesting case. I am hoping someone else
remembers somewhere in the spec where this order of evaluation issue is
taken up in detail.
Consider:
<element name="e2">
  <sequence separator="|" separatorPosition="infix">
    <element name="seps" minOccurs="3" maxOccurs="3"/>
    <element name="data" maxOccurs='10'>
      <sequence separator="{ /e2/seps[dfdl:occursIndex()] }">
        <element name="num" maxOccurs='10' />
      </sequence>
    </element>
  </sequence>
</element>
So we first parse 3 strings separated by a pipe. After that's parsed,
let's assume our infoset looks like this:
<e2>
  <seps>;</seps>
  <seps>-</seps>
  <seps>#</seps>
</e2>
After that we will have some 'data' elements (separated by pipes) which
each have a sequence of 'num' elements. The question is what are the
valid separators of the 'num' elements. I see two potential interpretations.
1) The separator is evaluated once per 'data' element; occursIndex
evaluates to index in the 'data' array; valid data might look something
like:
;|-|#|a;b;c;d|e-f-g-h|i#j#k#l
Note that this means the size of the data array must be less than or
equal to the size of the seps array (though that could be worked around
using mod 3 arithmetic.)
2) Every time we need to look for a separator between num elements, we
re-evaluate the separator expression. This means the occursIndex()
references the index in the 'num' array, and so valid data might look
something like:
;|-|#|a;b-c#d|e;f-g#h|i;j-k#l
Note that this means the size of the num array must be less than or
equal to the size of the seps array.
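
To make the two interpretations concrete, here is a small Python sketch that splits the two sample data lines above. It is only a toy model of the example, not a DFDL parser: the function names are made up for this illustration, it assumes exactly three 'seps' values, no escape schemes, and that no separator value contains the pipe character.

```python
def split_once_per_data(data_text, seps, data_index):
    """Interpretation 1: one separator per 'data' occurrence (1-based index)."""
    return data_text.split(seps[data_index - 1])


def split_per_num(data_text, seps):
    """Interpretation 2: the separator is re-evaluated for each 'num'."""
    nums, rest, index = [], data_text, 1
    while index <= len(seps) and seps[index - 1] in rest:
        head, rest = rest.split(seps[index - 1], 1)
        nums.append(head)
        index += 1
    nums.append(rest)  # last 'num' has no separator after it (infix)
    return nums


def parse_e2(text, interpretation):
    """Return the 'num' values of each 'data' element under one interpretation."""
    fields = text.split("|")
    seps, data = fields[:3], fields[3:]
    if interpretation == 1:
        return [
            split_once_per_data(item, seps, index)
            for index, item in enumerate(data, start=1)
        ]
    return [split_per_num(item, seps) for item in data]


print(parse_e2(";|-|#|a;b;c;d|e-f-g-h|i#j#k#l", 1))
# [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k', 'l']]
print(parse_e2(";|-|#|a;b-c#d|e;f-g#h|i;j-k#l", 2))
# [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k', 'l']]
```

Each sample line yields the same three groups of four 'num' values only under its own interpretation; fed to the other one, the split comes out differently, which is why the choice matters.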
I recall we were considering an argument to dfdl:occursIndex() to make
exactly this kind of issue clear. I believe we decided against it, as we
weren't able to pin down the semantics quite clearly. E.g., in the
above, how would you add an argument to the dfdl:occursIndex(...) call
that points to the num array, which isn't even in scope at that point?
I know we say somewhere in the spec that a separator can be defined in, say,
the default format of some other schema file. It can be an expression,
and that expression isn't evaluated until it is used by some sequence which
has that separator in scope. This means the expression can refer to path
steps and such that are meaningless at the point where it appears lexically,
but will be meaningful for a sequence where that separator expression is
in scope.
But this problem is slightly different. The question is whether the evaluation
is per-item of the sequence, or once for the sequence.
...mikeb
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU