I see a separator as something that applies to the entire group, so I'm
uncomfortable with the idea of (potentially) changing it for every member
of the group.
So I would vote for:
1) The separator is evaluated once per 'data' element; occursIndex
evaluates to the index in the 'data' array.
If 2) was desired, it could be achieved by setting the terminator on num:
<element name="e2">
  <sequence separator="|" separatorPosition="infix">
    <element name="seps" minOccurs="3" maxOccurs="3"/>
    <element name="data" maxOccurs="10">
      <sequence>
        <element name="num" maxOccurs="10"
                 terminator="{ /e2/seps[dfdl:occursIndex()] }"/>
      </sequence>
    </element>
  </sequence>
</element>
...and the infix-ness could be emulated by setting the terminator to ""
when dfdl:occursIndex() eq count(/e2/seps).
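
As a rough illustration of that terminator rule (not DFDL, just a Python sketch of the same arithmetic), the helper names `terminator_for` and `unparse_nums` below are hypothetical, and the sketch assumes there are never more 'num' occurrences than 'seps' entries:

```python
def terminator_for(occurs_index, seps):
    """Terminator for the num at 1-based position occurs_index.

    Mirrors the rule described above: use seps[occurs_index], except that
    the terminator is "" when occurs_index equals the number of seps.
    """
    if occurs_index == len(seps):
        return ""
    return seps[occurs_index - 1]


def unparse_nums(nums, seps):
    """Write out one 'data' element as its nums, each followed by its terminator."""
    if len(nums) > len(seps):
        # Assumption of this sketch: there is no seps entry to index past the end.
        raise ValueError("more num occurrences than seps entries")
    return "".join(
        num + terminator_for(index, seps)
        for index, num in enumerate(nums, start=1)
    )
```

In this sketch, three nums with three seps come out looking infix-separated, whereas a shorter 'data' element still ends with a terminator, because the empty terminator only applies at the position equal to count(seps).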
regards,
Tim Kimber,
Technical Lead for IBM Integration Bus Healthcare Pack
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Cc: Norm Patrick <npatrick@tresys.com>, Jessie Chab <jchab@tresys.com>
Date: 16/12/2014 22:40
Subject: [DFDL-WG] when is the separator expression evaluated?
Sent by: dfdl-wg-bounces@ogf.org
Jessie Chab came up with this interesting case. I am hoping
someone else remembers somewhere in the spec where this order of evaluation
issue is taken up in detail.
Consider:
<element name="e2">
  <sequence separator="|" separatorPosition="infix">
    <element name="seps" minOccurs="3" maxOccurs="3"/>
    <element name="data" maxOccurs="10">
      <sequence separator="{ /e2/seps[dfdl:occursIndex()] }">
        <element name="num" maxOccurs="10"/>
      </sequence>
    </element>
  </sequence>
</element>
So we first parse 3 strings separated by a pipe. After that's parsed,
let's assume our infoset looks like this:
<e2>
  <seps>;</seps>
  <seps>-</seps>
  <seps>#</seps>
</e2>
After that we will have some 'data' elements (separated by pipes) which
each have a sequence of 'num' elements. The question is what are the
valid separators of the 'num' elements. I see two potential interpretations.
1) The separator is evaluated once per 'data' element; occursIndex
evaluates to index in the 'data' array; valid data might look something
like:
;|-|#|a;b;c;d|e-f-g-h|i#j#k#l
Note that this means the size of the data array must be less than or
equal to the size of the seps array (though that could be worked around
using mod 3 arithmetic.)
2) Every time we need to look for a separator between num elements, we
reevaluate the separator expression. This means that occursIndex()
references the index in the 'num' array, and so valid data might look
something like:
;|-|#|a;b-c#d|e;f-g#h|i;j-k#l
Note that this means the size of the num array must be less than or
equal to the size of the seps array.
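
To make the difference concrete, here is a small Python sketch of the two readings. It is not a DFDL processor; the function names and the "per_data"/"per_num" mode strings are made up for illustration, and it assumes that under reading 2 the separator following the k-th num is seps[k], which is what the sample data above implies.

```python
def split_per_num(field, seps):
    """Reading 2: the separator after the k-th num is seps[k] (1-based)."""
    nums = []
    rest = field
    for sep in seps:
        head, found, tail = rest.partition(sep)
        if not found:
            break  # no further separator of the expected kind
        nums.append(head)
        rest = tail
    nums.append(rest)  # whatever remains is the last num
    return nums


def parse_e2(text, mode, n_seps=3):
    """Split the outer pipe-separated sequence, then each 'data' element.

    mode == "per_data": reading 1, one separator per 'data' element,
                        chosen by the index in the 'data' array.
    mode == "per_num":  reading 2, separator re-evaluated for each num.
    """
    fields = text.split("|")
    seps, data = fields[:n_seps], fields[n_seps:]
    result = []
    for data_index, field in enumerate(data, start=1):
        if mode == "per_data":
            if data_index > len(seps):
                raise ValueError("more data elements than seps entries")
            result.append(field.split(seps[data_index - 1]))
        else:
            result.append(split_per_num(field, seps))
    return seps, result
```

Each of the two sample data lines yields the same 'num' values (a through l, four per 'data' element) when parsed under its own reading, and a different result under the other one.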
I recall we were considering an argument to dfdl:occursIndex()
to make exactly this kind of issue clear. I believe we decided against
it, as we weren't able to pin down the semantics quite clearly. E.g.,
in the above, how would you add an argument to the dfdl:occursIndex(...)
call that points to the num array, which isn't even in scope at that point?
I know we say somewhere in the spec that a separator can be defined in,
say, the default format of some other schema file. It can be an expression,
and that expression isn't evaluated until it is used by some sequence that
has that separator in scope. This means the expression can refer to path
steps and such that are meaningless at the point where it appears lexically,
but will be meaningful for a sequence where that separator expression is in
scope.
But this problem is slightly different. The question is whether the evaluation
is per-item of the sequence, or once for the sequence.
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy