Yes - I wanted to keep the scope of the
initial question as small as possible, but Steve is right to raise these
related issues.
I think the question about occursCountKind='expression'
is a related but separate issue (the problems that I raised in relation
to SSP do not depend on OCK='expression'). It is worth a separate action,
I think.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
----- Forwarded by Tim Kimber/UK/IBM on 09/06/2014 11:05 -----
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 09/06/2014 10:36
Subject: Re: [DFDL-WG] Spec question: Parsing sequence groups with separators
Sent by: dfdl-wg-bounces@ogf.org
To put Tim's concerns another way: the spec defines 'positional sequence'
and 'non-positional sequence' in terms of the value of
separatorSuppressionPolicy (section 14.2). But separatorSuppressionPolicy
only applies when occursCountKind is 'implicit'; for other occursCountKinds
there is an implied separatorSuppressionPolicy value (section 14.2.2). We did
this partly so that separatorSuppressionPolicy can be put in scope and not
cause errors. However, when you create a sequence that contains elements with
different occursCountKinds, you can end up with a hybrid which is positional
in places and non-positional in others. We need to decide whether these kinds
of sequences are allowed. You can always wrap a group of elements in a
sequence in order to change separatorSuppressionPolicy.
On occursCountKind 'expression': this is stated as having implied
separatorSuppressionPolicy 'never' on the grounds that it is very like
'fixed'. That implies positional behaviour. But you need to parse the data in
order to know the number of occurrences, so doesn't that make it
non-positional? Also, section 16 states that when unparsing, 'expression'
behaves like 'parsed', and 'parsed' has implied separatorSuppressionPolicy
'empty'. Something is not quite straight here.
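The hybrid case can be made concrete with a small sketch. It is illustrative only: the table below records the implied values as they are described in this thread, not as copied from the spec. The entry for 'fixed' is an assumption inferred from the remark that 'expression' is "very like 'fixed'", and the names IMPLIED_SSP, effective_ssp and is_hybrid are hypothetical.

```python
# Implied separatorSuppressionPolicy per occursCountKind, as described in this
# thread. None means "no implied value: use the sequence's own policy".
# ASSUMPTION: the 'fixed' entry is inferred, and 'empty' follows the wording
# used in the email rather than the spec's enumeration names.
IMPLIED_SSP = {
    "implicit": None,
    "fixed": "never",
    "expression": "never",
    "parsed": "empty",
}


def effective_ssp(sequence_ssp, occurs_count_kind):
    """Return the policy that actually governs one element of the sequence."""
    implied = IMPLIED_SSP[occurs_count_kind]
    return sequence_ssp if implied is None else implied


def is_hybrid(sequence_ssp, occurs_count_kinds):
    """True if the elements of one sequence end up under different policies."""
    policies = {effective_ssp(sequence_ssp, kind) for kind in occurs_count_kinds}
    return len(policies) > 1
```

For example, is_hybrid("never", ["implicit", "parsed"]) is true, because the 'parsed' array is governed by 'empty' while its sibling keeps the sequence's 'never'.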
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 07/06/2014 21:31
Subject: [DFDL-WG] Spec question: Parsing sequence groups with separators
Sent by: dfdl-wg-bounces@ogf.org
The rules outlined in section 14.2.2 'Parsing Sequence Groups with Separators'
are not properly specified, and probably cannot be consistently implemented.
The last paragraph of Section 14.2.1 says this: "In the sections
that follow, it is important to remember that the dfdl:separatorSuppressionPolicy
property is carried on the sequence, while the XSDL minOccurs, XSDL maxOccurs
and dfdl:occursCountKind properties are carried on an element in that
sequence."
This is true, and this 'local overriding' of separatorSuppressionPolicy
(by arrays within the group) is the cause of most of the problems.
Problem #1: Complexity
Consider a sequence group that has SSP='never' and the separator is a comma.
Its members ( A,B,C ) must always be represented as follows:
"a,b,c" or ",b,c" or ",,c"
but never "b,c" because that would imply that the separator for
an empty A had been suppressed.
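As a minimal sketch of that rule (a hypothetical helper, not DFDL processor code, and ignoring escaping and initiators/terminators), a strictly positional parse insists on exactly one field per member:

```python
def split_positional(data, names=("A", "B", "C"), sep=","):
    """Split data into one field per member; no separator may be missing."""
    fields = data.split(sep)
    if len(fields) != len(names):
        raise ValueError(
            f"expected {len(names)} fields, found {len(fields)} in {data!r}"
        )
    # An empty string stands for an empty member whose separator was kept.
    return dict(zip(names, fields))
```

With this helper, "a,b,c", ",b,c" and ",,c" are all accepted, while "b,c" is rejected because only two fields are present.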
Now suppose that B is an array with minOccurs=0 and maxOccurs=3 and occursCountKind='implicit'.
Acceptable representations are now:
"a,b1,b2,b3,c" or "a,b1,,,c" or even "a,,,,c"
But if occursCountKind is changed to 'parsed' then the acceptable representations
suddenly alter, and empty occurrences of B can be completely omitted.
"a,b1,b2,b3,c" or "a,b1,c" or even "a,c"
[ or should that be "a,,c" ]
This seems wrong. The logic that implements suppression policy is hard
enough to implement already. Bringing in an extra layer of complexity around
arrays will make it so hard that most implementations would contain defects,
leading to interoperability issues.
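To make the difference between these readings easier to compare, here is a rough sketch that writes out A, the occurrences of B, and C under three policies. The function and the policy names "never", "suppress" and "keep_one" are hypothetical labels for this example only; they are not DFDL property values.

```python
def render_sequence(a, b_values, c, max_b=3, policy="never", sep=","):
    """Join A, the occurrences of B, and C into one delimited string.

    policy="never":    all max_b slots for B keep their separators.
    policy="suppress": absent occurrences of B are omitted altogether.
    policy="keep_one": as "suppress", but a wholly absent array still
                       leaves one empty slot (the bracketed alternative).
    """
    b_values = list(b_values)
    if policy == "never":
        b_fields = b_values + [""] * (max_b - len(b_values))
    elif policy == "suppress":
        b_fields = b_values
    elif policy == "keep_one":
        b_fields = b_values or [""]
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return sep.join([a, *b_fields, c])
```

With a single occurrence b1, "never" gives "a,b1,,,c" and "suppress" gives "a,b1,c". With no occurrences at all, "suppress" gives "a,c" and "keep_one" gives "a,,c", which are the two candidates in the bracketed question.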
Problem #2: Ambiguity
See the bracketed question at the end of the example in Problem #1:
[ or should that be "a,,c" ]
It is far from obvious whether the group should insist on having a separator
for the array (because its SSP is 'never') or whether the array is at
liberty to suppress the separators for all of its members (as I assumed
when I wrote this email). The text of the specification is either silent
or unclear on this point.
Possible resolution:
Rather than attempting to specify implied behaviours for the various occursCountKind
settings, I believe the specification should
a) prohibit the use of certain occursCountKinds within positional sequences
b) require array occurrences to use the same SSP as other sequence members.
After some discussion with the IBM team, I believe a) will not generate
too many prohibited combinations, and the rationale for those prohibitions
will be consistent with already-existing schema definition errors.
b) will simplify the implementation of separator suppression, thus addressing
the complexity problem.
I expect we will need an action to be opened so that this can be discussed
in the working group meetings.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg