Comments in <tk> tags.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 11/06/2014 10:47
Subject: Re: [DFDL-WG] Action 261
Some thoughts on this...
I agree that the definition of positional
sequence in the spec needs tightening: as it stands it is ambiguous and
could be interpreted as either a) or b). If we adopted b), that would
appear to allow 'expression' to appear in a positional sequence, but wouldn't
it also allow 'stopValue'?
<tk>Yes - according to definition
b) stopValue would be allowable in a positional sequence. We could still
disallow it if we do not believe there is any benefit in allowing it. I
don't believe it introduces any particular complexities for an implementer.</tk>
occursCountKind 'expression' is analogous
to lengthKind 'explicit' with an expression and to lengthKind 'prefixed'.
Both these lengthKinds are classified as 'specified length' when parsing
but 'variable length' when unparsing. We are observing that occursCountKind
'expression' is like 'fixed' when parsing but not quite so like 'fixed'
when unparsing - which is why section 16 groups 'expression' with 'parsed'
for unparsing.
<tk>Yes - we took a decision that
the unparser should ignore the expression in lengthKind/occursCountKind,
and just output whatever data happens to be in the info set.
I'm not sure that it saves a lot of
effort in the implementation and it certainly is not easy to justify as
a consistent behaviour. For me, the unparser should treat lengthKind='explicit'
the same way whether the value is static or calculated. And the unparser
should treat occursCountKind='expression' the same way as occursCountKind='fixed'.
</tk>
When unparsing occursCountKind 'expression'
you don't always have the calculated array length N. If the infoset was
derived from XML, there is likely no 'count' element, just a bunch of elements
with the same name that make up the 'array'. DFDL gives you the choice
whether to set the count element manually or to have the unparser set it
automatically via outputValueCalc. In the former case, you can create a
document that cannot be parsed;
<tk>You can with the current rules
too. In fact, you can parse a document with trailing optional empty array
occurrences and when it is unparsed the trailing empty occurrences will
have been discarded.</tk>
the unparser could check the 'count'
element matches the infoset, but that would involve reverse engineering
an arbitrarily complex expression and is why the specification does not
say that.
<tk>It would involve evaluating
the expression. In most cases, that will not require any lookahead because
the Length/Count field will precede the array or element. Not sure where
the reverse engineering comes in?</tk>
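[Editorial sketch] The check discussed here can be pictured roughly as follows. This is only an illustration under stated assumptions: the infoset is modelled as a plain Python dict, the count expression as a Python callable, and the names `count_matches_infoset`, `Count` and `Item` are hypothetical. It does not reproduce any DFDL implementation.

```python
from typing import Callable


def count_matches_infoset(infoset: dict, count_name: str, array_name: str,
                          count_expr: Callable[[int], int]) -> bool:
    """Evaluate the count expression and compare it with the occurrences present."""
    # The count field precedes the array, so its value is already available.
    expected = count_expr(infoset[count_name])
    # Number of array occurrences actually present in the (hypothetical) infoset.
    actual = len(infoset.get(array_name, []))
    return expected == actual


infoset = {"Count": 3, "Item": ["a", "b", "c"]}
print(count_matches_infoset(infoset, "Count", "Item", lambda c: c))
```

The expression is evaluated forwards, from the count field to an expected number of occurrences; nothing in this sketch tries to derive the count field from the array.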
Here's a real example of such an expression
(albeit with lengthKind 'explicit' but the principle is the same):
dfdl:length="{xs:nonNegativeInteger(fn:floor((../Length + 1) div 2))}"
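[Editorial sketch] To make the arithmetic of that expression concrete, the Python below mirrors it step by step. The function name `explicit_length` and the sample inputs are illustrative assumptions; only the formula itself comes from the message.

```python
import math


def explicit_length(length_field: int) -> int:
    """Mirror xs:nonNegativeInteger(fn:floor((../Length + 1) div 2))."""
    # XPath 'div' is true division, so use '/' and then floor the result.
    value = math.floor((length_field + 1) / 2)
    # The xs:nonNegativeInteger cast would reject a negative result.
    if value < 0:
        raise ValueError("result is not a valid xs:nonNegativeInteger")
    return value


for n in (0, 1, 2, 7):
    print(n, "->", explicit_length(n))
```

Note that the mapping is not one-to-one: Length values 1 and 2 both give 1, so the Length field cannot be recovered uniquely from the resulting length.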
Alex brought up the case where the expression
evaluates to 0. In a positional sequence, would you still expect a delimiter
for this case?
<tk>Yes, unless it is in the trailing
optional region of the group and SSP='trailingEmpty'. In a positional sequence,
every delimiter must be present until suppression begins ( if allowed )</tk>
If 'yes' then the resultant zero length
string must be treated as the 'absent representation' and ignored. If 'no'
then is the sequence still positional?
<tk>I don't understand the point.
Why would it not be the 'empty representation'? Why must it be 'ignored'
if it does happen to be the 'absent representation'? What does 'ignored'
mean?</tk>
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 10/06/2014 21:22
Subject: [DFDL-WG] Action 261
Sent by: dfdl-wg-bounces@ogf.org
Implied separatorSuppressionPolicy for occursCountKind 'expression' (All)
10/6: Spec says it is 'never' (positional sequence) but you have to parse
to identify the position, so isn't that non-positional?
I think there are two alternative definitions of 'positional':
a) the identity of every delimited field is known before parsing of the
sequence group begins
b) the identity of every delimited field is known before parsing of the
field begins
As an implementer, b) is sufficient because it means that the parser never
needs to backtrack while parsing the group.
a) allows the field identities to be statically known, but that is less
important: it does not allow optimised extraction of a particular field,
as would be the case for a fixed-length group (the possibility of escaped
separators/terminators means that every character will need to be scanned
anyway).
It may sound like a small point, but it affects two decisions:
1. whether ock='expression' should be allowed within a positional sequence
group (action 261)
2. what the behaviour of the unparser should be w.r.t. ock='expression'.
My own feeling is that ock='expression' should be treated almost exactly
like ock='fixed', except that the calculated array length N is used instead
of maxOccurs.
- When parsing a positional sequence group it should cause N delimiters
to be expected for the array.
- When unparsing a positional sequence group it should cause N delimiters
to be written.
These rules are consistent and straightforward to describe and implement.
The current rule (unparser outputs the occurrences that are in the info
set only) allows the unparser to write a document that cannot be parsed
using the same schema.
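[Editorial sketch] The difference between the two unparser rules can be illustrated with a toy comma-separated group consisting of a count, N items and a tail field. Everything here is an assumption made for illustration: the field names, the dict-based infoset, the absence of escape handling, and in particular the choice to pad missing occurrences with empty strings under the proposed rule. It is not a DFDL implementation.

```python
SEP = ","


def parse(data: str) -> dict:
    """Read Count, then exactly Count items, then Tail (toy format, no escapes)."""
    fields = data.split(SEP)
    count = int(fields[0])
    # Once Count is known, the identity of every following field is fixed.
    if len(fields) != count + 2:
        raise ValueError(f"expected {count + 2} fields, found {len(fields)}")
    return {"Count": count, "Item": fields[1:1 + count], "Tail": fields[-1]}


def unparse_infoset_only(infoset: dict) -> str:
    """Current rule: write only the occurrences present in the infoset."""
    return SEP.join([str(infoset["Count"]), *infoset["Item"], infoset["Tail"]])


def unparse_n_slots(infoset: dict) -> str:
    """Proposed rule: always write Count item slots (empty padding is assumed)."""
    n = infoset["Count"]
    items = list(infoset["Item"])
    if len(items) > n:
        raise ValueError("more occurrences than the count allows")
    items += [""] * (n - len(items))  # pad missing occurrences with empty slots
    return SEP.join([str(n), *items, infoset["Tail"]])


infoset = {"Count": 3, "Item": ["a", "b"], "Tail": "z"}
print(unparse_infoset_only(infoset))  # writes only two item slots
print(unparse_n_slots(infoset))       # writes three item slots
```

With a count of 3 and only two occurrences in the infoset, the first function writes `3,a,b,z`, which `parse` rejects because a field is missing; the second writes `3,a,b,,z`, which `parse` accepts.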
regards,
Tim Kimber,
----- Forwarded by Tim Kimber/UK/IBM on 10/06/2014 20:34 -----
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 10/06/2014 17:57
Subject: [DFDL-WG] OGF DFDL WG Call Minutes 2014-06-10
Sent by: dfdl-wg-bounces@ogf.org
Please find minutes from the above call at http://redmine.ogf.org/dmsf_files/13263?download=
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU