Hi Mike
More replies, but this time I'll keep them together here as the Word doc
would get hard to read.
Tim and I have been thinking along similar lines to your "have enough
properties to determine that the length is zero".
In addition to your examples there are also:
- lengthKind="prefixed" and prefix length is 0
- lengthKind="explicit" and lengthCount expression evaluates to 0
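To make the two extra cases concrete, here is a rough Python sketch of a "do the properties determine that the length is zero" check. It is illustration only: the function name and parameter names are made up, not taken from the spec, and only the two cases listed above are covered.

```python
from typing import Optional


def length_known_zero(length_kind: str,
                      prefix_length: Optional[int] = None,
                      explicit_length: Optional[int] = None) -> bool:
    """Return True when the properties alone fix the content length at zero.

    Hypothetical helper: names are illustrative, and only the
    "prefixed" and "explicit" cases are handled.
    """
    if length_kind == "prefixed":
        # The length prefix has been read from the data and says 0.
        return prefix_length == 0
    if length_kind == "explicit":
        # The length expression has been evaluated and yields 0.
        return explicit_length == 0
    # Other length kinds need different evidence (delimiters, patterns, ...).
    return False
```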
Using the same sectioning as the document...
-------------------------------------------------
a) Fixed length, no delimiters
We agree that there should be no defaulting when the length is > 0.
We need to decide whether the length = 0 case implies defaulting. We think
it does, as the property determines that the length is zero.
b) Fixed length, only parent has delimiters
This boils down to whether we need to detect early termination. The spec
and you are both clear that scanning is off when parsing fixed length. I'd
like to hear what Steph has to say on this.
c) Fixed length, initiators
You want to treat this the same as un-initiated
fixed length. OK, but more on this later under i)
e) Delimited, separators required
We agree that defaults should be applied when adjacent separators are
encountered.
f) Delimited, separators suppressed at end
We agree that defaults should be applied when adjacent separators are
encountered and at the end.
g) Delimited, initiators, separators required
We agree that defaults should be applied when adjacent separators are
encountered.
i) Delimited, initiators, separators suppressed
You want defaults to be applied when an element is entirely absent (B in
the example).
Tim and I struggle to differentiate
this case from c). At the start of B processing, there is nothing
in the data to indicate B and the next thing is C's initiator. So why is
the defaulting rule different?
Take this one step further - my data
is fixed length, initiated and the parent has a suppressed separator -
so which of c) and i) applies?
How does the parser know when a group has ended?
One of Tim's rules was "when an enclosing delimiter is found". That is not
always the case. Tim suggested that if the immediate parent had
lengthKind="implicit" then we would not be looking for delimiters. I
believe your YES was agreeing with that? We would say it is also true if
the immediate parent had lengthKind="explicit" or "pattern".
What is the algorithm for selecting the next occurrence?
Tim and I discussed this, and there is not an issue here. The
occursCountKind always tells you the number to expect (which might be
'don't know' if occursCountKind="parsed", in which case we just
speculatively parse).
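A rough Python sketch of that occurrence loop, for discussion only. All names here are hypothetical. `expected` stands for whatever count the occursCountKind yields, with `None` standing for 'don't know'. The guard against zero-length matches is my own assumption, added so the speculative loop cannot spin forever.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical element parser: given the data and an offset, return
# (value, new_offset) on success or None if the element does not parse.
ElementParser = Callable[[str, int], Optional[Tuple[str, int]]]


def parse_occurrences(data: str, pos: int, parse_one: ElementParser,
                      expected: Optional[int] = None) -> Tuple[List[str], int]:
    """Parse occurrences of one element starting at pos.

    expected is the known count, or None for "don't know", in which
    case occurrences are parsed speculatively until one fails.
    """
    values: List[str] = []
    while expected is None or len(values) < expected:
        result = parse_one(data, pos)
        if result is None:
            if expected is not None:
                raise ValueError(
                    f"expected {expected} occurrences, found {len(values)}")
            break  # speculation failed: leave pos untouched and stop
        value, new_pos = result
        if expected is None and new_pos == pos:
            break  # assumption: a zero-length match ends speculation
        values.append(value)
        pos = new_pos
    return values, pos


def parse_digit(data: str, pos: int) -> Optional[Tuple[str, int]]:
    """Toy element used for the example: a single decimal digit."""
    if pos < len(data) and data[pos].isdigit():
        return data[pos], pos + 1
    return None
```

With the toy `parse_digit`, `parse_occurrences("123ab", 0, parse_digit)` consumes the three digits and stops at "a", while passing `expected=2` stops after two.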
When parsing a group with separatorPolicy=suppressed, is every group
member a 'point of uncertainty'?
Agree with your statement.
----------------------------------
Other things to discuss:
Defaulting complex elements when parsing
The spec says that if zero length content is obtained for a complex
element then it is defaulted, which means the element's complex type is
walked and default values are sent to the infoset for required elements.
It is an error if any required elements do not have a default value. A
simpler alternative is to create just the element in the infoset with no
children, but this would fail validation if validation is switched on.
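The "walk the complex type" behaviour can be sketched roughly as below. This is Python for illustration, not an implementation of the spec: `ElementDecl` and `default_complex` are made-up names, the infoset is modelled as a plain dict, and recursing into required complex children is my assumption about what "walked" means.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ElementDecl:
    """Hypothetical, much-simplified element declaration."""
    name: str
    min_occurs: int = 1                  # 0 means optional
    default: Optional[str] = None        # default value for simple elements
    children: List["ElementDecl"] = field(default_factory=list)  # non-empty => complex


def default_complex(decl: ElementDecl) -> Dict[str, Any]:
    """Build the defaulted infoset content for a zero-length complex element."""
    infoset: Dict[str, Any] = {}
    for child in decl.children:
        if child.min_occurs == 0:
            continue  # optional: nothing goes into the infoset
        if child.children:
            # Assumption: required complex children are defaulted recursively.
            infoset[child.name] = default_complex(child)
        elif child.default is not None:
            infoset[child.name] = child.default
        else:
            raise ValueError(
                f"required element {child.name!r} has no default value")
    return infoset
```

The simpler alternative mentioned above would amount to returning an empty dict without walking the children at all.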
Separator position
Any rules that we agree on must take into account infix v prefix v
postfix. In practice this determines how an element is 'bound' to a
separator. With prefix, the element is bound to the separator at its
beginning; with postfix, to the separator at its end; with infix, to the
separator at its beginning, except for the first element, which has none
(need to check with Steph if that is how WTX does it).
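A small Python sketch of that binding, for discussion only. The function names are made up, and the infix rule is the proposal above, still to be confirmed against WTX.

```python
from typing import List, Optional


def bound_side(position: str, index: int) -> Optional[str]:
    """Which side of element number `index` its bound separator sits on.

    Returns "before", "after", or None when the element has no separator.
    """
    if position == "prefix":
        return "before"
    if position == "postfix":
        return "after"
    if position == "infix":
        # Proposed rule: bound to the beginning, except the first element.
        return "before" if index > 0 else None
    raise ValueError(f"unknown separator position: {position!r}")


def frame(items: List[str], sep: str, position: str) -> str:
    """Write out items, each together with the separator bound to it."""
    out = []
    for i, item in enumerate(items):
        side = bound_side(position, i)
        if side == "before":
            out.append(sep + item)
        elif side == "after":
            out.append(item + sep)
        else:
            out.append(item)
    return "".join(out)
```

For the items "a", "" (empty) and "c" with a comma separator, this gives "a,,c" for infix, ",a,,c" for prefix and "a,,c," for postfix, which shows where the adjacent separators for the empty element end up in each case.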
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB
Cc: Steve Hanson/UK/IBM@IBMGB
Date: 11/06/2011 01:56
Subject: RE: A selection of example data formats
My comments on your examples.
I had to turn it into a Word doc to reasonably put my commentary inline.
I think the concept of an element declaration being classified into:
- Can be defaulted from nothing
- Can be defaulted from empty content (but requires some framing to
  determine that the content is empty)
- Cannot be defaulted (requires at least some content bits, possibly also
  some framing)
… I think this is something we’re in need of in the spec.
If the element can be defaulted from nothing, and it is required, and we
have nothing (i.e., no bits, meaning that we have enough properties to
determine that the length is zero), then we default it to get the infoset
value. If it’s optional, then we don’t default it, and nothing goes into
the infoset.
This raises the question of what "have enough properties to determine that
the length is zero" means.
Examples: end of data; end of parent; this element has no delimiters, but
lengthKind="delimited" and a parent delimiter was immediately encountered,
which terminates the element after zero bits; lengthKind="pattern",
lengthPattern="a*", and the data has no "a" characters, so the length
comes out zero and no bits are consumed. Recursively, for a group to have
zero length, the same conditions must hold inductively for its members and
for the group itself.
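The pattern case is easy to check with a few lines of Python. This is only a sketch: `pattern_length` is a made-up name, and Python's `re` module stands in for whichever regular expression dialect the spec uses, so the details may differ.

```python
import re
from typing import Optional


def pattern_length(pattern: str, data: str, pos: int = 0) -> Optional[int]:
    """Length matched by `pattern` at `pos`, or None if it does not match.

    Stand-in for lengthKind="pattern"; counts characters rather than bits.
    """
    match = re.compile(pattern).match(data, pos)
    if match is None:
        return None
    return match.end() - pos
```

`pattern_length("a*", "bcd")` succeeds with length 0 and consumes nothing, whereas `pattern_length("a+", "bcd")` does not match at all. Only the first is a "length comes out zero" case.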
I’m not sure I’ve got all
the cases here, but it’s something like this.
That’s all for my brain
on DFDL today…..
From: Tim Kimber [mailto:KIMBERT@uk.ibm.com]
Sent: Thursday, June 02, 2011 4:41 PM
To: mbeckerle.dfdl@gmail.com
Cc: Steve Hanson
Subject: A selection of example data formats
Mike,
Steve asked me to forward this text file that I have put together. I put
it together as background material for our discussions about the parsing
of DFDL elements and groups.
Key issues:
- The specification uses the terms 'empty', 'missing' and 'known not to
exist' in reference to elements. We need to work out what these terms mean
so that the spec can be made clearer.
- In my opinion, the terms 'missing' and 'known not to exist' should not
have different meanings - it invites criticism. If 'missing' means something
different from 'known not to exist' then we need a different word or phrase.
- The application of default values for missing required elements in
the parser is problematic. I think Steve may have sent you an email
about this, so I won't outline the issues here (Steve, please can you
forward your email to me).
Disclaimer: This set of data formats does not highlight all of the unresolved
questions around the parsing of groups - only the ones that were in play
at the time I produced the document.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU