Hi Mike

More replies but this time I'll keep them together here as the Word doc would get hard to read....

Tim and I have been thinking along similar lines to your "have enough properties to determine that the length is zero". In addition to your examples there are also:
- lengthKind="prefixed" and prefix length is 0
- lengthKind="explicit" and lengthCount expression evaluates to 0
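To make the two extra cases concrete, here is a rough Python sketch. It is only an illustration, not anything from the spec: the function name and its arguments are made up, and it assumes the prefix value and the length expression have already been evaluated.

```python
def length_known_zero(length_kind, prefix_length=None, explicit_length=None):
    """Return True when the length properties alone fix the content length at zero.

    Hypothetical helper: prefix_length is the parsed prefix value and
    explicit_length is the already evaluated length expression.
    """
    if length_kind == "prefixed":
        return prefix_length == 0
    if length_kind == "explicit":
        return explicit_length == 0
    # Other length kinds need the data or the framing to be examined.
    return False


print(length_known_zero("prefixed", prefix_length=0))    # True
print(length_known_zero("explicit", explicit_length=5))  # False
```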

Using the same sectioning as the document...

-------------------------------------------------
a) Fixed length, no delimiters
We agree that there should be no defaulting when the length is > 0.
We need to decide whether the length = 0 case implies defaulting; we think it does, as the property determines that the length is zero.

b) Fixed length, only parent has delimiters
This boils down to whether we need to detect early termination. The spec and you are both clear that scanning is off when parsing fixed length. I'd like to hear what Steph has to say on this.

c) Fixed length, initiators
You want to treat this the same as un-initiated fixed length. OK, but more on this later under i)

e) Delimited, separators required
We agree that defaults should be applied when adjacent separators are encountered.

f) Delimited, separators suppressed at end
We agree that defaults should be applied when adjacent separators are encountered and at the end.
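The agreed behaviour for e) and f) can be sketched roughly as below. This is a simplified Python illustration under stated assumptions: an infix separator, text data, one default per declared element, and a hypothetical `suppressed_at_end` flag standing in for the separator policy. It is not meant as the real parser algorithm.

```python
def parse_delimited(data, separator, defaults, suppressed_at_end=False):
    """Split data on an infix separator and default any empty field.

    defaults holds one default value per declared element (hypothetical model).
    """
    fields = data.split(separator)
    if len(fields) > len(defaults):
        raise ValueError("more fields than declared elements")
    if len(fields) < len(defaults):
        if not suppressed_at_end:
            raise ValueError("separators are required but some are missing")
        # Trailing separators were suppressed: treat the missing fields as empty.
        fields += [""] * (len(defaults) - len(fields))
    # Adjacent separators give an empty field, which takes its default.
    return [field if field else default for field, default in zip(fields, defaults)]


print(parse_delimited("a,,c", ",", ["A", "B", "C"]))                      # ['a', 'B', 'c']
print(parse_delimited("a", ",", ["A", "B", "C"], suppressed_at_end=True))  # ['a', 'B', 'C']
```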

g) Delimited, initiators, separators required
We agree that defaults should be applied when adjacent separators are encountered.

i) Delimited, initiators, separators suppressed
You want defaults to be applied when an element is entirely absent (B in the example)
Tim and I struggle to differentiate this case from c). At the start of B processing, there is nothing in the data to indicate B and the next thing is C's initiator. So why is the defaulting rule different?
Take this one step further - my data is fixed length, initiated and the parent has a suppressed separator - so which of c) and i) applies?

How does the parser know when a group has ended?
One of Tim's rules was that a group has ended when an enclosing delimiter is found. That is not always the case. Tim suggested that if the immediate parent had lengthKind="implicit" then we would not be looking for delimiters. I believe your YES was agreeing with that? We would say it is also true if the immediate parent had lengthKind="explicit" or "pattern".

What is the algorithm for selecting the next occurrence?
Tim and I discussed this, and there is not an issue here. The occursCountKind always tells you the number to expect (which might be 'don't know' if occursCountKind="parsed", in which case we just speculatively parse).
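A rough Python sketch of that loop is below. It is an illustration only: `parse_occurrences`, `parse_one` and `parse_digit` are hypothetical names, `expected=None` stands for the 'don't know' case, and a real parser would also have to handle separators and backtracking.

```python
def parse_occurrences(data, parse_one, expected=None):
    """Parse repeated occurrences; expected=None means the count is not known."""
    items, pos = [], 0
    while expected is None or len(items) < expected:
        result = parse_one(data, pos)
        if result is None:
            if expected is not None:
                raise ValueError(f"expected {expected} occurrences, found {len(items)}")
            break  # speculative parse failed: the array has ended
        value, new_pos = result
        if expected is None and new_pos == pos:
            break  # no progress, stop rather than loop forever
        items.append(value)
        pos = new_pos
    return items, pos


def parse_digit(data, pos):
    """Toy element parser: one decimal digit, or None on failure."""
    if pos < len(data) and data[pos].isdigit():
        return int(data[pos]), pos + 1
    return None


print(parse_occurrences("123ab", parse_digit))              # ([1, 2, 3], 3)
print(parse_occurrences("123ab", parse_digit, expected=2))  # ([1, 2], 2)
```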

When parsing a group with separatorPolicy=suppressed, is every group member a 'point of uncertainty'?
Agree with your statement.
----------------------------------

Other things to discuss:

Defaulting complex elements when parsing
The spec says that if zero length content is obtained for a complex element then it is defaulted, which means the element's complex type is walked and default values are sent to the infoset for required elements. It is an error if any required elements do not have a default value. A simpler alternative is to create just the element in the infoset with no children, but this would fail validation if validation is switched on.
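The walk described by the spec could look roughly like the Python sketch below. The dictionary layout for a declaration (`name`, `required`, `default`, `children`) is invented for this illustration and is not how any schema model is actually represented.

```python
def default_complex(decl):
    """Walk a complex declaration and build infoset children from defaults.

    decl is a hypothetical dict: {"name": ..., "children": [...]}, where each
    child may carry "required" (default True), "default" or its own "children".
    """
    infoset = {}
    for child in decl["children"]:
        if not child.get("required", True):
            continue  # optional elements are not defaulted
        if "children" in child:
            infoset[child["name"]] = default_complex(child)
        elif "default" in child:
            infoset[child["name"]] = child["default"]
        else:
            raise ValueError(f"required element {child['name']!r} has no default")
    return infoset


address = {
    "name": "address",
    "children": [
        {"name": "street", "default": "unknown"},
        {"name": "flat", "required": False},
        {"name": "geo", "children": [{"name": "lat", "default": 0.0}]},
    ],
}
print(default_complex(address))  # {'street': 'unknown', 'geo': {'lat': 0.0}}
```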

Separator position
Any rules that we agree on must take into account infix vs prefix vs postfix. In practice this determines how an element is 'bound' to a separator. With prefix it is bound to the beginning, with postfix it is bound to the end, and with infix it is bound to the beginning except for the first element (need to check with Steph if that is how WTX does it).
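As a rough illustration of the binding, the Python sketch below splits text data according to separator position. It assumes a non-empty separator and non-empty data, the function name is made up, and it says nothing about how WTX behaves.

```python
def split_by_separator(data, separator, position):
    """Split data into element contents according to the separator position."""
    if position == "prefix":
        # Every element, including the first, is preceded by a separator.
        if not data.startswith(separator):
            raise ValueError("prefix separator missing before first element")
        return data[len(separator):].split(separator)
    if position == "postfix":
        # Every element, including the last, is followed by a separator.
        if not data.endswith(separator):
            raise ValueError("postfix separator missing after last element")
        return data[:-len(separator)].split(separator)
    if position == "infix":
        # Separators appear only between elements.
        return data.split(separator)
    raise ValueError(f"unknown separator position: {position!r}")


print(split_by_separator(",a,b", ",", "prefix"))   # ['a', 'b']
print(split_by_separator("a,b,", ",", "postfix"))  # ['a', 'b']
print(split_by_separator("a,b", ",", "infix"))     # ['a', 'b']
```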

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From:	"Mike Beckerle" <mbeckerle.dfdl@gmail.com>
To:	Tim Kimber/UK/IBM@IBMGB
Cc:	Steve Hanson/UK/IBM@IBMGB
Date:	11/06/2011 01:56
Subject:	RE: A selection of example data formats

My comments on your examples. I had to turn it into a word doc to reasonably put my commentary inline into this.

I think the concept of an element declaration being classified into:

· Can be defaulted from nothing
· Can be defaulted from empty content (but requires some framing to determine that the content is empty)
· Cannot be defaulted (requires at least some content bits, possibly also some framing)

… I think this is something we’re in need of in the spec.

If the element can be defaulted from nothing, and it is required, and we have nothing, i.e., no bits meaning that we have enough properties to determine that the length is zero, then we default it to get the infoset value. If it’s optional, then we don’t default it, and nothing goes into the infoset.

This begs the question of “have enough properties to determine that the length is zero”.

Examples of this: end of data; end of parent; this element has no delimiters, but lengthKind=delimited and a parent delimiter was immediately encountered, which terminates the element after zero bits; lengthKind=”pattern”, lengthPattern=”a*”, and the data has no “a” characters, so the length comes out zero and no bits are consumed. Recursively, for the length of a group to be zero, the same properties must hold inductively for its members and for the group itself.
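The pattern case can be checked with a small Python sketch. Python's `re` module stands in here for the DFDL regular expression dialect, which is an assumption, and `pattern_length` is a made-up helper that measures characters rather than bits.

```python
import re


def pattern_length(length_pattern, data, pos=0):
    """Return how many characters the pattern consumes starting at pos."""
    match = re.compile(length_pattern).match(data, pos)
    return 0 if match is None else match.end() - pos


print(pattern_length("a*", "bcd"))  # 0: matches, but consumes nothing
print(pattern_length("a*", "aab"))  # 2
```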

I’m not sure I’ve got all the cases here, but it’s something like this.

That’s all for my brain on DFDL today…..

From: Tim Kimber [mailto:KIMBERT@uk.ibm.com]
Sent: Thursday, June 02, 2011 4:41 PM
To: mbeckerle.dfdl@gmail.com
Cc: Steve Hanson
Subject: A selection of example data formats

Mike,

Steve asked me to forward this text file that I have put together. I put it together as background material for our discussions about the parsing of DFDL elements and groups.

Key issues:
- The specification uses the terms 'empty', 'missing' and 'known not to exist' in reference to elements. We need to work out what these terms mean so that the spec can be made clearer.
- In my opinion, the terms 'missing' and 'known not to exist' should not have different meanings - it invites criticism. If 'missing' means something different from 'known not to exist' then we need a different word or phrase.
- The application of default values for missing required elements in the parser is problematic. I think Steve may have sent you an email about this, so I won't outline the issues here ( Steve, please can you forward your email to me ).

Disclaimer : This set of data formats does not highlight all of the unresolved questions around the parsing of groups - only the ones that were in play at the time I produced the document.

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
