I agree that "end" or whatever we decide to call it should be reserved for the last object in a sequence. I prefer "endOfParent".

I have a general unease around the lengthKind enum "implicit".  It originally meant something quite specific, the length was derived from the underlying xsd. That's now been extended for text decimals to mean derived from the textNumberPattern pattern length. And for a sequence to mean derived from the length of its children. I think we are overloading it. I think that "implicit" should be reserved for simple elements only, with its current semantic. And we should come up with a new enum, reserved for complex elements or sequences only, suggest "children" (given I have also suggested "endOfParent") or maybe "content".

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848



From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent: 28/10/2008 00:48
Please respond to: <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Alan Powell/UK/IBM@IBMGB
Subject: RE: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13

Not sure where this leaves us.
 
It is ok to reserve lengthKind="end" or "parent" or whatever for the last element of a sequence.
 
...mike
 
 

Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel:  781-810-2100  | 504 Totten Pond Road, Waltham MA 02451 |
mbeckerle.dfdl@gmail.com

 


From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, October 22, 2008 12:56 PM
To: mbeckerle.dfdl@gmail.com
Cc: Alan Powell
Subject: Re: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13



Mike - sorry but I think users will find this baffling. lengthKind="implicit" was intended to mean that the logical xsd provided the length. lengthKind="delimited" means that markup provides the length. We are overloading the word "implicit" and we are wrong to do so. Trying to wrap these together, and include "endOfData" (as "parent") as well, is taking the abstraction too far. It is not how people view their data.


Regards

Steve Hanson


From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent: 22/10/2008 15:29
Please respond to: <mbeckerle.dfdl@gmail.com>
To: Alan Powell/UK/IBM@IBMGB, Steve Hanson/UK/IBM@IBMGB
Subject: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13

First: apologies for missing the call today without notice. I've been solid on a rather urgent customer-related matter since before our meeting time and unable to break away.

 

Now: w.r.t. end of data email from Steve.

 

In the example you highlight, the reason both children of the sequence have lengthKind="endOfData" is that the parent is providing the way of determining the length, in this case using delimiters. Conceptually, the parser can carve out the box of data bytes for the first child by scanning for the separator, and the box for the 2nd child by scanning for the terminator. Then it can present those finite size boxes to the parser to parse each child, and each child consumes the entire box, i.e., to the end of (its box of) data.
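
The carving step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name carve_boxes and its parameters are hypothetical, the data is treated as a plain string, and escaping of delimiters inside the data is ignored.

```python
from typing import List


def carve_boxes(data: str, n_children: int, separator: str, terminator: str) -> List[str]:
    """Hypothetical helper: cut `data` into one box per child of a delimited sequence."""
    boxes = []
    pos = 0
    for i in range(n_children):
        # Every child but the last is bounded by the separator;
        # the last child is bounded by the terminator.
        delimiter = separator if i < n_children - 1 else terminator
        end = data.find(delimiter, pos)
        if end == -1:
            raise ValueError(f"delimiter {delimiter!r} not found for child {i}")
        boxes.append(data[pos:end])
        pos = end + len(delimiter)
    return boxes


# Each child is then parsed against its own box and consumes all of it.
print(carve_boxes("string1,string2;", 2, ",", ";"))  # ['string1', 'string2']
```

The parent does all of the scanning here; a child given one of these boxes has nothing left to decide about its own length.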

 

However, I agree the notion of "endOfData" is confusing as I have just explained it above.

 

Perhaps the right lengthKind for a child to have when the enclosing parent has a terminator or separator is
 
    lengthKind="parent"

 
which you can read conceptually as:

 
    "length kind for this child is determined by something specified in the parent. So you'll find nothing here about length."

 
We could then drop the whole "endOfData" concept entirely.

 
So in the example, both children would still have lengthKind="parent".
 
The implied "parent" of the top level is the real true "end of the data", so a top-level element could have lengthKind="parent" also. This is an important composition property. It allows you to take a well specified format and drop it in as the description of an MQ message payload, for example.

 
Now, lengthKind="parent" is kind of the opposite of lengthKind="implicit". "parent" is top down, i.e., from the enclosing structure. "implicit" is bottom up, i.e., length implied by the contents of the element.
 
Here's a trick that can make this all more palatable. For certain kinds of child elements, lengthKind="implicit" will behave as lengthKind="parent". This would happen for variable length children without any way of determining the variable length "bottom up". Examples of this are: variable length text strings, variable occurrences of anything (with no way to determine how many occurrences), or ordered sequences whose final element is a variable length child without any way of determining the variable length. (This definition is recursive intentionally.)
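
As a rough sketch of that recursive definition, assuming a toy schema model that is not part of DFDL (the Node class, its kind values, and the has_own_length flag are all hypothetical names chosen for illustration):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy schema node (hypothetical model, for illustration only)."""
    kind: str                     # "string", "array", "sequence", or "fixed"
    has_own_length: bool = False  # True if length/count can be found bottom-up
    children: List["Node"] = field(default_factory=list)


def implicit_acts_as_parent(node: Node) -> bool:
    """Would lengthKind="implicit" fall back to lengthKind="parent" for this node?"""
    if node.kind in ("string", "array"):
        # Variable-length text, or variable occurrences with no known count.
        return not node.has_own_length
    if node.kind == "sequence":
        # Recursive case: an ordered sequence inherits the property
        # from its final element.
        return bool(node.children) and implicit_acts_as_parent(node.children[-1])
    return False


inner = Node("sequence", children=[Node("fixed"), Node("string")])
outer = Node("sequence", children=[Node("fixed"), inner])
print(implicit_acts_as_parent(outer))  # True
```

Only the last child of a sequence is inspected, which mirrors the "final element" wording of the rule.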

 
Given this, I think the DFDL fragment could be:

 
<complexType dfdl:lengthKind="implicit"
             dfdl:representation="text"> <!-- these are in the scope -->
....
<sequence dfdl:separator="," dfdl:terminator=";"
          dfdl:lengthKind="delimited">
  <element name="f1" type="string" />
  <element name="f2" type="string" />
</sequence>
....
</complexType>

 
 
Which I claim is what we want to have to write to capture the simple thing this is trying to express, which is the format of "string1,string2;" after all.

 
Comments?

 
BTW: notice my use of an enclosing complexType and ellipsis in order to achieve the notion that certain property bindings surround the example. This is one of the reasons I think we don't need a full-up 2-level semantic model as Sandy suggested. I think examples like the above are sufficiently clear, particularly given the simplified scoping.



 


From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, October 22, 2008 9:09 AM
To: Mike Beckerle
Cc: Alan Powell
Subject: Re: FW: MIke's notes from call on 2008-08-13


Hi Mike


I owe a review of the "EndOfData Semantics" discussion below.

The only thing that looks slightly odd in the examples below is this:


<sequence dfdl:separator="," dfdl:terminator=";">
  <element name="f1" type="string" dfdl:lengthKind="endOfData"/>
  <element name="f2" type="string" dfdl:lengthKind="endOfData"/>
</sequence>


It doesn't seem right for f1 to have "endOfData". Should we have a rule that says "endOfData" is only allowed on the last object in a sequence? After all, that was its original purpose: a way for the last thing to say it is bounded by the end of its parent.

Would "endOfParent" be better than "endOfData" ?


Regards

Steve Hanson

From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent: 10/09/2008 14:10
Please respond to: <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Subject: FW: MIke's notes from call on 2008-08-13

Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel:  781-810-2100  | 504 Totten Pond Road, Waltham MA 02451 |
mbeckerle.dfdl@gmail.com






From: Mike Beckerle [mailto:mbeckerle.dfdl@gmail.com]
Sent: Friday, August 15, 2008 11:53 AM
To: dfdl-wg@ogf.com
Subject: MIke's notes from call on 2008-08-13



Only Alan Powell and myself were on the call.

These are my notes.

TOPIC: Decimal Calendar – idea: should behave as if decimal to text then text to date/time. I.e., use same date/time pattern language, but a subset of it since decimal can express nothing but digits.

TOPIC: Notes to authors (at start of spec) add that we don’t do scalar type coercions/conversions generally. I.e., if the representation is a floating point, then the logical must be a floating point. If the representation is decimal, the logical must be decimal. We don’t allow you to have a logical int whose rep is decimal or vice versa. Rationale: it adds complexity that we can avoid. Doesn’t provide anything you can’t easily do another way (layering), etc.

TOPIC: EndOfData Semantics:

We discussed that currently we were overloading the delimited concept to include the end-of-data concept, and that was unsatisfactory and was resulting in attempts to reinject end-of-data as “end-of-bitstream” and the like.

Points -