I agree that "end" or whatever we decide to call it should be reserved for
the last object in a sequence. I prefer "endOfParent".
I have a general unease around the lengthKind enum "implicit". It
originally meant something quite specific, the length was derived from the
underlying xsd. That's now been extended for text decimals to mean derived
from the textNumberPattern pattern length. And for a sequence to mean
derived from the length of its children. I think we are overloading it. I
think that "implicit" should be reserved for simple elements only, with
its current semantic. And we should come up with a new enum, reserved for
complex elements or sequences only, suggest "children" (given I have also
suggested "endOfParent") or maybe "content".
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh(a)uk.ibm.com
Phone (+44)/(0) 1962-815848
"Mike Beckerle" <mbeckerle.dfdl(a)gmail.com>
Sent: 28/10/2008 00:48
Please respond to: <mbeckerle.dfdl(a)gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Alan Powell/UK/IBM@IBMGB
Subject: RE: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13
Not sure where this leaves us.
It is ok to reserve lengthKind="end" or "parent" or whatever for the last
element of a sequence.
...mike
Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2100 | 504 Totten Pond Road, Waltham MA 02451 |
mbeckerle.dfdl(a)gmail.com
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, October 22, 2008 12:56 PM
To: mbeckerle.dfdl(a)gmail.com
Cc: Alan Powell
Subject: Re: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13
Mike - sorry but I think users will find this baffling.
lengthKind="implicit" was intended to mean that the logical xsd provided
the length. lengthKind="delimited" means that markup provides the length.
We are overloading the word "implicit" and we are wrong to do so. Trying
to wrap these together, and include "endOfData" (as "parent") as well, is
taking the abstraction too far. It is not how people view their data.
Regards
Steve Hanson
"Mike Beckerle" <mbeckerle.dfdl(a)gmail.com>
Sent: 22/10/2008 15:29
Please respond to: <mbeckerle.dfdl(a)gmail.com>
To: Alan Powell/UK/IBM@IBMGB, Steve Hanson/UK/IBM@IBMGB
Subject: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13
First: apologies for missing the call today without notice. I've been
solid on a rather urgent customer-related matter since before our meeting
time and unable to break away.
Now: w.r.t. end of data email from Steve.
In the example you highlight, the reason both children of the sequence
have lengthKind="endOfData" is that the parent is providing the way of
determining the length, in this case using delimiters. Conceptually, the
parser can carve out the box of data bytes for the first child by scanning
for the separator, and the box for the 2nd child by scanning for the
terminator. Then it can present those finite size boxes to the parser to
parse each child, and each child consumes the entire box, i.e., to the end
of (its box of) data.
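The "carve out the boxes" step described above can be sketched in Python. This is only an illustration: the name `carve_boxes` and its parameters are hypothetical, and it assumes plain text delimiters with no escaping or padding.

```python
def carve_boxes(data: str, separator: str, terminator: str, child_count: int) -> list[str]:
    """Split `data` into one box per child using the parent's delimiters.

    The parent scans for the separator between children and for the
    terminator after the last child; each child then consumes its whole box.
    """
    boxes = []
    pos = 0
    for i in range(child_count):
        # The last child is bounded by the terminator, the others by the separator.
        delimiter = terminator if i == child_count - 1 else separator
        end = data.find(delimiter, pos)
        if end < 0:
            raise ValueError(f"delimiter {delimiter!r} not found for child {i}")
        boxes.append(data[pos:end])
        pos = end + len(delimiter)
    return boxes


print(carve_boxes("string1,string2;", ",", ";", 2))  # ['string1', 'string2']
```

Each child parser is handed exactly one box and never sees a delimiter, which is the sense in which it reads "to the end of (its box of) data".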
However, I agree the notion of "endOfData" is confusing as I have just
explained it above.
Perhaps the right lengthKind for a child to have when the enclosing
parent has a terminator or separator is
lengthKind="parent"
which you can read conceptually as:
"length kind for this child is determined by something specified in
the parent. So you'll find nothing here about length."
We could then drop the whole "endOfData" concept entirely.
So in the example, both children would still have lengthKind="parent".
The implied "parent" of the top level is the real true "end of the data",
so a top-level element could have lengthKind="parent" also. This is an
important composition property. It allows you to take a well specified
format and drop it in as the description of an MQ message payload, for
example.
Now, lengthKind="parent" is kind of the opposite of lengthKind="implicit".
"parent" is top down, i.e., from the enclosing structure. "implicit" is
bottom up, i.e., length implied by the contents of the element.
Here's a trick that can make this all more palatable. For certain kinds of
child elements, lengthKind="implicit" will behave as lengthKind="parent".
This would happen for variable length children without any way of
determining the variable length "bottom up". Examples of this are:
variable length text strings, variable occurrences of anything (with no
way to determine how many occurrences), or ordered sequences whose final
element is a variable length child without any way of determining the
variable length. (This definition is recursive intentionally.)
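The recursive definition above can be sketched as a small Python predicate. The `Node` class, its `kind` values, and the `bottom_up_length` flag are hypothetical simplifications invented for this illustration; they are not part of any DFDL draft.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a schema component."""
    kind: str                        # "string", "array", "sequence", or "fixed"
    bottom_up_length: bool = False   # True if length/count is derivable from the content
    children: list["Node"] = field(default_factory=list)


def implicit_acts_as_parent(node: Node) -> bool:
    """True if lengthKind="implicit" would fall back to the enclosing parent."""
    if node.kind in ("string", "array"):
        # Variable-length text or variable occurrences with no bottom-up length.
        return not node.bottom_up_length
    if node.kind == "sequence":
        # Recursive case: an ordered sequence inherits the property from its final child.
        return bool(node.children) and implicit_acts_as_parent(node.children[-1])
    return False  # fixed-length items know their own length


seq = Node("sequence", children=[Node("fixed"), Node("string")])
print(implicit_acts_as_parent(seq))  # True
```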
Given this, I think the DFDL fragment could be:
<complexType dfdl:lengthKind="implicit"
             dfdl:representation="text"> <!-- these are in the scope -->
  ....
  <sequence dfdl:separator="," dfdl:terminator=";"
            dfdl:lengthKind="delimited">
    <element name="f1" type="string" />
    <element name="f2" type="string" />
  </sequence>
  ....
</complexType>
Which I claim is what we want to have to write to capture the simple thing
this is trying to express, which is the format of "string1,string2;" after
all.
Comments?
BTW: notice my use of an enclosing complexType and ellipsis in order to
achieve the notion that certain property bindings surround the example.
This is one of the reasons I think we don't need a full up 2-level
semantic model as Sandy suggested. I think examples like the above are
sufficiently clear, particularly given the simplified scoping.
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, October 22, 2008 9:09 AM
To: Mike Beckerle
Cc: Alan Powell
Subject: Re: FW: MIke's notes from call on 2008-08-13
Hi Mike
I owe a review of the "EndOfData Semantics" discussion below.
The only thing that looks slightly odd in the examples below is this:
<sequence dfdl:separator="," dfdl:terminator=";">
  <element name="f1" type="string" dfdl:lengthKind="endOfData"/>
  <element name="f2" type="string" dfdl:lengthKind="endOfData"/>
</sequence>
It doesn't seem right for f1 to have "endOfData". Should we have a rule
that says "endOfData" is only allowed on the last object in a sequence?
After all, that was its original purpose: a way for the last thing to say
it is bounded by the end of its parent.
Would "endOfParent" be better than "endOfData" ?
Regards
Steve Hanson
"Mike Beckerle" <mbeckerle.dfdl(a)gmail.com>
Sent: 10/09/2008 14:10
Please respond to: <mbeckerle.dfdl(a)gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Subject: FW: MIke's notes from call on 2008-08-13
Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2100 | 504 Totten Pond Road, Waltham MA 02451 |
mbeckerle.dfdl(a)gmail.com
From: Mike Beckerle [mailto:mbeckerle.dfdl@gmail.com]
Sent: Friday, August 15, 2008 11:53 AM
To: dfdl-wg(a)ogf.com
Subject: MIke's notes from call on 2008-08-13
Only Alan Powell and myself were on the call.
These are my notes.
TOPIC: Decimal Calendar - idea: should behave as if decimal to text then
text to date/time. I.e., use same date/time pattern language, but a subset
of it since decimal can express nothing but digits.
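A rough Python sketch of the "decimal to text, then text to date/time" idea follows. Python's `strptime` directives are used here purely as a stand-in for the DFDL date/time pattern language, which is a different notation; the function name and the `width` parameter are assumptions made for the example.

```python
from datetime import datetime
from decimal import Decimal


def decimal_to_datetime(value: Decimal, width: int, pattern: str) -> datetime:
    # Step 1: decimal -> text, zero-padded so leading zeros survive.
    text = str(int(value)).zfill(width)
    # Step 2: text -> date/time using a digits-only pattern.
    return datetime.strptime(text, pattern)


print(decimal_to_datetime(Decimal("20080813"), 8, "%Y%m%d").date())  # 2008-08-13
```

Only pattern elements that render as digits make sense here, which is the "subset" restriction mentioned above.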
TOPIC: Notes to authors (at start of spec): add that we don't do scalar
type coercions/conversions generally. I.e., if the representation is a
floating point, then the logical must be a floating point. If the
representation is decimal, the logical must be decimal. We don't allow you
to have a logical int whose rep is decimal or vice versa. Rationale: it
adds complexity that we can avoid. Doesn't provide anything you can't
easily do another way (layering), etc.
TOPIC: EndOfData Semantics:
We discussed that currently we were overloading the delimited concept to
include the end-of-data concept, and that was unsatisfactory and was
resulting in attempts to reinject end-of-data as "end-of-bitstream" and
the like.
Points -
Distinguish "delimited" to mean we positively ARE scanning for a text
pattern delimiter, and do not confuse this with the end-of-data case, which
is fundamentally different.
Avoid a special-case keyword only for the "top level" end of the data
stream. This has really bad composition properties.
lengthKind="endOfData" applies to both binary and text representations.
For text it means there is no terminator for this element. The enclosing
construct's length, however determined (separator, terminator, fixed,
prefix, etc.), will bound the length of this contained element.
Case:
<sequence dfdl:separator="," dfdl:terminator=";">
  <element name="f1" type="string" dfdl:lengthKind="endOfData"/>
  <element name="f2" type="string" dfdl:lengthKind="endOfData"/>
</sequence>
The above seems ok to me.
Case:
<sequence dfdl:lengthKind="prefixed" dfdl:representation="binary">
  <element name="f1" type="int" dfdl:length="4"
           dfdl:lengthKind="explicit"/>
  <element name="f2" type="hexBinary"
           dfdl:lengthKind="endOfData"/>
</sequence>
The above seems ok to me.
Important use cases:
Case 1: binary element at the end of a top-level sequence.
<schema ...>
  <element name="theTop">
    <complexType>
      <sequence dfdl:lengthKind="implicit">
        <element name="f1" type="int" dfdl:length="4"
                 dfdl:lengthKind="explicit"/>
        <element name="f2" type="hexBinary"
                 dfdl:lengthKind="endOfData"/>
      </sequence>
    </complexType>
  </element>
</schema>
In the above, the top level sequence has implicit length kind. This is ok,
because the top level is assumed to be in an "end of data" context.
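As a hedged illustration of how Case 1 would behave at parse time, the Python sketch below reads f1 as a fixed 4-byte integer and lets f2 take whatever the enclosing context has left. The function name is hypothetical, and both the big-endian signed encoding and the reading of dfdl:length="4" as 4 bytes are assumptions, since the notes do not state byte order or length units.

```python
import struct


def parse_the_top(data: bytes) -> dict:
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for f1")
    # f1: explicit length 4. Big-endian signed is an assumption.
    (f1,) = struct.unpack(">i", data[:4])
    # f2: endOfData, so it takes everything the enclosing context has left.
    f2 = data[4:].hex().upper()
    return {"f1": f1, "f2": f2}


print(parse_the_top(bytes([0, 0, 0, 7, 0xDE, 0xAD, 0xBE, 0xEF])))
# {'f1': 7, 'f2': 'DEADBEEF'}
```

At the top level the "enclosing context" is simply the whole input; when the same sequence is nested, `data` would instead be the box handed down by the parent.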
Case 2: deeper nesting, same implicit-length sequence.
<schema ...>
  <element name="NestedInside">
    <complexType>
      <sequence dfdl:lengthKind="implicit">
        <element name="f1" type="int" dfdl:length="4"
                 dfdl:lengthKind="explicit"/>
        <element name="f2" type="hexBinary"
                 dfdl:lengthKind="endOfData"/>
      </sequence>
    </complexType>
  </element>
  <element name="stillNotTheTop">
    <complexType>
      <sequence dfdl:lengthKind="implicit">
        ...
        <element ref="NestedInside"/>
      </sequence>
    </complexType>
  </element>
  <element name="hasFixedLength">
    <complexType>
      <sequence dfdl:lengthKind="explicit" dfdl:length="100">
        ...
        <element ref="stillNotTheTop"/>
      </sequence>
    </complexType>
  </element>
  ...
</schema>
This case illustrates how the composition properties work for
explicit/implicit lengths.
The definition of how this works goes something like this.
When the last element of a sequence is binary with lengthKind="endOfData",
this implies that the enclosing sequence is one of:
(a) length kind explicit, prefixed, or endOfData
(b) length kind implicit - in this case, recursively, this enclosing
sequence must itself be enclosed in a sequence similarly constrained on
length kind (cases a, b, c here)
(c) the top-level sequence
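One literal reading of rules (a) to (c) can be sketched in Python as below. The function name and the list-of-strings input are hypothetical; in particular, treating any non-implicit, unbounded kind (such as "delimited") below the top level as a failure is an interpretation, since the notes only state the rule for binary data.

```python
BOUNDED = {"explicit", "prefixed", "endOfData"}


def end_of_data_allowed(enclosing_length_kinds: list[str]) -> bool:
    """Check the rule for a trailing element with lengthKind="endOfData".

    `enclosing_length_kinds` lists the length kinds of the enclosing
    sequences, innermost first; the final entry is the top-level sequence.
    """
    if not enclosing_length_kinds:
        raise ValueError("expected at least one enclosing sequence")
    last = len(enclosing_length_kinds) - 1
    for i, kind in enumerate(enclosing_length_kinds):
        if kind in BOUNDED:
            return True   # case (a): this sequence bounds the data
        if i == last:
            return True   # case (c): the top-level sequence
        if kind != "implicit":
            return False  # not covered by the rule as stated (interpretation)
        # case (b): implicit, so the next sequence outward must qualify
    return False  # unreachable: the loop always returns on the last entry


# Case 2 above: NestedInside, stillNotTheTop, hasFixedLength
print(end_of_data_allowed(["implicit", "implicit", "explicit"]))  # True
```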
Note: We need to revisit whether the name "endOfData" is desirable or not.
There's a list of alternatives from the F2F meeting. Problem is that a
naïve user will be thinking "top level" but the concept actually needs to
be compositional/nestable.
TOPIC: float/double - we concluded that until XML has floating point types
that can handle extended precisions, DFDL can't handle extended
precisions in any reasonable way, so we should simply say DFDL v1.0
supports only 64-bit and 32-bit floating point precision. This narrows
down float types to IEEE (single and double), and IBM390 (single and
double), and maybe AS400 if that's different and still within 64 bits
precision.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU