I agree that "end" or whatever
we decide to call it should be reserved for the last object in a sequence.
I prefer "endOfParent".
I have a general unease around the lengthKind
enum "implicit". It originally meant something quite specific:
the length was derived from the underlying xsd. That's now been extended
for text decimals to mean derived from the length of the textNumberPattern,
and for a sequence to mean derived from the length of its children. I think
we are overloading it. I think that "implicit" should be reserved
for simple elements only, with its current semantics, and we should come
up with a new enum, reserved for complex elements or sequences only. I suggest
"children" (given I have also suggested "endOfParent")
or maybe "content".
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent: 28/10/2008 00:48
Please respond to: <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Alan Powell/UK/IBM@IBMGB
Subject: RE: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13
Not sure where this leaves us.
It is ok to reserve lengthKind="end"
or "parent" or whatever for the last element of a sequence.
...mike
Mike Beckerle | OGF DFDL
WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2100 | 504 Totten Pond Road, Waltham MA 02451
| mbeckerle.dfdl@gmail.com
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, October 22, 2008 12:56 PM
To: mbeckerle.dfdl@gmail.com
Cc: Alan Powell
Subject: Re: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13
Mike - sorry but I think users will find this baffling. lengthKind="implicit"
was intended to mean that the logical xsd provided the length. lengthKind="delimited"
means that markup provides the length. We are overloading the word "implicit"
and we are wrong to do so. Trying to wrap these together, and include "endOfData"
(as "parent") as well, is taking the abstraction too far. It
is not how people view their data.
Regards
Steve Hanson
From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent: 22/10/2008 15:29
Please respond to: <mbeckerle.dfdl@gmail.com>
To: Alan Powell/UK/IBM@IBMGB, Steve Hanson/UK/IBM@IBMGB
Subject: endOfData - was: RE: FW: MIke's notes from call on 2008-08-13
First: apologies for missing the call today without notice. I've been solid
on a rather urgent customer-related matter since before our meeting time
and unable to break away.
Now: w.r.t. end of data email from Steve.
In the example you highlight, the reason both children of the sequence
have lengthKind="endOfData" is that the parent is providing the
way of determining the length, in this case using delimiters. Conceptually,
the parser can carve out the box of data bytes for the first child by scanning
for the separator, and the box for the 2nd child by scanning for the terminator.
Then it can present those finite size boxes to the parser to parse each
child, and each child consumes the entire box, i.e., to the end of (its
box of) data.
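A rough sketch of the box-carving idea described above, in Python. This is only an illustration, not part of any DFDL implementation: the function name carve_boxes and its parameters are hypothetical, and it assumes plain text data with no escaping of delimiters.

```python
def carve_boxes(data: str, separator: str, terminator: str, n_children: int) -> list[str]:
    """Split one delimited sequence into per-child boxes.

    Every child except the last is bounded by the separator; the last
    child is bounded by the terminator. Each child then consumes its
    whole box, i.e. it reads to the end of (its box of) data.
    """
    boxes = []
    pos = 0
    for i in range(n_children):
        # The parent decides which delimiter ends this child's box.
        delim = separator if i < n_children - 1 else terminator
        end = data.find(delim, pos)
        if end == -1:
            raise ValueError(f"delimiter {delim!r} not found for child {i}")
        boxes.append(data[pos:end])
        pos = end + len(delim)
    return boxes


boxes = carve_boxes("string1,string2;", ",", ";", 2)
print(boxes)
```

The children themselves carry no length information here; everything comes from the separator and terminator supplied by the parent.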
However, I agree the notion of "endOfData" is confusing as I
have just explained it above.
Perhaps the right lengthKind for a child to have when the enclosing
parent has a terminator or separator is
lengthKind="parent"
which you can read conceptually as:
"length kind for this child is determined by something
specified in the parent. So you'll find nothing here about length."
We could then drop the whole "endOfData" concept entirely.
So in the example, both children would still have lengthKind="parent".
The implied "parent" of the top level is the real true "end
of the data", so a top-level element could have lengthKind="parent"
also. This is an important composition property. It allows you to take
a well specified format and drop it in as the description of an MQ message
payload, for example.
Now, lengthKind="parent" is kind of the opposite of lengthKind="implicit".
"parent" is top down, i.e., from the enclosing structure. "implicit"
is bottom up, i.e., length implied by the contents of the element.
Here's a trick that can make this all more palatable. For certain kinds
of child elements, lengthKind="implicit" will behave as lengthKind="parent".
This would happen for variable length children without any way of determining
the variable length "bottom up". Examples of this are: variable
length text strings, variable occurrences of anything (with no way to determine
how many occurrences), or ordered sequences whose final element is a variable
length child without any way of determining the variable length. (This
definition is recursive intentionally.)
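The recursive definition can be sketched roughly as follows, in Python. The Node class, its field names, and the function implicit_acts_as_parent are hypothetical and exist only for this illustration; the sketch assumes each element simply records whether its length or occurrence count can be determined bottom up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # e.g. "string", "int", or "sequence"
    fixed_length: bool = False      # True if the length is known bottom up
    variable_occurs: bool = False   # True if the occurrence count is unknown
    children: list["Node"] = field(default_factory=list)


def implicit_acts_as_parent(node: Node) -> bool:
    """Return True when lengthKind="implicit" would behave as "parent"."""
    if node.variable_occurs:
        # Variable occurrences of anything, with no way to count them.
        return True
    if node.kind == "sequence":
        # Recursive case: only the final child of the sequence matters.
        return bool(node.children) and implicit_acts_as_parent(node.children[-1])
    # Simple element: variable length with no bottom-up way to find it.
    return not node.fixed_length


text = Node("string")
number = Node("int", fixed_length=True)
print(implicit_acts_as_parent(Node("sequence", children=[number, text])))
print(implicit_acts_as_parent(Node("sequence", children=[text, number])))
```

In this sketch a sequence ending in a variable length string falls back to its parent for length, while one ending in a fixed length int does not.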
Given this, I think the DFDL fragment could be:
<complexType dfdl:lengthKind="implicit"
             dfdl:representation="text"> <!-- these are in the scope -->
  ....
  <sequence dfdl:separator="," dfdl:terminator=";"
            dfdl:lengthKind="delimited">
    <element name="f1" type="string" />
    <element name="f2" type="string" />
  </sequence>
  ....
</complexType>
Which I claim is what we want to have to write to capture the simple thing
this is trying to express, which is the format of "string1,string2;"
after all.
Comments?
BTW: notice my use of an enclosing complexType and ellipsis in order to
achieve the notion that certain property bindings surround the example.
This is one of the reasons I think we don't need a full up 2-level semantic
model as Sandy suggested. I think examples like the above are sufficiently
clear, particularly given the simplified scoping.
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, October 22, 2008 9:09 AM
To: Mike Beckerle
Cc: Alan Powell
Subject: Re: FW: MIke's notes from call on 2008-08-13
Hi Mike
I owe a review of the "EndOfData
Semantics" discussion below.
The only thing that looks slightly odd in the examples below is this:
<sequence dfdl:separator="," dfdl:terminator=";">
  <element name="f1" type="string" dfdl:lengthKind="endOfData"/>
  <element name="f2" type="string" dfdl:lengthKind="endOfData"/>
</sequence>
It doesn't seem right for f1 to have "endOfData". Should we have
a rule that says "endOfData" is only allowed on the last object
in a sequence? After all, that was its original purpose: a way for the last
thing to say it is bounded by the end of its parent.
Would "endOfParent" be better than "endOfData"?
Regards
Steve Hanson
From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent: 10/09/2008 14:10
Please respond to: <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Subject: FW: MIke's notes from call on 2008-08-13
From: Mike Beckerle [mailto:mbeckerle.dfdl@gmail.com]
Sent: Friday, August 15, 2008 11:53 AM
To: dfdl-wg@ogf.com
Subject: MIke's notes from call on 2008-08-13
Only Alan Powell and myself were on the call.
These are my notes.
TOPIC: Decimal Calendar – idea: should behave as if decimal to text
then text to date/time. I.e., use same date/time pattern language, but
a subset of it since decimal can express nothing but digits.
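A small Python sketch of the two-step behaviour proposed here (decimal to text, then text to date/time). It is only an illustration: the function name decimal_to_datetime is hypothetical, and Python's strptime directives stand in for the DFDL date/time pattern language, which is an assumption made for the example.

```python
from datetime import datetime
from decimal import Decimal


def decimal_to_datetime(value: Decimal, pattern: str) -> datetime:
    # Step 1: decimal to text. A decimal can express nothing but digits.
    text = str(value)
    if not text.isdigit():
        raise ValueError(f"expected digits only, got {text!r}")
    # Step 2: text to date/time, using a digits-only subset of the pattern.
    return datetime.strptime(text, pattern)


print(decimal_to_datetime(Decimal("20080813"), "%Y%m%d"))
```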
TOPIC: Notes to authors (at start of spec) add that we don’t do scalar
type coercions/conversions generally. I.e., if the representation is a
floating point, then the logical must be a floating point. If the representation
is decimal, the logical must be decimal. We don’t allow you to have a
logical int whose rep is decimal or vice versa. Rationale: it adds complexity
that we can avoid. Doesn’t provide anything you can’t easily do another
way (layering), etc.
TOPIC: EndOfData Semantics:
We discussed that currently we were overloading the delimited concept to
include the end-of-data concept, and that was unsatisfactory and was resulting
in attempts to reinject end-of-data as “end-of-bitstream” and the like.
Points -
- Distinguish delimited to mean we positively ARE scanning for a text
  pattern delimiter, and not confusing this with the end-of-data case
  which is fundamentally different.
- Avoid special-case keyword only for the “top level” end of the data
  stream. This has really bad composition properties.
- lengthKind='endOfData' applies to both binary and text representations.
  For text it means there is no terminator for this element. The enclosing
  construct’s length, however determined (separator, terminator, fixed,
  prefix, etc.) will bound length of this contained element.
- Case:
    <sequence dfdl:separator="," dfdl:terminator=";">
      <element name="f1" type="string" dfdl:lengthKind="endOfData"/>
      <element name="f2" type="string" dfdl:lengthKind="endOfData"/>
    </sequence>
  The above seems ok to me.
- Case:
    <sequence dfdl:lengthKind="prefixed" dfdl:representation="binary">
      <element name="f1" type="int" dfdl:length="4" dfdl:lengthKind="explicit"/>
      <element name="f2" type="hexBinary" dfdl:lengthKind="endOfData"/>
    </sequence>
  The above seems ok to me.
Important use cases:
Case 1: binary element at the end of a top-level sequence.
<schema …>
  <element name="theTop">
    <complexType>
      <sequence dfdl:lengthKind="implicit">
        <element name="f1" type="int" dfdl:length="4" dfdl:lengthKind="explicit"/>
        <element name="f2" type="hexBinary" dfdl:lengthKind="endOfData"/>
      </sequence>
    </complexType>
  </element>
</schema>
In the above, the top level sequence has implicit length kind. This is
ok, because the top level is assumed to be in an “end of data” context.
Case 2: deeper nesting, same implicit-length sequence.
<schema …>
  <element name="NestedInside">
    <complexType>
      <sequence dfdl:lengthKind="implicit">
        <element name="f1" type="int" dfdl:length="4" dfdl:lengthKind="explicit"/>
        <element name="f2" type="hexBinary" dfdl:lengthKind="endOfData"/>
      </sequence>
    </complexType>
  </element>
  <element name="stillNotTheTop">
    <complexType>
      <sequence dfdl:lengthKind="implicit">
        …
        <element ref="NestedInside"/>
      </sequence>
    </complexType>
  </element>
  <element name="hasFixedLength">
    <complexType>
      <sequence dfdl:lengthKind="explicit" dfdl:length="100">
        …
        <element ref="stillNotTheTop"/>
      </sequence>
    </complexType>
  </element>
  ….
</schema>
This case illustrates how the composition properties work for explicit/implicit
lengths.
The definition of how this works goes something like this.
When the last element of a sequence is binary with lengthKind="endOfData"
this implies that the enclosing sequence is one of:
(a) length kind explicit or prefixed or endOfData
(b) length kind implicit – in this case, recursively, this enclosing sequence
must itself be enclosed in a sequence similarly constrained on length kind
(cases a, b, c here)
(c) the top-level sequence
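The three cases above can be sketched as a short check, in Python. This is an illustration only: the function name end_of_data_is_bounded and the list-of-strings model are hypothetical, and rejecting any other length kind is an assumption read from the rule as worded, not something settled on the call.

```python
def end_of_data_is_bounded(enclosing_kinds: list[str]) -> bool:
    """Check the rule for a trailing binary element with lengthKind="endOfData".

    enclosing_kinds holds the lengthKind of each enclosing sequence,
    innermost first. Running off the end of the list means the top
    level has been reached.
    """
    for kind in enclosing_kinds:
        if kind in ("explicit", "prefixed", "endOfData"):
            return True   # case (a): this encloser bounds the element
        if kind != "implicit":
            return False  # assumption: other kinds are not covered by the rule
        # case (b): implicit, so keep looking outward recursively
    return True           # case (c): the top-level sequence


# Case 2 from the notes: implicit inside implicit inside explicit.
print(end_of_data_is_bounded(["implicit", "implicit", "explicit"]))
# Case 1 from the notes: a single implicit sequence at the top level.
print(end_of_data_is_bounded(["implicit"]))
```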
Note: We need to revisit whether the name ‘endOfData’ is desirable or
not. There’s a list of alternatives from the F2F meeting. Problem is that
a naïve user will be thinking “top level” but the concept actually needs
to be compositional/nestable.
TOPIC: float/double – we concluded that, until XML has floating point
types that can handle extended precisions, DFDL can’t handle extended
precisions in any reasonable way, so we should simply say DFDL v1.0 supports
only 64-bit floating point precision and 32-bit floating point precision.
This narrows down float types to IEEE (single and double), and IBM390 (single
and double), and maybe AS400 if that’s different and still within 64 bits
precision.
Unless stated otherwise
above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU