The only reason I haven't released the
document "officially" to the working group is that it is very
incomplete and half-baked. There are no IP concerns.
I'm concerned that the approach doesn't
even hold up. In particular, there is a "forward induction" from
earlier fields to later fields implied in the approach. I'm not sure this
works except for "stream-capable" formats. Some formats depend
on random access, or on definitions that work backward from the end
of the data.
Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA
From: Martin Westhead <martinwesthead@yahoo.co.uk>
Sent by: owner-dfdl-wg@ggf.org
Date: 09/05/2005 09:12 AM
To: "Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>
cc: dfdl-wg@gridforum.org
Subject: Re: [dfdl-wg] Plumbing document
Hi Robert,
See inline:
Robert E. McGrath wrote:
> Greetings,
>
> I took a quick glance at the streams and semantics notes sent yesterday.
>
> There is obviously common ground here, and both meshed with my own
> half-baked thinking.
>
> They did make me disagree with one part of the approach, which may
> make things a little simpler.
>
> So here's my thinking.
>
> IMO, there is no reason to worry about a detailed definition of streams.
>
> It seems to me that we are simply dealing with sequences of bits, which
> can be streams or not.
>
> So I see the universe of DFDL as:
>
> sequence of bits ==> computer science data type ==> seq. of bits
>
> where CS data type is "byte", "int32", etc. (Z, G, et al. in the semantics
> note)
There are two issues/points I have with your statement above:
1. We have chosen (at this point) the XML/XML Schema data model as our
data model. Another way of thinking about what we are doing here is as
follows: XML Schema provides a way of describing the syntax and type-level
semantics of XML documents. DFDL extends that capability so that
XML Schema can describe other (want to say "all") text and binary
formats.
2. DFDL is describing:
bits ==> XML type ==> ... ==> XML type ==> ... ==> XML type ==> bits
i.e. there are arbitrary layers of description, and we would like them
to be separable (modular), e.g.
bits ==> strings ==> ints ==> (back again).
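To make the layer chain concrete, here is a minimal sketch in Python. It is only an illustration, not DFDL: the function names are made up, and it assumes an ASCII, comma-separated text layer. Each function knows about exactly one layer, and the chain runs in both directions.

```python
# Hypothetical illustration of bits ==> strings ==> ints ==> (back again).
# Assumes ASCII text with comma-separated fields; none of this is DFDL syntax.

def bytes_to_text(raw: bytes) -> str:
    # Layer 1 (parse): bytes to characters.
    return raw.decode("ascii")

def text_to_ints(text: str) -> list[int]:
    # Layer 2 (parse): comma-separated fields to integers.
    return [int(field) for field in text.split(",")]

def ints_to_text(values: list[int]) -> str:
    # Layer 2 (unparse): integers back to comma-separated fields.
    return ",".join(str(v) for v in values)

def text_to_bytes(text: str) -> bytes:
    # Layer 1 (unparse): characters back to bytes.
    return text.encode("ascii")

raw = b"10,20,30"
values = text_to_ints(bytes_to_text(raw))
round_trip = text_to_bytes(ints_to_text(values))
```

Note that text_to_ints never sees a byte, which is the separation I am after.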
> The DFDL talks about the CS data types, with decorations to tell how to
> do the transformations to bits. I think that's all DFDL does (which is
> plenty!)
>
>
> Now the second place that "streams" enters the picture is to deal with
> XML's notion of the order of elements, which DFDL is trying to use
> to deal with the order of the bits.
>
> (I think this confuses me because it is overloading XML's notion of an
> XML file with the organization of the described files. You can make
> it work, but it's not really clean, at least to me.)
I agree that this is a concern.
> To me, it is more natural to define a notion of a "sequence of CS data types",
> i.e., the elements. The decorations indicate where the bits for each
> element are (i.e., each element has its own sequence of bits, not necessarily
> from a continuous stream).
>
> This is more general than a stream (it can accommodate random access), and
> probably can be stated as a simple mapping.
>
>
> So the summary is:
>
> I think it would simplify the abstractions to not talk about streams.
>
> Instead, we should talk about sequences of bits, one for each element,
> and a model associating elements with bits.
>
>
> I hope this isn't too far off beam.
I think the big issue is the layering. It adds complexity to the
question of position: do you mean an index by byte, by character, or by
comma-separated value?
We want the representation of layers to be modular so that you can
replace the string representation with a binary representation and the
application (which is dealing with a list of numbers) does not need to
know. It is important that the descriptions are contained and that the
description of the integer list does not reference the underlying byte
positions.
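As a rough sketch of what I mean by modular (Python, with invented names, assuming a comma-separated ASCII form and a big-endian 32-bit binary form as the two representations): the application function below only ever sees a list of integers, so the representation underneath can be swapped without touching it.

```python
import struct
from typing import Callable, List

# A decoder turns raw bytes into the application-level list of integers.
Decoder = Callable[[bytes], List[int]]

def decode_csv_text(raw: bytes) -> List[int]:
    # Text representation: ASCII digits separated by commas (assumed).
    return [int(field) for field in raw.decode("ascii").split(",")]

def decode_int32_be(raw: bytes) -> List[int]:
    # Binary representation: packed big-endian signed 32-bit integers (assumed).
    count = len(raw) // 4
    return list(struct.unpack(f">{count}i", raw))

def total(raw: bytes, decode: Decoder) -> int:
    # The "application": it works on a list of numbers and never
    # refers to byte positions in the underlying data.
    return sum(decode(raw))

text_form = b"10,20,30"
binary_form = struct.pack(">3i", 10, 20, 30)
```

Calling total(text_form, decode_csv_text) and total(binary_form, decode_int32_be) gives the same answer. The byte offsets of the second integer differ between the two forms (3 versus 4), which is why the description of the integer list should not mention them.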
Layering is IMO the reason that we need some formal description; it is
also the reason that it is hard. I would like to try to take this
forward a little. I think in the wake of this new spec it is timely.
Mike, can you give me a steer on the IP status of your document? I
understand that it has not been submitted to the WG. Do you propose to
submit it? I think the basic outline is consistent with things you have
said at WG meetings (though not the level of detail). If we were to
produce a document that contained some of these ideas would that be a
problem?