
The only reason I haven't released the document "officially" to the working group is that it is very incomplete and half baked. There are no IP concerns.

I'm concerned that the approach doesn't even hold up. In particular there is a "forward induction" from earlier fields to later fields implied in the approach. I'm not sure this works except for "stream-capable" formats. Some formats can depend on random access capabilities, or definition working back from the end of the data.

Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA

Martin Westhead <martinwesthead@yahoo.co.uk>
Sent by: owner-dfdl-wg@ggf.org
09/05/2005 09:12 AM
To: "Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>
cc: dfdl-wg@gridforum.org
Subject: Re: [dfdl-wg] Plumbing document

Hi Robert,

See inline:

Robert E. McGrath wrote:
Greetings,
I took a quick glance at the streams and semantics notes sent yesterday.
There is obviously common ground here, and both meshed with my own half-baked thinking.
They did make me disagree with one part of the approach, which may make things a little simpler.
So here's my thinking.
IMO, there is no reason to worry about a detailed definition of streams.
It seems to me that we are simply dealing with sequences of bits, which can be streams or not.
So I see the universe of DFDL as:
sequence of bits ==> computer science data type ==> seq. of bits
where CS data type is "byte", "int32", etc. (Z, G, et al. in the semantics note)
There are two issues/points I have with your statement above:

1. We have chosen (at this point) the XML/XML Schema data model as our data model. Another way of thinking about what we are doing here is as follows: XML Schema provides a way of describing the syntax and type-level semantics of XML documents. DFDL extends that capability so that XML Schema can describe other (I want to say "all") text and binary formats.

2. DFDL is describing:

   bits ==> XML type ==> ... ==> XML type ==> ... ==> XML type ==> bits

   i.e. there are arbitrary layers of description, and we would like them to be separable (modular), e.g. bits ==> strings ==> ints ==> (back again).
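
As a rough illustration of the layering in point 2, here is a minimal Python sketch. It is not DFDL syntax and not anything the working group has defined: `Layer`, `parse_all`, `unparse_all`, and the three example layers are hypothetical names invented for this example. Each layer is assumed to be a parse step paired with its inverse, and the stack is assumed to compose strictly front to back.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Layer:
    """One description layer: a parse step and its inverse (hypothetical)."""
    name: str
    parse: Callable[[Any], Any]
    unparse: Callable[[Any], Any]


def parse_all(layers: List[Layer], data: Any) -> Any:
    # Apply each layer in order: bits ==> ... ==> application-level value.
    for layer in layers:
        data = layer.parse(data)
    return data


def unparse_all(layers: List[Layer], value: Any) -> Any:
    # Apply the inverse of each layer in reverse order to get back to bits.
    for layer in reversed(layers):
        value = layer.unparse(value)
    return value


# Text representation: bytes ==> string ==> list of ints.
bytes_to_text = Layer(
    "bytes->string",
    lambda b: b.decode("ascii"),
    lambda s: s.encode("ascii"),
)
text_to_ints = Layer(
    "string->ints",
    lambda s: [int(field) for field in s.split(",")],
    lambda xs: ",".join(str(x) for x in xs),
)

# Binary representation: bytes ==> list of big-endian int32 values.
bytes_to_int32s = Layer(
    "bytes->int32s",
    lambda b: list(struct.unpack(f">{len(b) // 4}i", b)),
    lambda xs: struct.pack(f">{len(xs)}i", *xs),
)

text_stack = [bytes_to_text, text_to_ints]
binary_stack = [bytes_to_int32s]

text_values = parse_all(text_stack, b"10,20,30")
binary_values = parse_all(binary_stack, struct.pack(">3i", 10, 20, 30))
text_round_trip = unparse_all(text_stack, text_values)
```

In this sketch the application only ever sees the list of integers, so swapping `text_stack` for `binary_stack` changes nothing above the top layer. That is the modularity being asked for, under the assumption that every layer is cleanly invertible.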
The DFDL talks about the CS data types, with decorations to tell how to do the transformations to bits. I think that's all DFDL does (which is plenty!)
Now the second place that "streams" enters the picture is to deal with XML's notion of the order of elements, which DFDL is trying to use to deal with the order of the bits.
(I think this confuses me because it is overloading XML's notion of an XML file with the organization of the described files. You can make it work, but it's not really clean, at least to me.)
I agree that this is a concern.
To me, it is more natural to define a notion of a "sequence of CS data types", i.e., the elements. The decorations indicate where the bits for each element are (i.e., each element has its own sequence of bits, not necessarily from a continuous stream).
This is more general than a stream (it can accommodate random access), and probably can be stated as a simple mapping.
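
One way to picture such a mapping is the following minimal Python sketch. It is only an illustration under stated assumptions: `Element`, its `fmt` and `offset` fields, and the sample layout are hypothetical, and it locates elements by byte offset rather than bit offset to keep the example short.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A CS data type plus a 'decoration' saying where its bytes live (hypothetical)."""
    name: str
    fmt: str      # struct format for the type, e.g. ">i" for a big-endian int32
    offset: int   # byte offset of this element's own bytes in the data

    def read(self, data: bytes):
        # Random access: jump straight to this element's bytes.
        (value,) = struct.unpack_from(self.fmt, data, self.offset)
        return value


# Physical layout: int32 at byte 0, int16 at byte 4, unsigned byte at byte 6.
data = struct.pack(">ihB", 1000, -2, 7)

# Logical (element) order is deliberately different from the physical order.
elements = [
    Element("flag", ">B", 6),
    Element("count", ">h", 4),
    Element("total", ">i", 0),
]

values = [e.read(data) for e in elements]
```

Here the order of the elements says nothing about the order of the bytes. Each element carries its own location, so a stream is just the special case where the offsets happen to increase in element order.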
So the summary is:
I think it would simplify the abstractions to not talk about streams.
Instead, we should talk about sequences of bits, one for each element, and a model associating elements with bits.
I hope this isn't too far off beam.
I think the big issue is the layering. It adds complexity to the question of position: do you mean an index by byte, by character, or by comma-separated value? We want the representation of layers to be modular, so that you can replace the string representation with a binary representation and the application (which is dealing with a list of numbers) does not need to know. It is important that the descriptions are contained and that the description of the integer list does not reference the underlying byte positions. Layering is IMO the reason that we need some formal description; it is also the reason that it is hard.

I would like to try to take this forward a little. I think in the wake of this new spec it is timely.

Mike, can you give me a steer on the IP status of your document? I understand that it has not been submitted to the WG. Do you propose to submit it? I think the basic outline is consistent with things you have said at WG meetings (though not the level of detail). If we were to produce a document that contained some of these ideas, would that be a problem?

Thanks,
Martin