RE: [dfdl-wg] simple way to study hard DFDL example problem - IBM Format VS records as XML

22 Nov 2004

...
> The way I view physical rep information is as functions that 
> can be applied to types and fields. Writing the data out to a 
> blocked/segmented format does not fall into this category. It 
> is an orthogonal operation that applies to the whole data and 
> as such is much more akin to encryption and compression. For 
> example, I have a COBOL structure that ends up in an MQSeries 
> queue and in a QSAM file. It has a logical structure, it has 
> a physical representation. In the QSAM case a further 
> transform has taken place to block/segment the structure. I 
> would not expect to see the physical rep properties of the 
> types and elements change.

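To make the "orthogonal operation" point concrete, here is a minimal Python sketch of whole-stream blocking. It assumes the classic (non-extended) QSAM variable-blocked (VB) layout: each record gets a 4-byte record descriptor word (RDW) and each block a 4-byte block descriptor word (BDW), each holding a big-endian 2-byte length that includes the descriptor itself, followed by two zero bytes. The function names are hypothetical. The point is that the record bytes, produced by whatever element-level physical rep applies, are never inspected or changed.

```python
import struct


def block_vb(records: list[bytes], max_block: int) -> bytes:
    """Pack already-serialized records into BDW-prefixed blocks of RDW-prefixed records.

    Assumes max_block <= 32760 (the usual VB limit). Record contents are
    treated as opaque bytes, just as a compressor would treat them.
    """
    out = bytearray()
    block = bytearray()
    for rec in records:
        # RDW: 2-byte length including the RDW itself, then 2 reserved zero bytes.
        rdw_rec = struct.pack(">HH", len(rec) + 4, 0) + rec
        if len(rdw_rec) + 4 > max_block:
            raise ValueError("record too long for block size")
        # Flush the current block if this record would overflow it (BDW included).
        if block and len(block) + len(rdw_rec) + 4 > max_block:
            out += struct.pack(">HH", len(block) + 4, 0) + block
            block = bytearray()
        block += rdw_rec
    if block:
        out += struct.pack(">HH", len(block) + 4, 0) + block
    return bytes(out)


def deblock_vb(data: bytes) -> list[bytes]:
    """Strip BDWs and RDWs, returning the original record bytes unchanged."""
    records = []
    pos = 0
    while pos < len(data):
        block_len, _ = struct.unpack_from(">HH", data, pos)
        end = pos + block_len
        pos += 4
        while pos < end:
            rec_len, _ = struct.unpack_from(">HH", data, pos)
            records.append(data[pos + 4 : pos + rec_len])
            pos += rec_len
    return records
```

The same record list blocks differently for different `max_block` values, yet `deblock_vb` always returns identical record bytes, which is the sense in which blocking leaves the types' and elements' physical rep untouched.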
I think we've been talking about DFDL as always going TO the XML schema
and have considered the process of going FROM the XML to a new
serialization as 'inverse DFDL'. Towards that end, we've discussed being
able to mark transforms as invertible and/or allowing an inverse method
to be registered as part of the transform definition. We also talked
about the potential requirement of having multiple output streams: if I
read x and y dimensions and then pixels, but my output XML model is just
the pixel sequence, I will need to record x and y somewhere to allow
inversion, so the user (or DFDL) might want to specify x and y in some
separate 'provenance' file that could be used during inversion.

I'm not sure that this is the best model, but I don't think we've come
up with a good way to describe going FROM the XML model except as the
inverse of the TO process.
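A minimal Python sketch of the invertible-transform and provenance idea, for discussion only: a transform that optionally carries a registered inverse, plus a plain dict standing in for the separate 'provenance' file. The `Transform` class, the function names, and the image layout (x and y as big-endian 16-bit integers, then one byte per pixel) are all hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transform:
    """Hypothetical DFDL-style transform with an optional registered inverse."""
    name: str
    forward: Callable[[bytes, dict], list]
    inverse: Optional[Callable[[list, dict], bytes]] = None

    @property
    def invertible(self) -> bool:
        return self.inverse is not None


def parse_image(raw: bytes, provenance: dict) -> list:
    # Header: x and y dimensions, then x * y one-byte pixels.
    x, y = struct.unpack_from(">HH", raw, 0)
    # The XML model keeps only the pixels, so x and y go to the side channel.
    provenance["x"], provenance["y"] = x, y
    return list(raw[4 : 4 + x * y])


def unparse_image(pixels: list, provenance: dict) -> bytes:
    # Without the recorded x and y the header could not be rebuilt.
    x, y = provenance["x"], provenance["y"]
    if len(pixels) != x * y:
        raise ValueError("pixel count does not match recorded dimensions")
    return struct.pack(">HH", x, y) + bytes(pixels)


image = Transform("image", parse_image, unparse_image)

# Usage: a 2 x 3 image whose pixel values are 0..5.
raw = struct.pack(">HH", 2, 3) + bytes(range(6))
provenance: dict = {}
pixels = image.forward(raw, provenance)
# pixels == [0, 1, 2, 3, 4, 5]; provenance == {"x": 2, "y": 3}
restored = image.inverse(pixels, provenance)
```

Writing the dict out (e.g. with `json.dump`) would give the separate provenance file; a transform registered without an inverse would simply report `invertible` as `False`.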
...
> Mike's idea of a schema level 'stream' rep property sounds ok 
> in principle for parsing, but what other metadata is needed 
> when serialising? How are we informed of the rules for VB 
> blocking or for IMS segmentation? Are they fixed or 
> user-defined? If these rules end up requiring extra metadata 
> at the type/element level then I am not comfortable with 
> this, because we are mixing two sets of physical information.
> I think that whatever principles we apply to DFDL 
> including/excluding encryption and compression we should also 
> apply to these formats. What is the current proposal in this 
> area? The cheapest option would be to provide a flexible 
> user-defined transform capability.

We planned to have a user-defined transform capability that would appear
in the same way as DFDL-standard transforms. I think one can easily put
something like zip into the same format as Alan has done for the basic
int-from-ASCII and int-from-binary transforms, as a byte-sequence to
byte-sequence transform. I think I'd vote for just including zip since it
will be used in a number of formats, but one could imagine a user adding
a de-pig-latinizer as needed. (Pig Latin and things like run-length
encoding are examples we've used to point out that not all
compression/encryption-type algorithms will run on the raw input stream:
both of these require some level of parsing before you can use them,
to find words or to get the <value, # of repeats> pairs from the
initial bytes.)
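A minimal Python sketch of the distinction, assuming a simple name-to-function registry. The registry, `register`, and the run-length layout (a 1-byte value followed by a 2-byte big-endian count) are hypothetical; only the standard-library `zlib` and `struct` modules are real. zip-style inflation fits a plain byte-sequence to byte-sequence signature, while run-length decoding only works after a parse step has produced the <value, # of repeats> pairs.

```python
import struct
import zlib
from typing import Callable, Iterable

# Hypothetical registry: DFDL-standard and user-defined byte-sequence to
# byte-sequence transforms are registered and looked up the same way.
BYTE_TRANSFORMS: dict[str, Callable[[bytes], bytes]] = {}


def register(name: str, fn: Callable[[bytes], bytes]) -> None:
    BYTE_TRANSFORMS[name] = fn


# zip-style (zlib/DEFLATE) decompression runs directly on the raw input bytes.
register("inflate", zlib.decompress)


def parse_rle_pairs(raw: bytes) -> list[tuple[int, int]]:
    """Parsing step: split raw bytes into (value, count) pairs, 3 bytes each."""
    if len(raw) % 3:
        raise ValueError("truncated <value, # of repeats> pair")
    return [struct.unpack_from(">BH", raw, i) for i in range(0, len(raw), 3)]


def expand_rle(pairs: Iterable[tuple[int, int]]) -> bytes:
    """The decompression itself operates on parsed pairs, not on raw bytes."""
    return b"".join(bytes([value]) * count for value, count in pairs)


# RLE fits the byte-to-byte registry only once the pair layout is fixed
# in advance and the parse step is folded into the transform.
register("rle", lambda raw: expand_rle(parse_rle_pairs(raw)))
```

If the pair layout varied by format, the parse step could no longer be hidden inside the transform; it would need format-level description, which is the sense in which such algorithms do not run on the raw input stream.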

Myers, James D