
I wrote my previous mail fairly quickly just before I left on Friday to get something on the table. I've been thinking about this problem over the weekend and have some more thoughts which might help me get across where I am coming from.

The way I view physical rep information is as functions that can be applied to types and fields. Writing the data out to a blocked/segmented format does not fall into this category. It is an orthogonal operation that applies to the data as a whole, and as such is much more akin to encryption and compression. For example, I have a COBOL structure that ends up in an MQSeries queue and in a QSAM file. It has a logical structure and it has a physical representation. In the QSAM case a further transform has taken place to block/segment the structure; I would not expect to see the physical rep properties of the types and elements change.

Mike's idea of a schema-level 'stream' rep property sounds OK in principle for parsing, but what other metadata is needed when serialising? How are we informed of the rules for VB blocking or for IMS segmentation? Are they fixed or user-defined? If these rules end up requiring extra metadata at the type/element level then I am not comfortable with this, because we are mixing two sets of physical information. I think that whatever principles we apply to DFDL for including/excluding encryption and compression we should also apply to these formats. What is the current proposal in this area? The cheapest option would be to provide a flexible user-defined transform capability.

We can discuss more on this week's call, but it sounds like this is another of the high-level design issues to be included in the F2F agenda.

Finally, a correction. When I said that the broker does not support these 19 or whatever formats, I should have been more specific and said that the broker's message model does not support them. That is, we do not provide physical rep annotation support for such formats, for the reason stated above. The expectation is that the decryption/decompression/deblocking has all taken place as a separate transformation elsewhere in the broker.

Regards,
Steve

Steve Hanson
WebSphere Business Integration Brokers, IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

"Myers, James D" <jim.myers@pnl.gov>
Sent by: owner-dfdl-wg@ggf.org
To: dfdl-wg@gridforum.org
Date: 19/11/2004 17:04
Subject: RE: [dfdl-wg] simple way to study hard DFDL example problem - IBMFormat VS records as XML

I think we at least agree in practice that there's a limit on how complex a transform you'd want to code in DFDL logic. Not sure if we agree on whether it is possible.

As for LR parsers - I'm not a parser guy, but I just looked at the Wikipedia entry :-) Seems like a simple enough concept - if you let me have layers, and I can use information in those layers to select choices for further processing, can you stop me from making an LR parser (or doing what an LR parser does)? I've got a stack, and choices let me specify an action table... In the same way, if you give me layers (or variables), addition, and for loops, you can't stop me from doing multiplication. And if you require those things for other reasons but don't need multiplication, you can't really talk about excluding multiplication from the language design.
You can say that we won't worry about multiplication examples, or how easy it is to write them down, or what performance you'll get trying to run them, and suggest that you plug something in to handle them directly, though - and this is probably what we need to do in DFDL. I may still be missing something, and there may be a piece of functionality we haven't identified a need for that an LR parser/our pathological examples would require, but I'm getting more convinced that our primitives are sufficiently powerful that they can be used/abused to do all of the complex things that have come up.

I'm not sure how we can close the issue - specify the map from DFDL primitives to an LR parser as I started to above, or find an example known to require LR parsing and work it? Or?

Jim
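To make the "stack plus an action table" point concrete, here is a minimal sketch - in Python rather than DFDL, so it proves nothing about DFDL's primitives - of a table-driven shift/reduce parser for the toy grammar S -> ( S ) | x. The hand-built ACTION/GOTO tables and all names are illustrative only.

# Toy SLR(1) parser: a stack plus an action table, nothing more.
GRAMMAR = {1: ("S", 3),   # rule 1: S -> ( S )   (3 symbols on the right)
           2: ("S", 1)}   # rule 2: S -> x

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def parse(tokens):
    stack = [0]                      # the LR state stack
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        action = ACTION.get((stack[-1], tokens[i]))
        if action is None:
            raise SyntaxError("unexpected token %r" % tokens[i])
        op, arg = action
        if op == "shift":
            stack.append(arg)        # push the new state, consume the token
            i += 1
        elif op == "reduce":
            lhs, rhs_len = GRAMMAR[arg]
            del stack[-rhs_len:]     # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept
            return True

print(parse("((x))"))                # True

The "choices" in a DFDL-like description would play the role of the ACTION lookups; whether that is a use or an abuse of the primitives is exactly the open question.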
-----Original Message-----
From: owner-dfdl-wg@ggf.org [mailto:owner-dfdl-wg@ggf.org] On Behalf Of mike.beckerle@ascentialsoftware.com
Sent: Friday, November 19, 2004 11:36 AM
To: smh@uk.ibm.com; dfdl-wg@gridforum.org
Subject: RE: [dfdl-wg] simple way to study hard DFDL example problem - IBMFormat VS records as XML
I believe you and Jim are actually disagreeing. Jim is saying he's still optimistic that this transformation, even though complex, can be expressed directly in DFDL. You are saying this would require XSLT or a Java program or whatever to do it.
Mike, you say you are aware of 19 such legacy formats, and I bet there are more. Well, IBM's broker has no specific support for any of these, nor have we been asked to incorporate them into our message model. Maybe we should play the percentages game - if we see enough different subsystems that use the same cryptic format, then it becomes worth building the support into DFDL.
Ascential supports 6 or 7 of these formats today. Batch systems will encounter this more than online ones. You get them when a mainframe job writes out a tape, and then you read that tape on a unix tape drive, either directly or first into a file. Alternatively, you pick up a mainframe file via FTP or some such and operate on it directly on other systems. Mainframe software handles all the VS blocking and such stuff in the lower layers, as you know (not to mention the tape label); unix software does none of this, you just get the raw bytes.
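For concreteness, a minimal sketch of what "you just get the raw bytes" means for a variable-blocked mainframe file landed on unix: the reader has to strip the block and record descriptor words itself. The layout assumed below (4-byte BDW and RDW, each starting with a 2-byte big-endian length that includes the descriptor itself, then two reserved bytes) is from memory and worth checking against the z/OS documentation; spanned (VBS) records would need extra handling of the RDW flag bytes, and none of this is a description of any particular product's implementation.

import struct

def deblock_vb(data):
    """Yield logical record payloads from a raw RECFM=VB byte stream."""
    pos = 0
    while pos < len(data):
        (block_len,) = struct.unpack(">H", data[pos:pos + 2])   # BDW length incl. itself
        block_end = pos + block_len
        rec_pos = pos + 4                                       # skip the 4-byte BDW
        while rec_pos < block_end:
            (rec_len,) = struct.unpack(">H", data[rec_pos:rec_pos + 2])  # RDW length
            yield data[rec_pos + 4:rec_pos + rec_len]           # payload after the RDW
            rec_pos += rec_len
        pos = block_end

# One block holding two records, "ABC" and "DE":
blk = (struct.pack(">HH", 4 + 7 + 6, 0) +
       struct.pack(">HH", 7, 0) + b"ABC" +
       struct.pack(">HH", 6, 0) + b"DE")
print(list(deblock_vb(blk)))                                    # [b'ABC', b'DE']

The point of the sketch is only that this is a whole-stream transform sitting in front of the field-level description, which is exactly the layering question Steve raises.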
My point is not so much about these 19 or more particular formats as it is about how much complexity we go after.
In the past we've looked at things like logical arrays with run-length-encoded representations, and the suggestion has been that DFDL might be able to express this transformation directly, without needing to go outside DFDL.
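As a small, purely hypothetical illustration of that logical/physical mismatch: the physical rep might be a sequence of (count, value) pairs while the logical model is a flat array of values. The decode is trivial in a general-purpose language; the open question is whether DFDL itself should be able to express it.

def rle_decode(pairs):
    # Expand (count, value) pairs into the flat logical array.
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

physical = [(3, 0), (1, 7), (4, 2)]     # what is actually on the wire
logical = rle_decode(physical)          # [0, 0, 0, 7, 2, 2, 2, 2]
print(logical)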
I've come to believe there are limits to this complexity, and I think tree-shape compatibility is perhaps at the core of them. Building a DFDL description for data that ultimately requires a parser of LR(k) sophistication to interpret correctly seems clearly a non-starter. Where this line is drawn is important.
...mikeb