
Without digging too much into the details, I'd say this is a case where the multi-layer approach comes in. The DFDL would describe a hidden layer in which the first, middle, and last data elements are identified and put into a list, and that hidden list would then be used as the input for creating the items in the output layer.

I think this is conceptually similar to one of our run-length encoding examples (more complex, of course). If you read a sequence of ints and then a sequence of floats and need to output a sequence of floats with int[i] repeats of float[i], it is easiest to create a hidden layer representing the int and float sequences and then produce the output from that. If you don't think in terms of a layer, even this example gets painful: I need to read an int, skip forward somewhere to find a float, skip back to get the next int, etc.
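To make the run-length case concrete, here is a minimal Python sketch of the two layers. This is not DFDL; the function name `expand_run_lengths`, the `hidden` dictionary, and the sample numbers are all made up for illustration.

```python
from itertools import chain, repeat


def expand_run_lengths(counts, values):
    """Output layer: values[i] repeated counts[i] times, in order."""
    if len(counts) != len(values):
        raise ValueError("counts and values must have the same length")
    return list(chain.from_iterable(repeat(v, n) for n, v in zip(counts, values)))


# Hidden layer: the two sequences exactly as they were read (illustrative data).
hidden = {"ints": [2, 0, 3], "floats": [1.5, 9.0, -4.25]}

# Output layer: derived entirely from the hidden layer, with no seeking back
# and forth in the input.
output = expand_run_lengths(hidden["ints"], hidden["floats"])
print(output)  # [1.5, 1.5, -4.25, -4.25, -4.25]
```

The point of the sketch is only that the output is computed from the hidden layer as a whole, rather than by interleaving reads of ints and floats.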

Mike's full example, not starting with the XML-ized version, might require more than one layer: read the original into something with the XML schema Mike defines, then a layer making a sequence of data elements, and then something that has the desired logical output.
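As a rough illustration of those layers, here is a Python sketch that starts from the XML-ized BLOCK/SEGMENT form, builds a hidden list of (kind, data) pairs, and then produces the logical items. It is only a sketch of the layering idea, not DFDL. The wrapper element `Format_VS`, the function names `segment_layer` and `item_layer`, and the error handling are my assumptions; the data is modeled on Mike's example below.

```python
import xml.etree.ElementTree as ET

# Layer 1: the XML-ized physical form (wrapper element name is an assumption).
PHYSICAL = """<Format_VS>
<BLOCK><SEGMENT><WHOLE/><DATA>The first item</DATA></SEGMENT></BLOCK>
<BLOCK><SEGMENT><FIRST/><DATA>Thi</DATA></SEGMENT></BLOCK>
<BLOCK><SEGMENT><MIDDLE/><DATA>s is t</DATA></SEGMENT></BLOCK>
<BLOCK><SEGMENT><MIDDLE/><DATA>he sec</DATA></SEGMENT></BLOCK>
<BLOCK>
  <SEGMENT><LAST/><DATA>ond item</DATA></SEGMENT>
  <SEGMENT><WHOLE/><DATA>The third</DATA></SEGMENT>
</BLOCK>
</Format_VS>"""


def segment_layer(root):
    """Layer 2 (hidden): flatten all BLOCKs into (kind, data) pairs in document order."""
    pairs = []
    for segment in root.iter("SEGMENT"):
        kind = segment[0].tag  # first child: WHOLE, FIRST, MIDDLE or LAST
        data = segment.findtext("DATA", default="")
        pairs.append((kind, data))
    return pairs


def item_layer(pairs):
    """Layer 3 (logical): join FIRST/MIDDLE.../LAST runs, pass WHOLE through."""
    items, parts = [], None
    for kind, data in pairs:
        if kind in ("WHOLE", "FIRST"):
            if parts is not None:
                raise ValueError(f"{kind} segment inside an unfinished item")
            if kind == "WHOLE":
                items.append(data)
            else:
                parts = [data]
        elif kind in ("MIDDLE", "LAST"):
            if parts is None:
                raise ValueError(f"{kind} segment without a preceding FIRST")
            parts.append(data)
            if kind == "LAST":
                items.append("".join(parts))
                parts = None
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    if parts is not None:
        raise ValueError("input ended before a LAST segment")
    return items


items = item_layer(segment_layer(ET.fromstring(PHYSICAL)))
print(items)  # ['The first item', 'This is the second item', 'The third']
```

Note that the block boundaries disappear in the hidden layer; only the segment order matters when the items are rebuilt.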

I guess I would claim that this is not too bad a way to describe a fairly complex format in terms of a fairly different logical structure. Whether one *should* do this in DFDL is a separate question: it might make more sense to a) write a black box parser to get to the items, or b) use DFDL to get to the initial schema Mike wrote and use XSLT afterwards to convert to the desired logical structure. I think there are enough relatively simple cases that need the multilayer functionality in DFDL that we have to have it, which means it will then be possible to deal with complex transformations in DFDL even where that is not simple or practical.

Jim

-----Original Message-----
From: owner-dfdl-wg@ggf.org [mailto:owner-dfdl-wg@ggf.org] On Behalf Of mike.beckerle@ascentialsoftware.com
Sent: Thursday, November 18, 2004 9:53 PM
To: dfdl-wg@gridforum.org
Subject: [dfdl-wg] simple way to study hard DFDL example problem - IBM Format VS records as XML

I've come up with a way to articulate the difficulties I'm having with DFDL for complex file formats.

This problem may not be that hard for someone with more XML, XPath or XQuery experience, so I'd appreciate it if you could look it over and, if necessary, even run it by your resident XML experts.

In case the emailer mangles all the line lengths, I've also attached the below as a file.






<ITEM>The first item</ITEM>
<ITEM>This is the second item</ITEM>
<ITEM>The third</ITEM>



<sequence>
<element name="ITEM" type="string" minOccurs="0" maxOccurs="unbounded"/>
</sequence>



<BLOCK>
<SEGMENT>
    <WHOLE/> 
    <DATA>The first item</DATA>
</SEGMENT>
</BLOCK>

<BLOCK>
<SEGMENT>
    <FIRST/> 
    <DATA>Thi</DATA>
</SEGMENT>
</BLOCK>

<BLOCK>
<SEGMENT>
    <MIDDLE/> 
    <DATA>s is t</DATA>
</SEGMENT>
</BLOCK>

<BLOCK>
<SEGMENT>
    <MIDDLE/>
    <DATA>he sec</DATA>
</SEGMENT>
</BLOCK>

<BLOCK>
<SEGMENT>
    <LAST/> 
    <DATA>ond item</DATA>
</SEGMENT>
<SEGMENT>
    <WHOLE/>
    <DATA>The third</DATA>
</SEGMENT>
</BLOCK>













<complexType name="Format_VS_t">
  <sequence>
    <element name="BLOCK" type="Block_t" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
</complexType>

<complexType name="Block_t">
  <sequence>
    <element name="SEGMENT" type="Segment_t" minOccurs="1" maxOccurs="2"/>
  </sequence>
</complexType>

<complexType name="Segment_t">
  <sequence>
    <choice>
      <element name="WHOLE"/>
      <element name="FIRST"/>
      <element name="LAST"/>
      <element name="MIDDLE"/>
    </choice>
    <element name="DATA" type="string"/>
  </sequence>
</complexType>