Yes, that sounds like the best way to do it.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:        Steve Hanson/UK/IBM@IBMGB,
Cc:        dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date:        03/12/2012 13:35
Subject:        Re: [DFDL-WG] Puzzle: unparsing format involving lengthKind='pattern'




Actually, that helps.

I think that, plus an inputValueCalc/outputValueCalc pair, will fix it. The inputValueCalc strips any trailing NUL; the outputValueCalc adds one if the length is less than 64.
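
As a rough illustration of the logic that pair would carry (Python is used here only to show the intent, not DFDL expression syntax; the function names and MAX_LEN are made up for this sketch, and a single-byte encoding is assumed):

```python
MAX_LEN = 64  # fixed field width from the format description

def strip_trailing_nul(raw: bytes) -> bytes:
    """Parse direction (the inputValueCalc role): drop one trailing NUL, if any."""
    return raw[:-1] if raw.endswith(b"\x00") else raw

def add_trailing_nul(value: bytes) -> bytes:
    """Unparse direction (the outputValueCalc role): append a NUL to short strings."""
    if len(value) > MAX_LEN or b"\x00" in value:
        raise ValueError("value must be at most 64 non-NUL characters")
    return value + b"\x00" if len(value) < MAX_LEN else value
```

The two functions are inverses over valid values, so a string placed in the infoset without the NUL should round-trip unchanged.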

On Mon, Dec 3, 2012 at 7:21 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Afraid not. The problem is that the NUL is not a terminator in the DFDL sense of the word (i.e., mandatory) but is an early-end-of-data indicator.
I can't think of an elegant way to handle this, so I would simply model the data as a single string with a dfdl:lengthPattern that consumes either 0-63 chars plus NUL, or 64 chars. This puts the NUL in the infoset and puts the onus on the user to trim the NUL when reading the infoset, and to supply the NUL when creating the infoset.
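
A minimal sketch of what such a pattern might match, checked with Python's re module rather than a DFDL processor. The pattern itself is an assumption based on the description above (it is not taken from a tested schema), read_raw is a made-up helper, and a single-byte encoding is assumed:

```python
import re

# Hypothetical pattern for the single-string model:
# 0 to 63 non-NUL characters plus the NUL itself, OR exactly 64 non-NUL characters.
PATTERN_WITH_NUL = re.compile(rb"[^\x00]{0,63}\x00|[^\x00]{64}")

def read_raw(data: bytes, pos: int = 0) -> bytes:
    """Return the raw field at pos, including the NUL when one is present."""
    m = PATTERN_WITH_NUL.match(data, pos)
    if m is None:
        raise ValueError("no valid string field at offset %d" % pos)
    return m.group(0)
```

Under this model the matched value for a short string still ends in NUL, which is why the trimming falls to whoever reads the infoset.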

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848



From:        Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:        dfdl-wg@ogf.org
Date:        30/11/2012 21:02
Subject:        [DFDL-WG] Puzzle: unparsing format involving lengthKind='pattern'
Sent by:        dfdl-wg-bounces@ogf.org






I have a file format where strings have this unusual discipline for termination.

The string is either exactly 64 characters long, or it is 0 to 63 characters long followed by a NUL terminator.

I can parse this like so:

    <xs:complexType name="myStringType">
      <xs:sequence>
        <xs:element name="s" type="xs:string"
          dfdl:lengthKind="pattern">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
              <!-- 0 to 63 occurrences of not a Nul, followed by a Nul
                   (final nul non-captured in the pattern match result), OR, just 64 non Nuls -->
              <dfdl:element>
                <dfdl:property name="lengthPattern"><![CDATA[([^\x00]{0,63})(?=\x00)|[^\x00]{64}]]></dfdl:property>
              </dfdl:element>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="term" type="xs:string"
          dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%NUL;" dfdl:outputValueCalc="{ '' }"
          minOccurs="0" maxOccurs="1" dfdl:occursCountKind="expression"
          dfdl:occursCount="{ if (fn:string-length(../tns:s) lt 64) then 1 else 0 }" />
      </xs:sequence>
    </xs:complexType>

What I did is use a lengthKind pattern to pick off the content, excluding the NUL, and then model the NUL explicitly as the initiator of an empty element which is optionally occurring depending on the length of the string.
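
To make the intended parse behaviour concrete, here is a small sketch that runs the same regular expression through Python's re module instead of a DFDL processor. parse_field is a made-up helper, a single-byte encoding is assumed, and the length check stands in for the occursCount expression on "term":

```python
import re

# Same pattern as the lengthPattern above; the lookahead keeps the NUL out of the match.
LENGTH_PATTERN = re.compile(rb"([^\x00]{0,63})(?=\x00)|[^\x00]{64}")

def parse_field(data: bytes, pos: int = 0):
    """Return (value, next_pos) for one string field starting at pos."""
    m = LENGTH_PATTERN.match(data, pos)
    if m is None:
        raise ValueError("no valid string field at offset %d" % pos)
    value = m.group(0)
    end = m.end()
    # Mirrors the occursCount expression: the NUL exists only for short strings.
    if len(value) < 64:
        end += 1  # consume the NUL that the lookahead saw
    return value, end
```

For example, parse_field(b"abc\x00rest") yields the value b"abc" and leaves the position just past the NUL, while a full 64-character field is returned with no extra byte consumed.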

That will work. I'd like to hide the "term" element in a hidden group, but other than that it will work for parsing.

Question: How can I unparse this?

I want the schema and DFDL processor to put the "term" element into the infoset by itself depending on the length of the 's' element that I will place into the infoset. So I put an outputValueCalc on there to assign it a value of empty string, but that will only happen if the occursCount expression causes the optional element to exist at all.

Will this work for unparsing?

The spec currently says that the occursCount expression is used only when parsing; when unparsing, the number of occurrences in the infoset is used. But I don't want to have to put this syntax-modeling element into the infoset from the application. I just want the application to create a string, and then, based on whether it is 0-63 characters long, I want the NUL added or not.

Other ways I've tried to model these strings include a choice of two different elements. That has a similar issue: I then have to assemble my infoset for output using a length-dependent element. I can't just put a string into the infoset and have the processor decide, when unparsing, which of the two choice branches is the right one. (Asserts and discriminators are parse-only also.)

Anyone have better ideas?

...mike

--
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel: 781-330-0412
--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU





