This is the email thread that discussed the behaviour of lengthKind 'explicit' where length is an expression, when unparsing.

Erratum 2.100 had been raised and stated that this is an example of a variable length, and behaves like lengthKind 'prefixed'. The discussion below was about whether this captured all the use cases, and whether the dfdl:length expression should be evaluated during unparsing in order to obtain a specified length to which the value could be padded (or truncated) in the same way as a fixed length. The conclusion was that the expression should not be evaluated. This is consistent with occursCount, where the expression is not evaluated on unparsing.
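
To make the two candidate unparsing behaviours concrete, here is a minimal Python sketch. The function names, the space pad character, left justification and unconditional truncation are illustrative assumptions, not something taken from the spec or from any implementation.

```python
def unparse_specified_length(value: str, length: int, pad_char: str = " ") -> str:
    """Length is specified: truncate or pad (left-justified, assumed) to `length` characters."""
    return value[:length].ljust(length, pad_char)


def unparse_variable_length(value: str) -> str:
    """Length is variable (the erratum 2.100 behaviour): the value is written as it is."""
    return value


if __name__ == "__main__":
    # The length expression evaluated to 5 in the first two calls (assumed value).
    print(repr(unparse_specified_length("abc", 5)))
    print(repr(unparse_specified_length("abcdefg", 5)))
    # Under erratum 2.100 the expression is not evaluated at all.
    print(repr(unparse_variable_length("abc")))
```

In the first function the infoset value is forced to the evaluated length; in the second the output is simply as long as the value.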

(There was a public comment on this area, but it just observed that the spec had not been fully updated to reflect 2.100.)

The use cases are numbered in the discussion below 1) to 4), with examples. Updated to reflect function name changes.

The original proposal, which was not adopted, is also below.

However, an issue is that IBM DFDL already implements use case 1) by evaluating the expression on unparsing. I discovered this when I came to change the code to ensure 2.100 behaviour was followed. It is therefore likely that there are users who are relying on this behaviour in IBM DFDL.

Coincidentally, the same day I was asked by a user about the following use case. The user has BCD data with a length prefix. He wants to parse the data, do some processing, then unparse the data, but it is not acceptable for the length of the data to change. Specifically, the BCD may start with 0s. When this is parsed as a decimal or integer, the leading 0s are lost, so they need to be added back during unparsing. This is not possible with lengthKind 'prefixed', nor is it possible with lengthKind 'explicit' where length is an expression, given erratum 2.100. The user is currently treating the element as a 'prefixed' hexBinary blob to preserve the leading 0s. While this works for his use case, I am worried that the blob solution won't always be appropriate.
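
The loss of leading 0s can be illustrated with a small Python sketch. The helper names `bcd_decode` and `bcd_encode` are hypothetical, the decoder does no validation of nibble values, and the 3-byte sample is invented for illustration.

```python
from typing import Optional


def bcd_decode(data: bytes) -> int:
    """Read unsigned BCD: each byte holds two decimal digits, one per nibble."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in data)
    return int(digits)  # leading zero digits disappear here


def bcd_encode(value: int, length: Optional[int] = None) -> bytes:
    """Write unsigned BCD. With no `length`, use the fewest bytes that hold the value."""
    digits = str(value)
    if len(digits) % 2:
        digits = "0" + digits  # make a whole number of bytes
    raw = bytes((int(digits[i]) << 4) | int(digits[i + 1])
                for i in range(0, len(digits), 2))
    if length is not None:
        raw = raw.rjust(length, b"\x00")  # restore leading zero bytes
    return raw


if __name__ == "__main__":
    original = bytes([0x00, 0x12, 0x34])   # 3 bytes on the wire
    value = bcd_decode(original)           # the infoset sees the number 1234
    print(bcd_encode(value).hex())         # variable length: only 2 bytes come back
    print(bcd_encode(value, 3).hex())      # specified length: original 3 bytes
```

Only the second call reproduces the original data, and it needs a target length that the variable-length behaviour does not supply.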

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 08/12/2014 15:53 -----

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 08/01/2013 18:06
Subject: Fw: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions

The proposal below was not adopted, and erratum 2.100 remains as documented.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 08/01/2013 18:05 -----

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org,
Date: 08/01/2013 14:44
Subject: Re: Fw: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions

If we had the following we would have three behaviours which I think would cover the identified use cases.

a) dfdl:lengthKind = "explicit", dfdl:length is an integer => length is specified on parsing & unparsing

b) dfdl:lengthKind = "explicit", dfdl:length is an expression => length is specified on parsing & unparsing (expression must evaluate)
- use cases 1) & 2) below
- Mike's scenario where length is specified externally

c) dfdl:lengthKind = "expression" (*new*), dfdl:length must be an expression => length is specified on parsing, but variable on unparsing (expression not evaluated)
- new enum
- use case 3) below
- gives the erratum 2.100 behaviour
- matches occursCountKind 'expression' behaviour

Alternatively, we leave erratum 2.100 as it is, and for Mike's scenario the application must do the padding before it passes the infoset values to the unparser.
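
As a rough illustration of how an unparser might distinguish behaviours a) to c), here is a Python sketch. The function `unparse_target_length` and the modelling of an expression as a callable over the infoset are assumptions made for the example; lengthKind "expression" is the proposed enum only.

```python
from typing import Callable, Optional, Union

# A dfdl:length is modelled as either a literal integer or an expression,
# where an expression is a callable taking the infoset (assumed modelling).
Length = Union[int, Callable[[dict], int]]


def unparse_target_length(length_kind: str, length: Length, infoset: dict) -> Optional[int]:
    """Return the length to pad or truncate to when unparsing, or None if variable."""
    if length_kind == "explicit":
        if callable(length):
            return length(infoset)  # (b) the expression must evaluate
        return length               # (a) literal integer
    if length_kind == "expression":
        return None                 # (c) proposed enum: expression not evaluated
    raise ValueError(f"lengthKind {length_kind!r} is outside this sketch")


if __name__ == "__main__":
    infoset = {"len": 7}
    print(unparse_target_length("explicit", 5, infoset))
    print(unparse_target_length("explicit", lambda i: i["len"], infoset))
    print(unparse_target_length("expression", lambda i: i["len"], infoset))
```

Under erratum 2.100 as it stands, case b) would instead return None, which is exactly the difference the proposal was trying to address.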

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org,
Date: 11/12/2012 10:36
Subject: Fw: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions

The email thread which provided the material for the discussion which led to:

2.100. Section 12.3.1. State that when unparsing an element with lengthKind ‘explicit’ and where length is an expression, then the data in the Infoset is treated as variable length and not fixed length. The behaviour is the same as lengthKind ‘prefixed’.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 16/11/2012 16:04 -----

From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org,
Date: 11/09/2012 12:39
Subject: Re: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
Sent by: dfdl-wg-bounces@ogf.org

Good point. The problem is that lengthKind 'explicit' is being used for two things:
a) a length that is static
b) a length that is calculated
...so the DFDL serializer must assume that the expression needs to be evaluated.

For occursCountKind we have separate values for 'fixed' and 'expression'. If we did not, then occursCountKind would have the same problem except that it would affect defaulting rather than padding.

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org,
Date: 11/09/2012 12:27
Subject: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
Sent by: dfdl-wg-bounces@ogf.org

This mail is on the expected behaviour of the DFDL unparser when writing out a 'data' element the length of which is held in an earlier 'len' element.

There are several scenarios, some straightforward and some that exhibit a chicken-and-egg behaviour. The principle of what happens is understood; the action is to make sure that the behaviour is explained in enough detail in the spec to enable implementations to be consistent. (Note - IBM DFDL does not yet support outputValueCalc so has not hit this yet.)

Scenarios follow. The 'data' element shown is simple, but the same principles apply if it is complex.

1) 'len' is set from infoset

- 'len' can be set in augmented infoset
- No issue as 'data's length expression may be evaluated

<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}" dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

2) 'len' is set using outputValueCalc with fixed expression

- When 'len's outputValueCalc is encountered, it can be evaluated then and there
- 'len' can be set in augmented infoset
- No issue as 'data's length expression may be evaluated

<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:outputValueCalc="{10}" dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}" dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

3) 'len' is set using outputValueCalc with reference to 'data' (unpadded)

- When 'len's outputValueCalc is encountered, it can not yet be evaluated as it depends on the length of 'data'
- 'len' can not yet be set in augmented infoset
- Problem as 'data's length expression can not be evaluated
- But we do know the unpadded length of 'data' so 'len's outputValueCalc can now be evaluated
- In turn this means that 'data's length expression can now be evaluated

<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:outputValueCalc="{dfdl:valueLength(/message1/data)}" dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}" dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
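
The order of evaluation in scenario 3 can be sketched in Python as follows. The function name `unparse_message1`, the zero-padded two-character text form of 'len' and character length units are assumptions made purely for illustration.

```python
def unparse_message1(data: str) -> str:
    """Walk through scenario 3 for a single 'data' value from the infoset."""
    # 'len' comes first in the schema, but its outputValueCalc refers to
    # 'data', so it cannot be evaluated when it is first encountered.

    # The unpadded length of 'data' is known from the infoset value alone.
    value_length = len(data)

    # Now 'len's outputValueCalc can be evaluated.
    len_value = value_length

    # In turn, 'data's length expression {/message1/len} can be evaluated.
    data_length = len_value

    # 'len' has an explicit length of 2 (text form and zero padding assumed).
    len_text = str(len_value).rjust(2, "0")
    if len(len_text) != 2:
        raise ValueError("'len' does not fit in 2 characters")

    # Padding 'data' to data_length is a no-op, because data_length was
    # derived from the unpadded length in the first place.
    return len_text + data.ljust(data_length)


if __name__ == "__main__":
    print(unparse_message1("hello"))
```

Because the length is derived from the unpadded value, evaluating the length expression never causes any padding in this scenario.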

4) 'len' is set using outputValueCalc with reference to 'data' (padded)

- When 'len's outputValueCalc is encountered, it can not yet be evaluated as it depends on the length of 'data'
- 'len' can not yet be set in augmented infoset
- Problem as 'data's length expression can not be evaluated
- We don't know the padded length of 'data' because we don't know 'len'
- Problem: 'data's length expression can never be evaluated

<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:outputValueCalc="{dfdl:contentLength(/message1/data)}" dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}" dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
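
One way to see why scenario 3 resolves and scenario 4 does not is to write the dependencies down and look for a cycle. The Python sketch below does this with a depth-first search; the node names such as "len.value" and the function `evaluation_order` are hypothetical labels for this example, not spec terminology.

```python
from typing import Dict, List, Set


def evaluation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every item follows the items it depends on.

    Raises ValueError if the dependencies are circular.
    """
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(node: str) -> None:
        if node in done:
            return
        if node in in_progress:
            raise ValueError(f"circular dependency involving {node!r}")
        in_progress.add(node)
        for dep in sorted(deps.get(node, ())):
            visit(dep)
        in_progress.discard(node)
        done.add(node)
        order.append(node)

    for node in sorted(deps):
        visit(node)
    return order


# Scenario 3: the unpadded length of 'data' depends on nothing else.
SCENARIO_3 = {
    "data.valueLength": set(),
    "len.value": {"data.valueLength"},
    "data.length": {"len.value"},
}

# Scenario 4: the padded length of 'data' depends on 'data's own length.
SCENARIO_4 = {
    "data.contentLength": {"data.length"},
    "len.value": {"data.contentLength"},
    "data.length": {"len.value"},
}

if __name__ == "__main__":
    print(evaluation_order(SCENARIO_3))
    try:
        evaluation_order(SCENARIO_4)
    except ValueError as err:
        print("scenario 4:", err)
```

Scenario 3 yields the order data.valueLength, len.value, data.length, while scenario 4 has no starting point and is reported as circular.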

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 04/09/2012 17:31
Subject: Fw: Behaviour for lengthKind 'endOfparent' is still not fully specified

DFDL WG call 4th Sept 2012:

1) Agreed that for binary data, only xs:hexBinary and packed/BCD allowed to have endOfParent

2) Agreed this is the correct behaviour when filling to a known length

3) Agreed this is the correct behaviour when filling to a known length

4) Agreed this is the correct behaviour when filling to a known length

It was noted that lengthKind 'explicit' on the parent may not result in a known length if the length is an expression. This is an example of a more general chicken-and-egg situation with lengths given by expressions, for which outputValueCalc and the DFDL function unpaddedLength() were added and can be used. Action raised to ensure that the behaviour of an implementation is fully defined by the spec.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 04/09/2012 17:24 -----

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org <dfdl-wg@ogf.org>
Date: 04/09/2012 14:08
Subject: Behaviour for lengthKind 'endOfparent' is still not fully specified

Noted when I reviewed latest spec - endOfParent and unparsing is not fully thought through.

The spec today says that I can use endOfParent with binary data. There is a restriction in section 12.3.8, but it only applies when an element is endOfParent and its parent is lengthKind delimited.

There are a couple of cases to consider:

1) Binary data of restricted length (see list in other email "proposed clarification/narrowing - delimited binary data should decimal"). I don't think it makes sense to allow these. We don't allow these binary reps for delimited.

2) Text data of variable length when unparsing. Box scenario. If the data in the infoset is shorter than the space in the box, what do we do? I think we should pad to box length with appropriate padChar, according to justification, as that is effectively a 'specified length'. Error if textPadKind is 'none'. Use parent's lengthUnits.

3) HexBinary data of variable length when unparsing. Box scenario. If the data in the infoset is shorter than the space in the box, what do we do? I think we should right-pad to box length with fill byte, as that is effectively a 'specified length'.

4) Packed/BCD binary data of variable length when unparsing. Box scenario. If the data in the infoset is shorter than the space in the box, what do we do? I think we should pad to box length with zero bytes, according to justification, as that is effectively a 'specified length'. (Must be zero bytes and not fill byte, as the data must be numeric in order to be parsed.)

In relation to 2 - 4, note that lengthKind 'endOfParent' can only be used with a parent lengthKind of 'explicit', 'pattern', 'prefixed' or 'endOfParent', or a choice with choiceLengthKind 'explicit'. The box scenario when unparsing therefore occurs only when lengthKind is 'explicit' or choiceLengthKind is 'explicit', as these are the cases when the length is known. Also note that when there are nested 'endOfParent' elements (which is allowed) then all padding must be done on the simple element (i.e., the innermost element), to ensure that what is output can be parsed.
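
The padding suggested in cases 2) to 4) can be sketched in Python as below. The function names are hypothetical, lengths are counted in characters or bytes, and padding packed/BCD data on the left is an assumption (the text above only says "according to justification").

```python
def pad_text(value: str, box: int, pad_char: str = " ", justification: str = "left") -> str:
    """Case 2: pad text to the box length with padChar, according to justification."""
    if len(value) > box:
        raise ValueError("value is longer than the box")
    if justification == "left":
        return value.ljust(box, pad_char)
    if justification == "right":
        return value.rjust(box, pad_char)
    if justification == "center":
        return value.center(box, pad_char)
    raise ValueError(f"unknown justification {justification!r}")


def pad_hex_binary(value: bytes, box: int, fill_byte: int = 0x00) -> bytes:
    """Case 3: right-pad hexBinary to the box length with the fill byte."""
    if len(value) > box:
        raise ValueError("value is longer than the box")
    return value.ljust(box, bytes([fill_byte]))


def pad_bcd(value: bytes, box: int) -> bytes:
    """Case 4: pad BCD with zero bytes so the result still parses as a number.

    Leading zero bytes are assumed here; they leave the numeric value unchanged.
    """
    if len(value) > box:
        raise ValueError("value is longer than the box")
    return value.rjust(box, b"\x00")


if __name__ == "__main__":
    print(repr(pad_text("ab", 4)))
    print(repr(pad_text("ab", 4, "*", "right")))
    print(pad_hex_binary(b"\x01\x02", 4, 0xFF).hex())
    print(pad_bcd(b"\x12\x34", 3).hex())
```

Case 4 deliberately ignores the fill byte: a fill byte such as 0xFF would not be valid BCD digits, which is the reason given above for using zero bytes.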

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg