A thought.
We could add dfdl:lengthKind 'expression',
use dfdl:length for the expression, and add a new enum property that is
the policy for the expression: that is, evaluate it on parse only, or evaluate
it on both parse and unparse. The latter enum value would be equivalent to
the pre-erratum 2.100 behaviour of 'explicit' with an expression, as implemented
by IBM DFDL today, which we would then deprecate in the spec (or even
remove) and deprecate in IBM DFDL.
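The proposed policy can be sketched in Python as below. This is a minimal illustration, not spec text: the enum names PARSE_ONLY and PARSE_AND_UNPARSE, the function unparse_target_length and its parameters are all hypothetical, and evaluate_length is only a stand-in for evaluating the dfdl:length expression. Truncation is not modelled; a value longer than the evaluated length is simply reported as an error.

```python
from enum import Enum
from typing import Callable


class ExpressionLengthPolicy(Enum):
    """Hypothetical values for the proposed policy property (not in the DFDL spec)."""
    PARSE_ONLY = "parseOnly"               # erratum 2.100 behaviour
    PARSE_AND_UNPARSE = "parseAndUnparse"  # pre-erratum 'explicit' + expression behaviour


def unparse_target_length(policy: ExpressionLengthPolicy,
                          evaluate_length: Callable[[], int],
                          value_length: int) -> int:
    """Return the length written for an element whose dfdl:length is an expression.

    evaluate_length stands in for evaluating the dfdl:length expression;
    value_length is the length of the infoset value before any padding.
    """
    if policy is ExpressionLengthPolicy.PARSE_ONLY:
        # Expression not evaluated: the value is written at its own (variable)
        # length, as for lengthKind 'prefixed'.
        return value_length
    target = evaluate_length()
    if value_length > target:
        # Truncation rules are not modelled in this sketch.
        raise ValueError(f"value length {value_length} exceeds evaluated length {target}")
    # The caller pads from value_length up to target.
    return target


print(unparse_target_length(ExpressionLengthPolicy.PARSE_ONLY, lambda: 10, 4))         # 4
print(unparse_target_length(ExpressionLengthPolicy.PARSE_AND_UNPARSE, lambda: 10, 4))  # 10
```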
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB, "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 10/12/2014 09:08
Subject: Re: [DFDL-WG] Erratum 2.100 (was Action 183 - chicken-and-egg situation with lengths given by expressions)
Tim & Mike - replies in-line.
Consensus is heading towards 'explicit'
working as per IBM DFDL today, so we need to decide on the best way to
express the 2.100 behaviour.
Regards
Steve Hanson
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 09/12/2014 23:25
Subject: Re: [DFDL-WG] Erratum 2.100 (was Action 183 - chicken-and-egg situation with lengths given by expressions)
Sent by: dfdl-wg-bounces@ogf.org
My 2 cents:
lengthKind 'explicit' should continue to work as it currently does in the
IBM implementation. That is, it should use the static or calculated value
of the length, and pad to that length when unparsing. Otherwise we have
different rules for lengthKind='explicit' depending on whether it is an
expression or a static value.
If somebody wants to avoid padding,
then they can put an outputValueCalc on the length field and calculate
a value that requires no padding. I think the required expression would
be
{dfdl:valueLength(../variableLengthField)}
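Tim's two options can be illustrated with a small Python sketch for a left-justified text field measured in characters. The helper names are hypothetical, and value_length is only a stand-in for dfdl:valueLength: when the length field holds a fixed value, the data is padded; when the length field is computed from the value's own length, no padding is needed.

```python
def value_length(value: str) -> int:
    """Stand-in for dfdl:valueLength(..., 'characters'): the unpadded length of the value."""
    return len(value)


def unparse_explicit_text(value: str, length: int, pad_char: str = " ") -> str:
    """Unparse a left-justified text value with lengthKind 'explicit': pad it to `length`."""
    if value_length(value) > length:
        raise ValueError(f"value of {value_length(value)} characters exceeds length {length}")
    return value + pad_char * (length - value_length(value))


data = "hello"
# Length taken from a static or calculated value of 8: three pad characters are added.
print(repr(unparse_explicit_text(data, 8)))                   # 'hello   '
# Length field set by outputValueCalc from the value length: no padding is added.
print(repr(unparse_explicit_text(data, value_length(data))))  # 'hello'
```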
With these rules there is a potential for a mutual dependency (deadlock)
when outputValueCalc is used with a calculated length. If the outputValueCalc
is
{dfdl:contentLength(../variableLengthField)}
then the entire representation length
(including the padding characters) is being requested. This obviously
cannot be satisfied until the padding has been applied, but the padding
cannot be applied until the length is known, and so on. I don't think this should
affect the decision about lengthKind, though, for two reasons:
- it's already possible to create deadlocks using other calculated fields
in DFDL. Any robust implementation of a serializer must include deadlock
detection in the evaluation of DFDL expressions.
- it would not be difficult to detect the simple case statically and report
a schema definition error.
SMH: Corrected function names to match spec.
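One way to picture the deadlock detection Tim mentions is as cycle detection over the dependencies between pending calculations. The Python sketch below assumes the unparser can describe each calculation as a node in a dependency graph; the node labels (for example 'length(data)', meaning the evaluated dfdl:length of 'data') are hypothetical and are not DFDL syntax.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return a dependency cycle as a list of nodes (first == last), or None.

    deps maps each calculation to the calculations it needs. Depth-first search
    with three states: unvisited, in progress (on the current path), done.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for needed in deps.get(node, []):
            s = state.get(needed, UNVISITED)
            if s == IN_PROGRESS:
                # Back edge: the path from `needed` to here is a cycle.
                return path[path.index(needed):] + [needed]
            if s == UNVISITED:
                cycle = visit(needed)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# outputValueCalc uses valueLength: needs only the infoset value, so no cycle.
value_length_case = {
    "len": ["valueLength(data)"],
    "valueLength(data)": [],
    "length(data)": ["len"],
}
# outputValueCalc uses contentLength: includes padding, which needs data's length.
content_length_case = {
    "len": ["contentLength(data)"],
    "contentLength(data)": ["length(data)"],
    "length(data)": ["len"],
}
print(find_cycle(value_length_case))    # None
print(find_cycle(content_length_case))  # ['len', 'contentLength(data)', 'length(data)', 'len']
```

The simple case Tim describes (a length field whose outputValueCalc asks for the contentLength of the element it sizes) shows up as the shortest such cycle, which is why it could also be caught statically.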
The only downside of this is that users who want to calculate the output
length for themselves have to jump through the non-obvious hoop of adding
outputValueCalc to the length field and selecting the correct (non-deadlocking)
DFDL function.
But for fields that have representation='text'
and lengthUnits='characters' there is a simpler alternative: set the length
expression to {fn:length(.)}. Other cases will be derivable from the infoset
value by a simple calculation, so the outputValueCalc solution should only
be required for the hard cases.
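SMH: Unfortunately your fn:length(.) does
not work. The expression does not give the desired length
when parsing, because the value of '.' is not known until the element has
been parsed, and parsing it requires the length.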
regards,
Tim Kimber
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 09/12/2014 18:06
Subject: Re: [DFDL-WG] Erratum 2.100 (was Action 183 - chicken-and-egg situation with lengths given by expressions)
Sent by: dfdl-wg-bounces@ogf.org
Idea 1:
Just provide an infoset member [contentLength] which would allow one to
get and (when unparsing) set the content length. This would provide the
"sticky" content length memory that people are seeking when they
say they want data to round-trip. If the same infoset object is parsed
and then unparsed, the existence of this [contentLength] member provides
the memory of the content length from the parse. An outputValueCalc could
meaningfully ask for dfdl:contentLength() which would return the value
of this member. If unset, then when unparsing the value of this member
can be defined to be the dfdl:valueLength() of the infoset value.
SMH: Not keen on this. It makes the DFDL infoset
deviate too much from the XML infoset.
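For illustration, Idea 1 can be sketched in Python as an infoset item that remembers the content length seen when parsing. The class, its content_length field and the parse_text helper are hypothetical; the sketch assumes a left-justified text field padded with spaces and measured in characters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfosetItem:
    """Hypothetical infoset item with the proposed [contentLength] member."""
    value: str
    content_length: Optional[int] = None  # set by the parser; None if created by an application

    def value_length(self) -> int:
        return len(self.value)

    def unparse_content_length(self) -> int:
        # If unset, the member defaults to dfdl:valueLength() of the infoset value.
        return self.content_length if self.content_length is not None else self.value_length()


def parse_text(data: str, length: int, pad_char: str = " ") -> InfosetItem:
    """Parse `length` characters, strip trailing padding, and remember the length."""
    return InfosetItem(value=data[:length].rstrip(pad_char), content_length=length)


parsed = parse_text("abc   XYZ", 6)
print(parsed.value, parsed.unparse_content_length())  # abc 6
fresh = InfosetItem("abc")
print(fresh.unparse_content_length())                 # 3
```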
Idea 2:
Per our discussion on the call, I think we are trying to get two behaviors
into the same length kind, and we can't have it both ways.
So I am fine if we say lengthKind 'explicit' evaluates the expression both
parsing and unparsing.
I would suggest a new lengthKind 'explicitParseOnly' that evaluates the expression
only when parsing. When this is used, an outputValueCalc would be needed
to compute the length value and unparse its representation. That
calculation would likely need to refer to the dfdl:valueLength() of the
element whose length needs to be stored.
SMH: Not keen on the name. Let's keep thinking!
Idea 3:
An alternative would be to introduce a property dfdl:unparseContentLength
which is an integer constant, an expression, or one of three special
enum values. When dfdl:unparseContentLength='useLength', it would be
the same value as the length (the same constant, or computed by evaluating
the same expression). dfdl:unparseContentLength='useValueLength'
would mean that the length is equivalent to writing dfdl:unparseContentLength='{
dfdl:valueLength(., LU) }', where LU is the length units of the element
that is the first argument. Nothing would automatically save out
the explicitly computed length, so an outputValueCalc would need to be
able to get at dfdl:valueLength or dfdl:contentLength (which would
return the value of dfdl:unparseContentLength).
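A rough Python sketch of how the proposed dfdl:unparseContentLength might be resolved. Only the two enum values named above are modelled; constants are plain integers and expressions are represented by zero-argument callables. The function and parameter names are hypothetical.

```python
from typing import Callable, Union

IntOrExpr = Union[int, Callable[[], int]]


def resolve_unparse_content_length(setting: Union[str, IntOrExpr],
                                   length: IntOrExpr,
                                   value_length: int) -> int:
    """Return the content length to use when unparsing.

    setting: 'useLength', 'useValueLength', an int constant, or a callable
             standing in for an expression.
    length:  the dfdl:length constant, or a callable standing in for its expression.
    value_length: dfdl:valueLength(., LU) of the infoset value.
    """
    if setting == "useLength":
        # Same constant, or the same expression evaluated again.
        return length() if callable(length) else length
    if setting == "useValueLength":
        return value_length
    if callable(setting):
        return setting()
    if isinstance(setting, int):
        return setting
    raise ValueError(f"unsupported dfdl:unparseContentLength value: {setting!r}")


print(resolve_unparse_content_length("useLength", lambda: 12, 5))       # 12
print(resolve_unparse_content_length("useValueLength", lambda: 12, 5))  # 5
print(resolve_unparse_content_length(7, 12, 5))                         # 7
```

Dropping the last two branches gives the enum-only "policy" variant discussed next.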
A thought: perhaps we really only need the enum values, and not the ability
to specify constants or expressions? In that case I'd suggest the property
have the "policy" suffix on the name, i.e., dfdl:unparseContentLengthPolicy.
Other possible name choices for this property: dfdl:explicitLengthKindUnparsePolicy
or dfdl:lengthKindExplicitUnparsePolicy.
I guess we get our choice of whether we add new length kinds along with
'explicit' and we define lengthKind 'explicit' to be one of these behaviors,
or we add this enum policy property to add additional meaning when the
lengthKind is 'explicit'.
The difference of the two is taste and style I think.
SMH: I prefer a new enum for lengthKind.
That's the point of the property. It's much easier to say in the spec '...when
dfdl:lengthKind is 'prefixed' or 'expression'...' than '...when dfdl:lengthKind
is 'prefixed', or 'explicit' and dfdl:length is an expression and dfdl:unparseContentLength
is 'xxx' ...'.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
On Mon, Dec 8, 2014 at 11:53 AM, Steve Hanson <smh@uk.ibm.com> wrote:
This is the email thread that discussed the behaviour of lengthKind 'explicit'
where length is an expression, when unparsing.
Erratum 2.100 had been raised and stated that this is an example of a variable
length, and behaves like lengthKind 'prefixed'. The discussion below was
around whether this captured all the use cases and should the dfdl:length
expression be evaluated during unparsing in order to obtain a specified
length to which the value could be padded (or truncated) in the same way
as a fixed length. The conclusion was that the expression should
not be evaluated. This is consistent with occursCount - the expression
is not evaluated on unparsing.
(There was a public comment on this area, but it was just observing that
the spec had not been fully updated to reflect 2.100.)
The use cases are numbered 1) to 4) in the discussion below, with examples,
updated to reflect function name changes. The original proposal, which was
not adopted, is also below.
However, an issue is that IBM DFDL already implements use case 1) by evaluating
the expression on unparsing. I discovered this when I came to change the
code to ensure 2.100 behaviour was followed. It is therefore likely that
there are users who are relying on this behaviour in IBM DFDL.
Coincidentally, the same day I was asked by a user about the following use
case. The user has BCD data with a length prefix. He wants to parse the
data, do some processing, then unparse the data, but it is not acceptable
for the length of the data to change. Specifically, the BCD may start with
0s. When this is parsed as a decimal or integer, the leading 0s are lost,
so they need to be added back during unparsing. This is
not possible with lengthKind 'prefixed', nor is it possible with a dfdl:length
expression under erratum 2.100. The user is currently treating the element
as a 'prefixed' hexBinary blob to preserve the leading 0s. While this works
for his use case, I am worried that the blob solution won't always be appropriate.
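The user's problem can be shown with a small Python sketch of unsigned BCD (two decimal digits per byte, no sign nibble). The helper names are hypothetical. Re-encoding the parsed integer at its natural length drops the leading zero byte, whereas encoding to the known original length of 3 bytes restores the data exactly.

```python
from typing import Optional


def bcd_decode(data: bytes) -> int:
    """Decode unsigned BCD: each byte holds two decimal digits (high nibble first)."""
    digits = []
    for b in data:
        hi, lo = b >> 4, b & 0x0F
        if hi > 9 or lo > 9:
            raise ValueError(f"invalid BCD byte {b:#04x}")
        digits.append(f"{hi}{lo}")
    return int("".join(digits))


def bcd_encode(value: int, length: Optional[int] = None) -> bytes:
    """Encode unsigned BCD; if `length` (bytes) is given, pad with leading zero digits."""
    digits = str(value)
    if len(digits) % 2:
        digits = "0" + digits
    if length is not None:
        if len(digits) > 2 * length:
            raise ValueError(f"{value} does not fit in {length} bytes")
        digits = digits.rjust(2 * length, "0")
    return bytes((int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, len(digits), 2))


original = bytes([0x00, 0x12, 0x34])           # 3 bytes with a leading zero byte
value = bcd_decode(original)                   # 1234: the leading zeros are gone
print(bcd_encode(value).hex())                 # 1234   (2 bytes: the length has changed)
print(bcd_encode(value, len(original)).hex())  # 001234 (3 bytes: round-trips)
```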
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 08/12/2014 15:53 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 08/01/2013 18:06
Subject: Fw: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
We did not adopt the proposal below, and erratum 2.100 remains as documented.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 08/01/2013 18:05 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 08/01/2013 14:44
Subject: Re: Fw: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
If we had the following we would have three behaviours which I think would
cover the identified use cases.
a) dfdl:lengthKind = "explicit", dfdl:length is an integer =>
length is specified on parsing & unparsing
b) dfdl:lengthKind = "explicit", dfdl:length is an expression
=> length is specified on parsing & unparsing (expression must evaluate)
- use cases 1) & 2) below
- Mike's scenario where length is specified externally
c) dfdl:lengthKind = "expression" (*new*), dfdl:length must be
an expression => length is specified on parsing, but variable on unparsing
(expression not evaluated)
- new enum
- use case 3) below
- gives the erratum 2.100 behaviour
- matches occursCountKind 'expression' behaviour
Alternatively, we leave erratum 2.100 as it is, and for Mike's scenario the
application must do the padding before it passes the infoset values to
the unparser.
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 11/12/2012 10:36
Subject: Fw: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
The email thread which provided the material for the discussion which led
to:
2.100. Section 12.3.1. State that when unparsing an element
with lengthKind ‘explicit’ and where length is an expression, then the
data in the Infoset is treated as variable length and not fixed length.
The behaviour is the same as lengthKind ‘prefixed’.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 16/11/2012 16:04 -----
From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 11/09/2012 12:39
Subject: Re: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
Sent by: dfdl-wg-bounces@ogf.org
Good point. The problem is that lengthKind 'explicit' is being used for
two things:
a) a length that is static
b) a length that is calculated
...so the DFDL serializer must assume that the expression needs to be evaluated.
For occursCountKind we have separate values for 'fixed' and 'expression'.
If we did not, then occursCountKind would have the same problem except
that it would affect defaulting rather than padding.
regards,
Tim Kimber, DFDL Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 11/09/2012 12:27
Subject: [DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions
Sent by: dfdl-wg-bounces@ogf.org
This mail is on the expected behaviour of the DFDL unparser when writing
out a 'data' element the length of which is held in an earlier 'len' element.
There are several scenarios, some straightforward and some that exhibit
a chicken-and-egg behaviour. The principle of what happens is understood;
the action is to make sure that the behaviour is explained in enough detail
in the spec to enable implementations to be consistent. (Note: IBM DFDL
does not yet support outputValueCalc, so it has not hit this yet.)
Scenarios follow. The 'data' element shown is simple, but the same principles
apply if it is complex.
1) 'len' is set from infoset
- 'len' can be set in augmented infoset
- No issue as 'data's length expression may be evaluated
<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:lengthKind="explicit"
                   dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}"
                   dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
2) 'len' is set using outputValueCalc with fixed expression
- When 'len's outputValueCalc is encountered, it can be evaluated then and there
- 'len' can be set in augmented infoset
- No issue as 'data's length expression may be evaluated
<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:outputValueCalc="{10}"
                   dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}"
                   dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
3) 'len' is set using outputValueCalc with reference 'data' (unpadded)
- When 'len's outputValueCalc is encountered, it cannot yet be evaluated
as it depends on the length of 'data'
- 'len' cannot yet be set in augmented infoset
- Problem as 'data's length expression cannot be evaluated
- But we do know the unpadded length of 'data' so 'len's outputValueCalc
can now be evaluated
- In turn this means that 'data's length expression can now be evaluated
<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:outputValueCalc="{dfdl:valueLength(/message1/data)}"
                   dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}"
                   dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
4) 'len' is set using outputValueCalc with reference 'data' (padded)
- When 'len's outputValueCalc is encountered, it cannot yet be evaluated
as it depends on the length of 'data'
- 'len' cannot yet be set in augmented infoset
- Problem as 'data's length expression cannot be evaluated
- We don't know the padded length of 'data' because we don't know 'len'
- Problem: 'data's length expression can never be evaluated
<xsd:element name="message1">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="len" type="xsd:int"
                   dfdl:outputValueCalc="{dfdl:contentLength(/message1/data)}"
                   dfdl:lengthKind="explicit" dfdl:length="2" />
      <xsd:element name="data" type="xsd:string"
                   dfdl:length="{/message1/len}"
                   dfdl:lengthKind="explicit" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
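The four scenarios can be simulated with a small Python fixpoint resolver that retries each pending quantity until nothing more can be evaluated. This is a sketch under stated assumptions, not how any DFDL implementation works: the quantity names ('len', 'dataLength', 'valueLength(data)', 'contentLength(data)') are hypothetical labels, a rule returns None while an input it needs is still unknown, and the infoset value of 'len' in scenario 1 is an arbitrary 5.

```python
from typing import Callable, Dict, Optional

Rule = Callable[[Dict[str, int]], Optional[int]]


def resolve(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Evaluate rules repeatedly until all are known; raise if a pass makes no progress."""
    known: Dict[str, int] = {}
    while len(known) < len(rules):
        progressed = False
        for name, rule in rules.items():
            if name not in known:
                result = rule(known)
                if result is not None:
                    known[name] = result
                    progressed = True
        if not progressed:
            raise RuntimeError(f"deadlock: cannot evaluate {sorted(set(rules) - set(known))}")
    return known


def scenario_rules(scenario: int, data_value: str, infoset_len: int = 5) -> Dict[str, Rule]:
    # 'data' has dfdl:length="{/message1/len}" in every scenario.
    rules: Dict[str, Rule] = {"dataLength": lambda k: k.get("len")}
    if scenario == 1:    # 'len' supplied in the infoset
        rules["len"] = lambda k: infoset_len
    elif scenario == 2:  # outputValueCalc="{10}"
        rules["len"] = lambda k: 10
    elif scenario == 3:  # outputValueCalc uses dfdl:valueLength (unpadded)
        rules["valueLength(data)"] = lambda k: len(data_value)
        rules["len"] = lambda k: k.get("valueLength(data)")
    elif scenario == 4:  # outputValueCalc uses dfdl:contentLength (padded)
        rules["contentLength(data)"] = lambda k: k.get("dataLength")
        rules["len"] = lambda k: k.get("contentLength(data)")
    else:
        raise ValueError(f"unknown scenario {scenario}")
    return rules


# Prints 5, 10 and 3 for scenarios 1 to 3, and a deadlock error for scenario 4.
for n in (1, 2, 3, 4):
    try:
        print(n, resolve(scenario_rules(n, "abc"))["dataLength"])
    except RuntimeError as e:
        print(n, e)
```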
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 04/09/2012 17:31
Subject: Fw: Behaviour for lengthKind 'endOfparent' is still not fully specified
DFDL WG call 4th Sept 2012:
1) Agreed that for binary data, only xs:hexBinary and packed/BCD are allowed
to have endOfParent
2) Agreed this is the correct behaviour when filling to a known length
3) Agreed this is the correct behaviour when filling to a known length
4) Agreed this is the correct behaviour when filling to a known length
It was noted that lengthKind 'explicit' on the parent may not result in
a known length if the length is an expression. This is an example of a
more general chicken-and-egg situation with lengths given by expressions,
for which outputValueCalc and DFDL functions such as unpaddedLength() were
added and can be used. An action was raised to ensure that the behaviour of an
implementation is fully defined by the spec.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 04/09/2012 17:24 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org <dfdl-wg@ogf.org>
Date: 04/09/2012 14:08
Subject: Behaviour for lengthKind 'endOfparent' is still not fully specified
Noted when I reviewed the latest spec: endOfParent and unparsing have not
been fully thought through.
The spec today says that I can use endOfParent with binary data. There
is a restriction in section 12.3.8, but it only applies when an element
is endOfParent and its parent is lengthKind delimited.
There are a couple of cases to consider:
1) Binary data of restricted length (see list in other email "proposed
clarification/narrowing - delimited binary data should decimal").
I don't think it makes sense to allow these. We don't allow these binary
reps for delimited.
2) Text data of variable length when unparsing. Box scenario. If the data
in the infoset is shorter than the space in the box, what do we do? I
think we should pad to the box length with the appropriate padChar, according to
justification, as that is effectively a 'specified length'. Error if textPadKind
is 'none'. Use the parent's lengthUnits.
3) HexBinary data of variable length when unparsing. Box scenario. If the
data in the infoset is shorter than the space in the box, what do we do? I
think we should right-pad to the box length with the fill byte, as that is effectively
a 'specified length'.
4) Packed/BCD binary data of variable length when unparsing. Box scenario.
If the data in the infoset is shorter than the space in the box, what do we
do? I think we should pad to the box length with zero bytes, according
to justification, as that is effectively a 'specified length'. (It must
be zero bytes and not the fill byte, because the data must be numeric in
order to be parsed.)
In relation to 2 - 4, note that lengthKind 'endOfParent' can only be used
with a parent lengthKind of 'explicit', 'pattern', 'prefixed' or 'endOfParent',
or a choice with choiceLengthKind 'explicit', so when unparsing, the box scenario
occurs only when lengthKind is 'explicit' or choiceLengthKind
is 'explicit'; these are the cases when the length is known. Also
note that when there are nested 'endOfParent' elements (which is allowed),
all padding must be done on the simple element (i.e., the innermost
element), to ensure that what is output can be parsed.
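The padding rules proposed in 2) to 4) might look like the following Python sketch. The function names are hypothetical; text is measured in characters and binary data in bytes; centre justification putting any odd pad character on the right is an assumption; and packed/BCD data is assumed to be right-justified, so the zero bytes go on the left.

```python
def pad_text(value: str, box_len: int, pad_char: str,
             justification: str, text_pad_kind: str = "padChar") -> str:
    """2) Pad text to the box length with padChar according to justification."""
    shortfall = box_len - len(value)
    if shortfall < 0:
        raise ValueError("value is longer than the box")
    if shortfall and text_pad_kind == "none":
        raise ValueError("data shorter than box but textPadKind is 'none'")
    if justification == "left":
        return value + pad_char * shortfall
    if justification == "right":
        return pad_char * shortfall + value
    if justification == "center":
        left = shortfall // 2  # assumption: any odd pad character goes on the right
        return pad_char * left + value + pad_char * (shortfall - left)
    raise ValueError(f"unknown justification {justification!r}")


def pad_hex_binary(value: bytes, box_len: int, fill_byte: int) -> bytes:
    """3) Right-pad hexBinary data to the box length with the fill byte."""
    if len(value) > box_len:
        raise ValueError("value is longer than the box")
    return value + bytes([fill_byte]) * (box_len - len(value))


def pad_packed(value: bytes, box_len: int) -> bytes:
    """4) Pad packed/BCD data with zero bytes (not the fill byte) so it still parses as a number."""
    if len(value) > box_len:
        raise ValueError("value is longer than the box")
    return bytes(box_len - len(value)) + value


print(pad_text("ab", 5, "*", "center"))            # *ab**
print(pad_hex_binary(b"\x01\x02", 4, 0xFF).hex())  # 0102ffff
print(pad_packed(b"\x12\x3c", 4).hex())            # 0000123c
```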
Regards
Steve Hanson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg