Mike

I'd like to restate the use cases we agreed on yesterday. There were three.

Use case 1
Element "val" is fixed length, length known at design time and provided by dfdl:length="x" on input and output.
On output, the infoset data for "val" is padded to the length.

Use case 2
Element "val" is fixed length, length known at runtime and provided by dfdl:length="{ ../len }" on input and output.
On output the infoset provides the value for element "len".
On output, the infoset data for "val" is padded to the length according to the rules for dfdl:lengthKind='explicit'/'implicit'.

Use case 3
Element "val" is really variable length, length only known once the data is serialised, and provided by dfdl:length="{ ../len }" on input.
On output the value of element "len" is set only once the length of "val" is known.
On output, the infoset data for "val" is not padded to the length.

You've added a variation to use case 3 in your example, where there is a need to add some padding. Let's call it use case 4.

Alan and I have explored an alternative in which dfdl:length is always used, for all use cases. The difference for use cases 3 and 4 is that the value of element "len" is only set during the processing of "val". Instead of using a flag, with an accompanying output length property, to signal cases 3 and 4, we use an extra parameter on the dfdl:length() function that says whether or not to include padding when dfdl:lengthKind="explicit"/"implicit". Note that any escape scheme must and will be taken into account (to answer your question).

For use case 3, when no padding is needed, your example simplifies to the following. When "len" is encountered, there is an outputValueCalc that references "val", so the unparser defers setting the value of "len". When it gets to "val", it knows it must work out the unpadded length and set that in "len" before doing any length-related processing for "val".

<sequence>
<element name="len" type="int"
dfdl:outputValueCalc=
"{
dfdl:length(../val, false) (: false => no pad :)
}" />

... many elements in between ....

<element name="val" type="string"
dfdl:encoding="utf-8"
dfdl:lengthKind="explicit"
dfdl:lengthUnits="bytes"
dfdl:length="{ ../len }"
dfdl:textTrimKind="padChar"
dfdl:textStringJustification="left"
dfdl:textPadCharacter="%#r0;"
/>
</sequence>
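
To make the deferral order concrete, here is a minimal Python sketch of the unparse steps described for use case 3. It is only an illustration under stated assumptions, not how any DFDL processor is implemented: the function name unparse_use_case_3 and the dict-shaped infoset are hypothetical, and escape schemes are ignored.

```python
from typing import Tuple


def unparse_use_case_3(infoset: dict) -> Tuple[int, bytes]:
    """Return (value of "len", bytes written for "val"). Hypothetical helper."""
    # On reaching "len" the value cannot be computed yet, because its
    # outputValueCalc references "val"; setting it is deferred.
    deferred_len = None

    # On reaching "val": work out the unpadded length in bytes first.
    # This stands in for dfdl:length(../val, false) with lengthUnits="bytes".
    val_bytes = infoset["val"].encode("utf-8")
    deferred_len = len(val_bytes)

    # Length processing for "val" then uses dfdl:length="{ ../len }".
    # Because "len" equals the unpadded length, zero pad bytes are added.
    padding = b"\x00" * (deferred_len - len(val_bytes))
    return deferred_len, val_bytes + padding
```

The point of the sketch is the ordering: "len" gets its value only once "val" has been measured, even though "len" appears first in the sequence.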

For use case 4, when some padding might be needed, your example simplifies to the following. When the unparser starts to process "val", it works out the unpadded length, uses it in the expression and generates the value for "len". When it does the length processing for "val", it pads to the value of "len".

<sequence>
<element name="len" type="int"
dfdl:outputValueCalc=
"{
fn:ceiling(dfdl:length(../val, false) div 4) * 4
}" />

... many elements in between ....

<element name="val" type="string"
dfdl:encoding="utf-8"
dfdl:lengthKind="explicit"
dfdl:lengthUnits="bytes"
dfdl:length="{ ../len }"
dfdl:textTrimKind="padChar"
dfdl:textStringJustification="left"
dfdl:textPadCharacter="%#r0;"
/>
</sequence>
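
As a worked example of the arithmetic for use case 4, the following Python sketch rounds the unpadded byte length up to a multiple of 4 and then pads "val" to that length. The helper names len_value and unparse_val are hypothetical, the pad byte mirrors dfdl:textPadCharacter="%#r0;", and escape schemes are again left out.

```python
import math

PAD_BYTE = b"\x00"  # mirrors dfdl:textPadCharacter="%#r0;"


def len_value(val: str, multiple: int = 4) -> int:
    """Mirror fn:ceiling(dfdl:length(../val, false) div 4) * 4."""
    unpadded = len(val.encode("utf-8"))  # unpadded length in bytes
    return math.ceil(unpadded / multiple) * multiple


def unparse_val(val: str, length: int) -> bytes:
    """Left-justify the value and pad on the right up to `length` bytes."""
    raw = val.encode("utf-8")
    # `length` is never smaller than len(raw) here, so nothing is truncated.
    return raw + PAD_BYTE * (length - len(raw))
```

For instance, "héllo" is 6 bytes in UTF-8, so "len" becomes 8 and "val" is written with two trailing pad bytes.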

A variation on use case 4 is when we need to pad to a minimum length.

<sequence>
<element name="len" type="int"
dfdl:outputValueCalc=
"{
fn:max((dfdl:length(../val, false), 20))
}" />

... many elements in between ....

<element name="val" type="string"
dfdl:encoding="utf-8"
dfdl:lengthKind="explicit"
dfdl:lengthUnits="bytes"
dfdl:length="{ ../len }"
dfdl:textTrimKind="padChar"
dfdl:textStringJustification="left"
dfdl:textPadCharacter="%#r0;"
/>
</sequence>

You might be tempted to ask why the minimum is explicitly added. It's because, as currently specified, the xs:minLength facet (and dfdl:outputMinLength for non-strings) is not used when dfdl:lengthKind="explicit". We could change this, but it would make the padding rules more complicated. We opted to keep the padding rules simpler.
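
A quick way to check the minimum-length arithmetic: for short values to be padded up and long values to be left alone, the value of "len" has to be the larger of the unpadded length and the minimum. A small Python sketch, where the helper name len_with_minimum is hypothetical and the minimum of 20 bytes is taken from the example:

```python
def len_with_minimum(val: str, minimum: int = 20) -> int:
    """Value for "len" when "val" must occupy at least `minimum` bytes."""
    unpadded = len(val.encode("utf-8"))  # unpadded length in bytes
    # Short values are padded up to the minimum; longer ones keep their length.
    return max(unpadded, minimum)
```

So a 3-byte value gives a "len" of 20 (17 pad bytes), while a 25-byte value gives 25 and no padding.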

Yesterday we also discussed whether implicit/explicit needed to change. With the above scheme we think a change is not necessary.

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent by: dfdl-wg-bounces@ogf.org
Date: 09/06/2009 04:30
Please respond to: mbeckerle.dfdl@gmail.com
To: <dfdl-wg@ogf.org>
cc:
Subject: [DFDL-WG] outputValueCalc and unparse example

I did not get as far as I wanted to on this issue. I would like to discuss this example:

<sequence>
<element name="len" type="int"
dfdl:fillByte="%#r0;"
dfdl:outputValueCalc=
"{
dfdl:representation-output-length(../val)
}" />

... many elements in between ....

<element name="val" type="string"
dfdl:encoding="utf-8"
dfdl:lengthKind="explicit"
dfdl:lengthUnits="bytes"
dfdl:useLengthForOutput="false"
dfdl:length="{ ../len }"
dfdl:outputLength="{
fn:ceiling(
dfdl:representation-inherent-length(.) div 4
) * 4
}"
dfdl:textTrimKind="padChar"
dfdl:textStringJustification="left"
dfdl:textPadCharacter="%#r0;"
/>
</sequence>

You will notice I added a dfdl:outputLength property and two functions, dfdl:representation-output-length() and dfdl:representation-inherent-length().

I am accepting candidates for better names for these properties and functions. We need to distinguish these 3 concepts:

1) inherent length – of the infoset item without reference to any facets, and without regard to escape sequences, padding or truncation.

(TBD: think about escape sequences. Is this right?)

2) output target length – the length of the box we’re filling in with the data value representation. The box can be bigger or smaller than the inherent length, which implies use of padding/filling, or truncation.

3) input length – length of the box we’re getting when parsing. The inherent length of the value after parsing can be smaller than the length of the box due to removal of escape characters, and the trimming of padding.
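
The three concepts can be illustrated with a small Python sketch. It assumes a UTF-8 string, lengths in bytes, a zero pad byte and no escape scheme; the function names inherent_length, fill_box and read_box are hypothetical and are not proposed property or function names.

```python
PAD_BYTE = b"\x00"  # assumed pad/fill byte


def inherent_length(value: str) -> int:
    """1) Length of the value itself: no padding, truncation or escapes."""
    return len(value.encode("utf-8"))


def fill_box(value: str, box_len: int) -> bytes:
    """2) Output: fit the value into a box of box_len bytes."""
    raw = value.encode("utf-8")
    if len(raw) >= box_len:
        # Box is smaller: truncate (this may split a multi-byte character).
        return raw[:box_len]
    # Box is bigger: pad on the right.
    return raw + PAD_BYTE * (box_len - len(raw))


def read_box(box: bytes) -> str:
    """3) Input: the box has a fixed length; trimming recovers the value."""
    return box.rstrip(PAD_BYTE).decode("utf-8")
```

With a box of 5 bytes, "abc" has inherent length 3, is written as 5 bytes, and parses back to a value whose inherent length is again 3.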
