Mike
I'd like to restate the use cases we agreed on yesterday. There were three.
Use case 1
Element "val" is fixed length, with the length known at design time and provided by dfdl:length="x" on input and output.
On output, the infoset data for "val" is padded to the length.
Use case 2
Element "val" is fixed length, with the length known at runtime and provided by dfdl:length="{../len}" on input and output.
On output the infoset provides the value for element "len".
On output, the infoset data for "val" is padded to the length according to the rules for dfdl:lengthKind='explicit'/'implicit'.
Use case 3
Element "val" is really variable length, with the length only known once the data is serialised, and provided by dfdl:length="{../len}" on input.
On output the value of element "len" is set only once the length of "val" is known.
On output, the infoset data for "val" is not padded to the length.
You've added a variation to use case
3 in your example, where there is a need to add some padding. Let's call
it use case 4.
Alan and I have explored an alternative, where dfdl:length is always used for all use cases. The difference for use cases 3 and 4 is that the value of element "len" is only set during the processing of "val". Instead of using a flag, with an accompanying output length property, to signal cases 3 and 4, we use an extra parameter on dfdl:length() that says whether or not to use padding when dfdl:lengthKind="explicit"/"implicit".
Note that any escape scheme must and will be taken into account (to answer your question).
For use case 3, when no padding is needed, your example simplifies to the following. When "len" is encountered, there is an outputValueCalc that references "val", so the unparser defers the setting of the value of "len". When it gets to "val", it knows it must work out its unpadded length, and set that in "len", before doing any length-related processing for "val".
<sequence>
  <!-- second argument false => no pad -->
  <element name="len"
           type="int"
           dfdl:outputValueCalc="{ dfdl:length(../val, false) }" />
  ... many elements in between ...
  <element name="val"
           type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:length="{ ../len }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;" />
</sequence>
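To make the deferral concrete, here is a rough sketch in Python (not DFDL) of what the unparser would compute for use case 3. The function name deferred_len is hypothetical, and the sketch assumes no escape scheme is in effect, so the unpadded length is simply the UTF-8 byte count of the value.

```python
def deferred_len(val):
    """Hypothetical stand-in for resolving dfdl:length(../val, false).

    Assumes dfdl:encoding="utf-8", dfdl:lengthUnits="bytes" and no escape
    scheme, so the unpadded length is the encoded byte count.
    """
    data = val.encode("utf-8")   # unpadded representation of "val"
    length = len(data)           # value that would be set in "len"
    return length, data          # "val" is written as-is, no padding


# "len" is only resolved once "val" has been looked at.
length, data = deferred_len("h\u00e9llo")
print(length)  # 6: five characters, one of which takes two bytes in UTF-8
```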
For use case 4, when some padding might be needed, your example simplifies to the following. When the unparser starts to process "val", it works out the unpadded length, uses it in the expression and generates the value for "len". When it does the length processing for "val", it pads to the value of "len".
<sequence>
  <element name="len"
           type="int"
           dfdl:outputValueCalc="{ fn:ceiling(dfdl:length(../val, false) div 4) * 4 }" />
  ... many elements in between ...
  <element name="val"
           type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:length="{ ../len }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;" />
</sequence>
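The arithmetic for use case 4 can be sketched in Python (again, not DFDL). The names calc_len and unparse_val are hypothetical, the pad byte 0x00 stands in for dfdl:textPadCharacter="%#r0;", and escape schemes are assumed absent.

```python
import math

PAD = b"\x00"  # stands in for dfdl:textPadCharacter="%#r0;"


def calc_len(val, multiple=4):
    """Round the unpadded UTF-8 byte length up to a multiple (here 4)."""
    unpadded = len(val.encode("utf-8"))
    return math.ceil(unpadded / multiple) * multiple


def unparse_val(val, multiple=4):
    """Return the value for "len" and the padded bytes written for "val"."""
    length = calc_len(val, multiple)
    # Left justification means the pad bytes go on the right.
    data = val.encode("utf-8").ljust(length, PAD)
    return length, data


print(unparse_val("hello"))  # (8, b'hello\x00\x00\x00')
print(unparse_val("abcd"))   # (4, b'abcd')
```

A value whose unpadded length is already a multiple of 4 gets no padding, which is why this case degenerates to use case 3 for such values.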
A variation on use case 4 is when we
need to pad to a minimum length.
<sequence>
  <element name="len"
           type="int"
           dfdl:outputValueCalc="{ fn:max((dfdl:length(../val, false), 20)) }" />
  ... many elements in between ...
  <element name="val"
           type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:length="{ ../len }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;" />
</sequence>
You might be tempted to ask why the
minimum is explicitly added. It's because, as currently spec'd, xs:minLength
facet (and dfdl:outputMinLength for non-strings) are not used when dfdl:lengthKind="explicit".
We could change this but it does make the padding rules more complicated.
We opted for leaving the padding rules simpler.
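Padding to a minimum length amounts to taking the larger of the unpadded length and the minimum. A rough Python sketch of that calculation follows; the name calc_len_min is hypothetical, the minimum of 20 is taken from the example, and escape schemes are assumed absent.

```python
PAD = b"\x00"  # stands in for dfdl:textPadCharacter="%#r0;"


def calc_len_min(val, minimum=20):
    """Value for "len": the unpadded byte length, but never below the minimum."""
    return max(len(val.encode("utf-8")), minimum)


def unparse_val_min(val, minimum=20):
    """Return the value for "len" and the bytes written for "val"."""
    length = calc_len_min(val, minimum)
    return length, val.encode("utf-8").ljust(length, PAD)


short_len, short_data = unparse_val_min("hello")
long_len, long_data = unparse_val_min("x" * 25)
print(short_len, len(short_data))  # 20 20
print(long_len, len(long_data))    # 25 25
```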
Yesterday we also discussed whether implicit/explicit needed to change. With the above scheme we think a change is not necessary.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
"Mike Beckerle"
<mbeckerle.dfdl@gmail.com>
Sent by: dfdl-wg-bounces@ogf.org
09/06/2009 04:30
Please respond to: mbeckerle.dfdl@gmail.com
To: <dfdl-wg@ogf.org>
cc:
Subject: [DFDL-WG] outputValueCalc and unparse example
I did not get as far as I wanted to on this
issue. I would like to discuss this example:
<sequence>
  <element name="len"
           type="int"
           dfdl:fillByte="%#r0;"
           dfdl:outputValueCalc="{ dfdl:representation-output-length(../val) }" />
  ... many elements in between ...
  <element name="val"
           type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:useLengthForOutput="false"
           dfdl:length="{ ../len }"
           dfdl:outputLength="{ fn:ceiling(dfdl:representation-inherent-length(.) div 4) * 4 }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;" />
</sequence>
You will notice I added a dfdl:outputLength property, and the functions dfdl:representation-output-length() and dfdl:representation-inherent-length().
I am accepting candidates for better names for these properties and functions. We need to distinguish these 3 concepts:
1) inherent length – of the infoset item, without reference to any facets, and without respect to escape sequences, padding or truncation.
(TBD: think about escape sequences? Is this right?)
2) output target length – the length of the box we’re filling in with the data value representation. The box can be bigger or smaller than the inherent length, which implies use of padding/filling, or truncation.
3) input length – length of the box we’re getting when parsing. The inherent length of the value after parsing can be smaller than the length of the box due to removal of escape characters, and the trimming of padding.
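The three lengths can be illustrated with a small Python sketch (not DFDL). All function names here are hypothetical, the pad byte is assumed to be 0x00, and escape sequences are ignored. Truncation is done at the byte level, which could split a multi-byte character, so the examples stick to ASCII.

```python
PAD = b"\x00"  # assumed pad/fill byte


def inherent_length(value):
    """1) Inherent length: byte length of the value itself, no padding."""
    return len(value.encode("utf-8"))


def fill_box(value, box_len):
    """2) Output target length: truncate or pad the value to fit the box."""
    return value.encode("utf-8")[:box_len].ljust(box_len, PAD)


def read_box(box):
    """3) Input length is len(box); trimming padding recovers the value."""
    return box.rstrip(PAD).decode("utf-8")


box = fill_box("hello", 8)
print(inherent_length("hello"), len(box))         # 5 8
print(fill_box("hello", 3))                       # b'hel'
print(len(box), inherent_length(read_box(box)))   # 8 5
```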