I did not get as far as I wanted to on this issue. I would like to discuss this example:

 

<sequence>
  <element name="len" type="int"
    dfdl:fillByte="%#r0;"
    dfdl:outputValueCalc="{ dfdl:representation-output-length(../val) }" />

  ... many elements in between ....

  <element name="val" type="string"
    dfdl:encoding="utf-8"
    dfdl:lengthKind="explicit"
    dfdl:lengthUnits="bytes"
    dfdl:useLengthForOutput="false"
    dfdl:length="{ ../len }"
    dfdl:outputLength="{
      fn:ceiling(
        dfdl:representation-inherent-length(.) div 4
      ) * 4
    }"
    dfdl:textTrimKind="padChar"
    dfdl:textStringJustification="left"
    dfdl:textPadCharacter="%#r0;" />
</sequence>

 

 

You will notice I added a dfdl:outputLength property and two functions: dfdl:representation-output-length() and dfdl:representation-inherent-length().
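
To make the arithmetic in the example concrete, here is a minimal Python sketch of what an unparser might compute for "len" and "val". It is only an illustration under stated assumptions: the inherent length is taken to be the UTF-8 byte count of the value, padding is zero bytes on the right, and the Python function names are hypothetical stand-ins that merely mirror the proposed DFDL names.

```python
import math

def representation_inherent_length(value: str) -> int:
    # Assumption: inherent length = UTF-8 byte count, no padding, no escapes.
    return len(value.encode("utf-8"))

def representation_output_length(value: str) -> int:
    # Mirrors the dfdl:outputLength expression: round up to a multiple of 4.
    return math.ceil(representation_inherent_length(value) / 4) * 4

def unparse_val(value: str) -> tuple[int, bytes]:
    # Returns the value that outputValueCalc would store in "len",
    # plus the left-justified, zero-padded bytes written for "val".
    out_len = representation_output_length(value)
    padded = value.encode("utf-8").ljust(out_len, b"\x00")
    return out_len, padded
```

For the value "hello" the inherent length is 5 bytes, so "len" becomes 8 and "val" is written as the 5 data bytes followed by 3 zero bytes.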

 

I am accepting candidates for better names for these properties and functions. We need to distinguish these 3 concepts:

 

1) inherent length – the length of the infoset item without reference to any facets, and without regard to escape sequences, padding, or truncation.

 

(TBD: think about escape sequences. Is this right?)

 

2) output target length – the length of the box we’re filling in with the data value representation. The box can be bigger or smaller than the inherent length, which implies use of padding/filling, or truncation.

 

3) input length – the length of the box we’re given when parsing. The inherent length of the value after parsing can be smaller than the length of the box because escape characters are removed and padding is trimmed.
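
The three lengths can be told apart in a small Python sketch. This is an illustration only, under assumptions not settled above: UTF-8 text, zero-byte padding on the right, and no escape sequences (that part is still TBD). The helper names fit_to_box and parse_box are hypothetical.

```python
def fit_to_box(raw: bytes, box_len: int, pad: bytes = b"\x00") -> bytes:
    # Output side: pad on the right when the box is larger than the data,
    # truncate when it is smaller. Byte truncation can split a multi-byte
    # character; a real implementation would have to deal with that.
    return raw[:box_len].ljust(box_len, pad)

def parse_box(box: bytes, pad: bytes = b"\x00") -> str:
    # Input side: trim trailing padding, then decode what remains.
    # A value that genuinely ends in pad bytes would lose them here.
    return box.rstrip(pad).decode("utf-8")

value = "héllo"                      # 'é' takes 2 bytes in UTF-8
raw = value.encode("utf-8")
inherent_length = len(raw)           # 1) inherent length of the value
box = fit_to_box(raw, 8)             # 2) output target length is 8
input_length = len(box)              # 3) input length seen by the parser
parsed = parse_box(box)              # value recovered after trimming
```

Here the inherent length is 6 bytes while the box, and therefore the input length on re-parsing, is 8 bytes; trimming the two pad bytes recovers the original value.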