Mike,
I believe this all falls under deferred action 242. It sounds like we
should un-defer the action, as it is affecting the work on the Daffodil
serializer.
I have the last email exchanges for action 242, from April 2014, and
can re-send them.
Regards
Steve Hanson
IBM Integration Bus, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 19/05/2016 17:25
Subject: [DFDL-WG] clarifications needed?: dfdl:contentLength function and dfdl:valueLength function on empty and literal nil representations, and escaping
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
The dfdl:contentLength function is defined in terms of
the SimpleContent or ComplexContent regions of the grammar.
Let's just look at Simple types for a sec.
We do not specify what the dfdl:contentLength is for an
element of SimpleType which has the SimpleLiteralNilElementRep or SimpleEmptyElementRep.
I suggest the value should be zero for SimpleEmptyElementRep. When parsing,
an empty element by definition has no content. The fact that a default
value might be inserted because of the empty representation should not
change the fact that there was no content. When unparsing, SimpleEmptyElementRep
can occur if an empty string is the value of a string-valued element, or
an empty byte array is the value of a hexBinary element. The grammar is
just stipulating the different treatment of initiator/terminator for these
special cases of empty things. The content is length zero.
But consider the round-trip scenario. We parse data to
the infoset. During parsing the dfdl:contentLength of an element having
SimpleEmptyElementRep is zero. A default value is inserted. Now we unparse
this same infoset. The default value's representation very well may be
SimpleNormalRep, with non-zero dfdl:contentLength.
I claim this is ok. This is just another case where some
data formats don't round trip unchanged. It does add an implementation
headache: if the contentLength is cached on the infoset item,
separate cache locations are needed for parsing and for unparsing.
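To make the caching point concrete, here is a minimal Python sketch. It is not Daffodil's implementation; the InfosetElement class, the Direction enum, and the default value "N/A" are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Direction(Enum):
    PARSE = "parse"
    UNPARSE = "unparse"


@dataclass
class InfosetElement:
    """Hypothetical infoset item, not Daffodil's actual class."""
    name: str
    value: Optional[str] = None
    # One cached content length per direction, so a length recorded
    # while parsing is never reused while unparsing.
    _content_length: Dict[Direction, int] = field(default_factory=dict)

    def set_content_length(self, direction: Direction, length: int) -> None:
        self._content_length[direction] = length

    def content_length(self, direction: Direction) -> int:
        return self._content_length[direction]


# Parsing: the element had the empty representation, so its content
# length is 0. A default value is then inserted into the infoset.
elem = InfosetElement(name="code")
elem.set_content_length(Direction.PARSE, 0)
elem.value = "N/A"  # assumed default value

# Unparsing: the default value is written with the normal representation,
# so the unparse-side content length is non-zero.
elem.set_content_length(Direction.UNPARSE, len(elem.value))
```

With a single shared cache slot, the second call would overwrite the parse-time length of zero, which is the headache described above.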
For SimpleLiteralNilElementRep, dfdl:contentLength should be the length
of the NilLiteralCharacters or NilElementLiteralContent regions. (Note
the word "Content" there, implying that we think of the nil literal
representation as content.) This applies to both parsing and unparsing.
For elements of complex type, I think for both ComplexLiteralNilElementRep
and ComplexEmptyElementRep, the dfdl:contentLength should be zero when
parsing. When unparsing, again a complex default may be created (because
default values for interior elements of the complex type might be filled
in as part of the augmented infoset), and the dfdl:contentLength might
not be zero if these default values have non-zero content length. Again,
I think this is ok.
For dfdl:contentLength, we should clarify that the length
also includes the contributions of any escape characters, escape-escape
characters, and escapeBlockStart/End characters. (This is implied, because
such characters are in the "value" regions of simple types, and
value regions are always contained in the content region, but I think the
clarification is still helpful.)
Similarly we need to clarify what dfdl:valueLength does.
For SimpleEmptyElementRep the dfdl:valueLength should
be zero.
For SimpleLiteralNilElementRep, the dfdl:valueLength should
be zero, because a nilled element has no value.
The corner case is SimpleLiteralNilElementRep for a nillable simple
element of type xs:string. Since a literal nil representation and a
string value are ambiguous, this case should be handled by calling
dfdl:contentLength instead of dfdl:valueLength. So a nilled string element
with literal nil nilValue="nil" should have dfdl:valueLength of zero, but
dfdl:contentLength (in characters) of 3. The same element, not nilled and
containing the string "nil" as its value, would have dfdl:valueLength
of 3 (characters) and dfdl:contentLength of 3 (characters).
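The nil versus "nil" example can be written out as a small Python sketch. This only models the proposal in this message for a nillable string element, ignoring padding, escaping, and framing; the SimpleStringElement class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NIL_LITERAL = "nil"  # the nilValue from the example above


@dataclass
class SimpleStringElement:
    """Hypothetical model of a nillable xs:string element."""
    value: Optional[str]  # None means the element is nilled

    @property
    def is_nilled(self) -> bool:
        return self.value is None


def content_length(elem: SimpleStringElement) -> int:
    # A nilled element's content is the nil literal itself.
    if elem.is_nilled:
        return len(NIL_LITERAL)
    return len(elem.value)


def value_length(elem: SimpleStringElement) -> int:
    # A nilled element has no value, so its value length is zero.
    if elem.is_nilled:
        return 0
    return len(elem.value)


nilled = SimpleStringElement(value=None)
literal_text = SimpleStringElement(value="nil")
empty = SimpleStringElement(value="")
```

Only the nilled element distinguishes the two functions: both representations are the three characters "nil" on the wire, but the value length tells them apart.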
For complex type elements, dfdl:valueLength is already
defined to be the same as dfdl:contentLength.
For elements that are not represented (that is, elements
that have the dfdl:inputValueCalc property on them), I believe both
dfdl:valueLength and dfdl:contentLength should cause an SDE, as this has
to be an error on the part of the schema author. (An argument can be made,
however, that these should return zero. See the next paragraph.)
Note however that these functions can be called on elements of complex
type that contain elements that are not represented. Such contained non-represented
elements contribute zero to the content length in all cases. (Consistency
with this is why calling dfdl:valueLength or dfdl:contentLength directly
on a non-represented element might want to return zero, instead of SDE.)
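A rough Python sketch of the two behaviours for non-represented elements follows. It assumes the SDE option for direct calls, ignores framing and separators entirely, and uses hypothetical names throughout (Element, SchemaDefinitionError, content_length); it is not how any implementation is known to work.

```python
from dataclasses import dataclass, field
from typing import List


class SchemaDefinitionError(Exception):
    """Stand-in for an SDE raised to the schema author."""


@dataclass
class Element:
    name: str
    content: str = ""
    represented: bool = True  # False models dfdl:inputValueCalc
    children: List["Element"] = field(default_factory=list)


def _contribution(elem: Element) -> int:
    # Non-represented elements contribute nothing to an enclosing length.
    if not elem.represented:
        return 0
    if elem.children:
        return sum(_contribution(child) for child in elem.children)
    return len(elem.content)


def content_length(elem: Element) -> int:
    # Asking directly about a non-represented element is treated as a
    # schema author error under the option proposed above.
    if not elem.represented:
        raise SchemaDefinitionError(f"{elem.name} is not represented")
    return _contribution(elem)


computed = Element("b", content="xyz", represented=False)
record = Element(
    "rec",
    children=[Element("a", content="abc"), computed, Element("c", content="de")],
)
```

Here content_length(record) counts only "abc" and "de", while content_length(computed) raises. Returning zero from the direct call instead would amount to dropping the check in content_length, which is the consistency argument made above.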
dfdl:valueLength is already specified to exclude the length
of padding characters that are trimmed/added.
I believe we should explicitly state that it *includes* the length of escape,
escape-escape, and escapeBlockStart/End characters.
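As an illustration of how escapes and padding would enter the two lengths under this proposal, here is a small Python sketch. It assumes a single-character delimiter, an escape-character scheme (not escape blocks), right padding to a minimum length, and that padding counts toward the content length; the function names and the min_length parameter are hypothetical.

```python
def escape(text: str, delimiter: str = ",", escape_char: str = "\\",
           escape_escape_char: str = "\\") -> str:
    """Escape delimiters and escape characters in a logical string value."""
    out = []
    for ch in text:
        if ch == escape_char:
            # The escape-escape character protects a literal escape character.
            out.append(escape_escape_char + ch)
        elif ch == delimiter:
            out.append(escape_char + ch)
        else:
            out.append(ch)
    return "".join(out)


def lengths(logical_value: str, min_length: int, pad_char: str = " "):
    """Return (value_length, content_length) in characters."""
    escaped = escape(logical_value)
    padded = escaped.ljust(min_length, pad_char)
    # Value length includes escape characters but excludes padding;
    # content length includes both.
    return len(escaped), len(padded)
```

For the logical value "a,b" padded to 8 characters, the escaped text is 4 characters long, so the value length is 4 and the content length is 8.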
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy