Note errata 3.9, my bolding:
"3.9. Section 12.3.5, 7.3.1, 7.3.2. The spec originally allows lengthKind
‘pattern’ to be used when the representation of the current element, or
of a child element, is binary, but imposes restrictions on the encoding
that can be in force. Clarify that the encoding property must be defined
for the element (else schema definition error), and that a decoding
processing error is possible if the match of the regex encounters data
that does not decode in that encoding, dependent on the setting of
encodingErrorPolicy. Remove section 12.3.5.1.
Same clarifications needed for testKind “pattern” property for asserts
and discriminators.
For consistency, the restriction that a complex element of specified
length and lengthUnits ‘characters’ must have children that are all text
and that have the same encoding as the complex element, is dropped."
That's the restriction that I was referring to in my comment below. I
can see why it was dropped - basically the parser now just tries to
decode n characters using the complex element's encoding (and
encodingErrorPolicy). We could apply the same principle for
dfdl:valueLength & dfdl:contentLength - you build the stream from the
bottom up, and then decode it using the complex element's encoding (and
encodingErrorPolicy?) to get the length in characters.
Note that's how unparsing for lengthKind 'prefixed' with lengthUnits
'characters' would work as well - the spec just says "For a complex
element, the length is that of the ComplexContent region" which is not
sufficient (12.3.4). Similar deal for lengthKind 'explicit' - in order
to know the size in chars of ElementUnused the unparser needs to know
the size in chars of the data first (12.3.7.3).
(Of course, for a fixed width encoding, you don't need to decode, you
can just do the maths, but for the general case you need to decode. Also
just doing the maths does not take encodingErrorPolicy into account).
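A minimal Python sketch of the point above, counting characters by
decoding the assembled stream rather than by arithmetic. The names
length_in_characters and fixed_width_length, and the mapping of
encodingErrorPolicy values onto Python's codec error handlers, are
assumptions for illustration only; they are not taken from the spec or
from any DFDL implementation, and a DFDL processor may substitute
replacement characters differently from Python's 'replace' handler.

```python
def length_in_characters(data: bytes, encoding: str,
                         error_policy: str = "replace") -> int:
    """Count the characters obtained by decoding `data` in `encoding`."""
    # Assumed mapping of dfdl:encodingErrorPolicy onto Python error handlers.
    handlers = {"replace": "replace", "error": "strict"}
    text = data.decode(encoding, errors=handlers[error_policy])
    return len(text)  # number of Unicode code points


def fixed_width_length(data: bytes, bytes_per_char: int) -> int:
    """'Just doing the maths': only valid for a fixed-width encoding,
    and it never consults the error policy."""
    return len(data) // bytes_per_char


# Bottom-up: each child contributes bytes, the parent decodes the whole.
# The middle child is a single byte that is not valid UTF-8.
children = ["h\u00e9llo".encode("utf-8"), b"\xff", "!".encode("utf-8")]
stream = b"".join(children)
chars = length_in_characters(stream, "utf-8", "replace")
```

With the 'replace' policy the undecodable byte still counts as one
character; with the 'error' policy the same call raises instead of
returning a length, which is the case the arithmetic shortcut cannot see.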
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, dfdl-wg-bounces@ogf.org
Date: 24/03/2014 12:55
Subject: Re: [DFDL-WG] Action 242 - valueLength and contentLength function wording
Mike
23.5.3.1. Value length is only a function of the dfdl:encoding property
if the element has a text representation. Not sure this needs to be
(re)stated here.
23.5.3.1. "The value length is computed from the DFDL infoset value,
ignoring the dfdl:length or dfdl:textOutputMinLength property. Other
DFDL properties which affect the length of a text or binary
representation are respected, it is only an explicit length which is
ignored." Last sentence is too imprecise - should be phrased in terms of
the grammar.
23.5.3.1. "If the second argument is 'characters' then the element must
have text representation and it is a schema definition error otherwise".
Yes but only for a simple type, so should be qualified.
23.5.3.1. "If the second argument, giving the length units, is
'characters', then recursively, this complex type element must have text
representation throughout all its contained elements and framing, all of
which must also use a uniform character set encoding." I can't see that
restriction elsewhere in the spec when it talks about length of
ComplexContent and lengthUnits 'characters' - I was expecting it to be
in section 12.3.4 or 12.3.7.3 which face the same issue - but it isn't.
Did we decide not to have this restriction? Without such a restriction,
how does the unparser come up with a meaningful length (unless it
re-parses)? (Tim - what does IBM DFDL do here?) What about delimiters
and padding of children that use %#r entities?
23.5.3.2. The points in 23.5.3.1 about escape characters, length as a
function of encoding, and bottom up for complex elements, apply equally
to 23.5.3.2. It might be easier just to say in 23.5.3.2 that
dfdl:contentLength for complex elements is same as dfdl:valueLength, and
for simple elements differs only by the additional inclusion of
LeftPadding and RightPadOrFill regions.
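As a rough illustration of that suggested wording for simple elements,
here is a small Python sketch. The class SimpleElementRegions and its
field names are hypothetical, chosen only to mirror the region names
used above; all lengths are assumed to be in the same units.

```python
from dataclasses import dataclass


@dataclass
class SimpleElementRegions:
    # Hypothetical container: lengths of the regions of one simple element.
    left_padding: int
    value: int
    right_pad_or_fill: int


def value_length(regions: SimpleElementRegions) -> int:
    # Value length excludes any padding or fill.
    return regions.value


def content_length(regions: SimpleElementRegions) -> int:
    # Content length adds LeftPadding and RightPadOrFill to the value length.
    return regions.left_padding + regions.value + regions.right_pad_or_fill


# A 5-unit value right-justified in a 10-unit field.
padded = SimpleElementRegions(left_padding=5, value=5, right_pad_or_fill=0)
```

Under this reading, the two functions would return the same number
whenever both padding regions are empty.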
Also noted in passing:
Specified length - An item has specified length when dfdl:lengthKind is
"implicit", "explicit", or "prefixed".
should be
Specified length - An element has specified length when dfdl:lengthKind
is "implicit" (simple type only), "explicit", or "prefixed".
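The corrected definition can be read as a simple predicate. A sketch in
Python, where the function name has_specified_length and its parameters
are assumptions made for illustration:

```python
def has_specified_length(length_kind: str, is_simple_type: bool) -> bool:
    """Proposed definition: 'explicit' and 'prefixed' always count;
    'implicit' counts only for elements of simple type."""
    if length_kind in ("explicit", "prefixed"):
        return True
    if length_kind == "implicit":
        return is_simple_type
    # Any other lengthKind (e.g. 'pattern', 'delimited') is not specified length.
    return False
```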
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 20/03/2014 17:21
Subject: [DFDL-WG] Action 242 - valueLength and contentLength function wording
Sent by: dfdl-wg-bounces@ogf.org
See the attached doc, which contains proposed revisions to section 23.5.3.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU