Mike,

For an element with lengthKind 'endOfParent', the lengthUnits property
is not applicable, so there is no concept of lengthUnits being the same.
I think what you are trying to say is that when the 'parent lengthUnits'
is not 'characters' then the endOfParent element must have an SBCS
encoding.

We need to have a meaning for 'parent lengthUnits' for all the
endOfParent scenarios. I believe it is:
- lengthKind 'explicit' - dfdl:lengthUnits
- lengthKind 'prefixed' - dfdl:lengthUnits
- lengthKind 'pattern' - always 'characters'
- choiceLengthKind 'explicit' - always 'bytes'
- lengthKind 'endOfParent' - recursively apply above

Your suggestion also applies to other types whose representation is
'text', not just to strings.
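
As an illustration only, the resolution rules listed above could be sketched in Python as follows. The `Node` class, its field names, and all three helper functions are hypothetical and not part of DFDL or of any DFDL implementation. The SBCS check reflects the reading suggested above rather than settled spec text, and the sketch does not model an endOfParent element that has no parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a DFDL element or choice."""
    length_kind: Optional[str] = None         # 'explicit', 'prefixed', 'pattern', 'endOfParent'
    length_units: Optional[str] = None        # e.g. 'bytes' or 'characters'
    choice_length_kind: Optional[str] = None  # set only when the node is a choice
    parent: Optional["Node"] = None


def box_length_units(node: Node) -> str:
    """Units in which the box established by `node` is measured."""
    if node.choice_length_kind == "explicit":
        return "bytes"                         # choiceLengthKind 'explicit': always bytes
    if node.length_kind in ("explicit", "prefixed"):
        if node.length_units is None:
            raise ValueError("lengthUnits is required for this lengthKind")
        return node.length_units               # taken from dfdl:lengthUnits
    if node.length_kind == "pattern":
        return "characters"                    # lengthKind 'pattern': always characters
    if node.length_kind == "endOfParent":
        if node.parent is None:
            raise ValueError("endOfParent with no parent is not modelled here")
        return box_length_units(node.parent)   # recursively apply the rules above
    raise ValueError("length kind not covered by this sketch")


def parent_length_units(node: Node) -> str:
    """The 'parent lengthUnits' of an endOfParent node."""
    if node.parent is None:
        raise ValueError("node has no parent")
    return box_length_units(node.parent)


def requires_sbcs(node: Node) -> bool:
    """Assumed rule: a text endOfParent node needs a single-byte encoding
    whenever its parent lengthUnits is not 'characters'."""
    return parent_length_units(node) != "characters"
```

For example, an endOfParent element nested inside another endOfParent element whose own parent has lengthKind 'explicit' and lengthUnits 'bytes' resolves to 'bytes', so the assumed SBCS rule would apply to it.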

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 24/04/2014 17:49
Subject: [DFDL-WG] Missing restriction in 'endOfParent' length kind ??
Sent by: dfdl-wg-bounces@ogf.org

I think we're missing a restriction for 'endOfParent'.

Restriction: If an element has lengthKind 'endOfParent', then its
lengthUnits must be the same as the lengthUnits of the parent. SDE
otherwise.

Rationale:

If a string has fixed length and lengthUnits 'bytes', then currently we
require the textStringPadCharacter to be a single-byte character. This
prevents cases where leftover bytes can neither be filled with a pad
character nor trimmed as one.

But if the string has 'endOfParent' length, it could have lengthUnits
'characters' and encoding UTF-16, and would then typically have a 2-byte
pad character. The enclosing parent, however, could have lengthUnits
'bytes' and an odd fixed/specified length. Hence a byte could be left
over that can neither be filled with a pad character nor trimmed as one.
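
The leftover-byte arithmetic can be checked with a small Python sketch. The function name and its parameters are hypothetical, and UTF-16BE is assumed so that no byte-order mark is counted in the lengths.

```python
def leftover_bytes(parent_len_bytes: int, content: str,
                   encoding: str, pad_char: str) -> int:
    """Bytes of the parent box that whole pad characters cannot fill."""
    content_len = len(content.encode(encoding))  # bytes used by the string itself
    pad_width = len(pad_char.encode(encoding))   # bytes per pad character
    remaining = parent_len_bytes - content_len   # space left inside the parent
    if remaining < 0:
        raise ValueError("content does not fit in the parent")
    return remaining % pad_width                 # what padding cannot cover


# Parent of 7 bytes, "ab" in UTF-16BE (4 bytes), 2-byte pad character:
# 3 bytes remain, one pad character fits, and 1 byte is left over.
print(leftover_bytes(7, "ab", "utf-16-be", " "))  # prints 1

# With a single-byte encoding and pad character nothing is left over.
print(leftover_bytes(7, "ab", "ascii", " "))      # prints 0
```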
It is noisome to have this corner case/excess byte problem
for 'endOfParent' when we have so cleverly dodged it for all the specified-length
kinds with the restriction to single-byte pad characters when lengthUnits
is bytes.
There is language in the spec about dealing with these
trailing RightFill or ElementUnused regions. With the above restriction
I believe RightFill becomes associated only with the hexBinary type being
extended to fill out a specified-length or endOfParent box. The much more
complex issue of RightFill appearing after text characters remains, but
outside of the context where padding/trimming are involved.
It is really problematic language, fraught with possibilities for misinterpretation.
To me it is very preferable to just make 'endOfParent' consistent with
other constraints associated with padding for specified-length elements.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy