Missing restriction in 'endOfParent' length kind ??

I think we're missing a restriction for 'endOfParent'.

Restriction: If an element has lengthKind 'endOfParent', then its lengthUnits must be the same as the lengthUnits of the parent. It is a Schema Definition Error (SDE) otherwise.

Rationale: If a string has fixed length and lengthUnits 'bytes', then currently we require the textStringPadCharacter to be a single-byte character. This prevents issues where there are bytes that can neither be filled in by a pad character nor trimmed as a pad character. But if the string has 'endOfParent' length, the string could have lengthUnits 'characters' and encoding utf-16, and would then typically have a 2-byte pad character. The enclosing parent, however, could have lengthUnits 'bytes' and an odd fixed/specified length. Hence a byte could be left over that can neither be filled with a pad character nor trimmed as a pad character.

It is noisome to have this corner-case/excess-byte problem for 'endOfParent' when we have so cleverly dodged it for all the specified-length kinds with the restriction to single-byte pad characters when lengthUnits is 'bytes'.

There is language in the spec about dealing with these trailing RightFill or ElementUnused regions. With the above restriction, I believe RightFill becomes associated only with the hexBinary type being extended to fill out a specified-length or endOfParent box. The much more complex issue of RightFill appearing after text characters remains, but outside of the context where padding/trimming is involved. It is really problematic language, fraught with possibilities for misinterpretation. To me it is far preferable to just make 'endOfParent' consistent with the other constraints associated with padding for specified-length elements.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
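To make the leftover-byte arithmetic concrete, here is a small illustrative sketch in Python. The helper name `fill_box` and the sample sizes are hypothetical, not taken from the DFDL spec or any implementation. It assumes a parent box measured in bytes, a text value that is already encoded, and a pad character of fixed encoded width, and reports how many whole pad characters fit and how many bytes remain that can neither be padded nor trimmed.

```python
def fill_box(box_bytes: int, content_bytes: int, pad_char_bytes: int) -> tuple[int, int]:
    """Return (number of pad characters, leftover bytes) for a box of box_bytes.

    Hypothetical helper: models padding a text value that runs to the end of
    its parent, where the parent's length is measured in bytes.
    """
    if content_bytes > box_bytes:
        raise ValueError("content does not fit in the box")
    remaining = box_bytes - content_bytes
    # Only whole pad characters can be written (or trimmed when parsing);
    # whatever does not divide evenly is left over.
    pad_chars, leftover = divmod(remaining, pad_char_bytes)
    return pad_chars, leftover


# UTF-16 string "AB" (4 bytes) in a 7-byte parent, with a 2-byte pad character:
print(fill_box(7, len("AB".encode("utf-16-be")), 2))  # (1, 1): one byte is left over
# Same box with a single-byte pad character, as required when lengthUnits is 'bytes':
print(fill_box(7, 4, 1))  # (3, 0): nothing is left over
```

If the proposed restriction held and the element shared the parent's lengthUnits of 'bytes', the existing single-byte pad-character rule would apply, so `pad_char_bytes` would be 1 and the leftover would always be 0.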

Mike

For an element with lengthKind 'endOfParent', the lengthUnits property is not applicable, so there is no concept of lengthUnits being the same. I think what you are trying to say is that when the 'parent lengthUnits' is not 'characters', the endOfParent element must have an SBCS (single-byte character set) encoding.

We need to have a meaning for 'parent lengthUnits' for all the endOfParent scenarios. I believe it is:
- lengthKind 'explicit': dfdl:lengthUnits
- lengthKind 'prefixed': dfdl:lengthUnits
- lengthKind 'pattern': always 'characters'
- choiceLengthKind 'explicit': always 'bytes'
- lengthKind 'endOfParent': recursively apply the above

Your suggestion also applies to other types when the representation is 'text', not just to strings.

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 24/04/2014 17:49
Subject: [DFDL-WG] Missing restriction in 'endOfParent' length kind ??
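Steve's mapping for the effective 'parent lengthUnits', combined with his restatement of the proposed check, can be modelled roughly as below. This is an illustrative Python sketch only: `Component`, `parent_length_units`, `check_end_of_parent`, and the made-up kind value 'choiceExplicit' (standing for a choice with choiceLengthKind 'explicit') are hypothetical names, and whether an encoding is single-byte is simply passed in as a boolean, an assumption that sidesteps how a real implementation would classify encodings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical model of an enclosing element or choice (illustration only).

    kind is one of 'explicit', 'prefixed', 'pattern', 'endOfParent', or the
    made-up value 'choiceExplicit' standing for a choice whose
    dfdl:choiceLengthKind is 'explicit'.
    """
    kind: str
    length_units: Optional[str] = None    # dfdl:lengthUnits, if applicable
    parent: Optional["Component"] = None  # next enclosing component


def parent_length_units(p: Component) -> str:
    """Effective 'parent lengthUnits', following the mapping in the reply above."""
    if p.kind in ("explicit", "prefixed"):
        if p.length_units is None:
            raise ValueError(f"lengthKind {p.kind!r} requires dfdl:lengthUnits")
        return p.length_units
    if p.kind == "pattern":
        return "characters"
    if p.kind == "choiceExplicit":
        return "bytes"
    if p.kind == "endOfParent":
        if p.parent is None:
            raise ValueError("'endOfParent' component has no enclosing parent")
        return parent_length_units(p.parent)  # recursively apply the mapping
    raise ValueError(f"no parent lengthUnits defined for kind {p.kind!r}")


def check_end_of_parent(parent: Component, encoding_is_single_byte: bool) -> None:
    """Proposed check for a text-representation element with lengthKind 'endOfParent'.

    Raises ValueError (standing in for an SDE) when the parent's effective
    lengthUnits is not 'characters' and the element's encoding is not single-byte.
    """
    if parent_length_units(parent) != "characters" and not encoding_is_single_byte:
        raise ValueError("SDE: 'endOfParent' text element requires a single-byte encoding")


# A 'bytes'-measured grandparent, reached through an 'endOfParent' parent.
outer = Component("explicit", length_units="bytes")
middle = Component("endOfParent", parent=outer)
print(parent_length_units(middle))  # bytes
check_end_of_parent(middle, encoding_is_single_byte=True)  # e.g. ASCII: accepted
try:
    check_end_of_parent(middle, encoding_is_single_byte=False)  # e.g. UTF-16
except ValueError as err:
    print(err)
```

Note that the check is keyed on the text representation rather than on the string type, reflecting the point that the restriction applies to any text-represented element, not just strings.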
participants (2)
-
Mike Beckerle
-
Steve Hanson