Please bear in mind that if this proposal
is accepted and an erratum created, IBM can give no timescale as to when
IBM DFDL will be updated to support it. So you are advised to continue
specifying lengths of text elements in units of characters, accompanied
by a suitable comment, for interoperability.
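For example, such an element declaration might look like the following sketch (the element name and the 7-bit packed encoding are illustrative, not taken from any particular format):

```xml
<xs:element name="title" type="xs:string"
            dfdl:lengthKind="explicit"
            dfdl:length="64"
            dfdl:lengthUnits="characters"
            dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED">
  <!-- The format specification gives this field as 448 bits;
       expressed here as 448 / 7 = 64 characters for
       interoperability, per the advice above. -->
</xs:element>
```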
Regards
Steve Hanson
Architect, IBM
DFDL
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 23/07/2014 23:35
Subject: [DFDL-WG] lengthUnits bits not allowed for strings, binary floats, hexBinary
Sent by: dfdl-wg-bounces@ogf.org
The DFDL spec currently says this about the lengthUnits property:
- 'bits' may only be used for xs:boolean, xs:byte, xs:short,
xs:int, xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
xs:unsignedLong simple types with binary representation.
This
feels like a holdover from when we only had strings made up of 8-bit byte
code units. Now that we have 7-bit and 6-bit characters, this restriction
seems unnecessary, and is in fact awkward because many specs specify the
lengths of strings in bits (which are used universally for the length of
everything in these formats). This is a real concern. In many cases DFDL
Schemas will be generated from other specifications by programs. Having
to conditionally convert the length as specified into different units for
strings is just one more place to have to test, one more way the DFDL schema
doesn't obviously match the specification from which it was derived, etc.
Similarly:
- 'bytes' must be used for type xs:hexBinary.
- 'bytes' must be used for types xs:float and xs:double
with binary representation.
These are to prevent the user from misunderstanding the limitations of
these types, i.e., that we don't support hexBinary values whose size is
not a multiple of 8 bits, or float and double values that are not exactly
4 and 8 bytes respectively.
But now this restriction just seems annoying. If my data format specification
base has all these values in bits, then it is painful when creating a DFDL
schema to have to transform the values for just those element declarations
that are of these types.
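As a sketch of that transformation (the element name is illustrative), a field that the specification gives as 64 bits must currently be written with the value divided by 8:

```xml
<!-- The format specification says 64 bits, but lengthUnits="bits"
     is not allowed for xs:double with binary representation, so a
     schema generator must emit 64 / 8 = 8 bytes instead: -->
<xs:element name="reading" type="xs:double"
            dfdl:representation="binary"
            dfdl:lengthKind="explicit"
            dfdl:length="8"
            dfdl:lengthUnits="bytes"/>
```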
I'm not suggesting we lift the actual restrictions. I'm
fine with hexBinary requiring whole bytes, and with float and double being
exactly 32 bits and 64 bits respectively. I just think requiring bytes
as the length units is arbitrary. We thought it would prevent
people from making mistakes, but in fact it is likely to have the opposite
effect, forcing them to interpret the length differently based
on a type that might not even be defined in the same file where they see
the dfdl:length property. Consider:
<element name="x" type="foo:xType" dfdl:length="448"/>
Is that 448 correct? It depends on the definition of foo:xType.
If it's a simple type derived from string, then the length units have to
be characters or bytes, but in all the formats where I see these 448s,
they are measured in bits. That is 56 bytes, holding 64 characters in a
7-bit encoding. But when I write out this element I don't have the
information right there to know whether to divide by 8, by 7, or not at
all without knowledge of the type.
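To make that concrete, here is a sketch (the derivation from xs:string and the 7-bit packed encoding are illustrative) of a definition of foo:xType under which the dfdl:length above would be wrong as written:

```xml
<!-- If foo:xType derives from xs:string with a 7-bit packed
     encoding, lengthUnits must be "characters", so the element
     needs dfdl:length="64" (448 / 7), not the 448 bits that the
     format specification states: -->
<xs:simpleType name="xType"
               dfdl:lengthUnits="characters"
               dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED">
  <xs:restriction base="xs:string"/>
</xs:simpleType>
```

Were foo:xType instead a binary integer type with lengthUnits="bits", the 448 could be copied straight from the specification.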
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology
| www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU