
Please bear in mind that if this proposal is accepted and an erratum created, IBM can give no timescale for when IBM DFDL will be updated to support it. So you are advised to continue specifying the lengths of text elements in units of characters, accompanied by a suitable comment, for interoperability.

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 23/07/2014 23:35
Subject: [DFDL-WG] lengthUnits bits not allowed for strings, binary floats, hexBinary
Sent by: dfdl-wg-bounces@ogf.org

The DFDL spec currently says this w.r.t. the lengthUnits property:

  'bits' may only be used for xs:boolean, xs:byte, xs:short, xs:int, xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and xs:unsignedLong simple types with binary representation.

This feels like a holdover from when we only had strings made up of 8-bit byte code units. Now that we have 7-bit and 6-bit characters, this restriction seems unnecessary, and it is in fact awkward because many specifications give the lengths of strings in bits (the unit used universally for the length of everything in these formats).

This is a real concern. In many cases DFDL schemas will be generated from other specifications by programs. Having to conditionally convert the specified length into different units for strings is just one more place to have to test, and one more way the DFDL schema does not obviously match the specification from which it was derived.

Similarly:

  'bytes' must be used for type xs:hexBinary.
  'bytes' must be used for types xs:float and xs:double with binary representation.

These restrictions exist to keep users from misunderstanding the limitations of these types, i.e., that we don't support hexBinary that is not a multiple of 8 bits in size, nor float and double that are not exactly 4 and 8 bytes respectively. But now the restriction just seems annoying. If my base data format specification gives all these values in bits, then it is painful when creating a DFDL schema to have to transform the values for just those element declarations that are of these types.

I'm not suggesting we lift the actual restrictions. I'm fine with hexBinary requiring whole bytes, and with float and double being exactly 32 bits and 64 bits respectively. I just think that having to use bytes as the length units is arbitrary. We thought it would prevent people from making mistakes, but in fact it is likely to have the opposite effect, forcing them to interpret the length differently based on a type that might not even be defined in the same file where they see the dfdl:length property.

Consider:

  <element name="x" type="foo:xType" dfdl:length="448"/>

Is that 448 correct? It depends on the definition of foo:xType. If it is a simple type derived from string, then the length units have to be characters or bytes, but in all the formats where I see these 448s, they are measured in bits: 448 bits is 56 bytes, holding 64 seven-bit characters. When I write out this element I don't have the information right there to know whether to divide by 8, divide by 7, or leave the value alone, without knowledge of the type.
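For illustration, a sketch of the same 448-bit field written three ways, assuming a 7-bit packed ASCII encoding such as X-DFDL-US-ASCII-7-BIT-PACKED (element and type names here are illustrative only):

  <!-- Today: the 448 bits must be converted to characters for a string -->
  <xs:element name="x" type="xs:string"
              dfdl:lengthKind="explicit" dfdl:lengthUnits="characters"
              dfdl:length="64"
              dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED"/>  <!-- 448 / 7 -->

  <!-- Today: the same 448 bits must be converted to bytes for hexBinary -->
  <xs:element name="y" type="xs:hexBinary"
              dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
              dfdl:length="56"/>                               <!-- 448 / 8 -->

  <!-- Proposed: the specification's value carried through unchanged -->
  <xs:element name="x" type="xs:string"
              dfdl:lengthKind="explicit" dfdl:lengthUnits="bits"
              dfdl:length="448"
              dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED"/>

The first two forms require the schema generator to know the element's type before it can emit the length; the third uses the value from the format specification directly.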
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy