Please bear in mind that if this proposal
is accepted and an erratum created, IBM can give no timescale as to when
IBM DFDL will be updated to support it. So you are advised to continue
specifying lengths of text elements in units of characters, accompanied
by a suitable comment, for interoperability.
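For example, such an element declaration might look like the following sketch (the element name and the 7-bit packed encoding are illustrative, not taken from any particular format):

```xml
<xs:element name="title" type="xs:string"
            dfdl:lengthKind="explicit"
            dfdl:length="64"
            dfdl:lengthUnits="characters"
            dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED">
  <!-- The format specification gives this field as 448 bits;
       expressed here as 448 / 7 = 64 characters for
       interoperability, per the advice above. -->
</xs:element>
```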
Regards
Steve Hanson
Architect, IBM
DFDL
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 23/07/2014 23:35
Subject: [DFDL-WG] lengthUnits bits not allowed for strings, binary floats, hexBinary
Sent by: dfdl-wg-bounces@ogf.org
The DFDL spec currently says this about the lengthUnits property:
- 'bits' may only be used for xs:boolean, xs:byte, xs:short,
xs:int, xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
xs:unsignedLong simple types with binary representation.
This
feels like a holdover from when we only had strings made up of 8-bit byte
code units. Now that we have 7-bit and 6-bit characters, this restriction
seems unnecessary, and is in fact awkward because many specs specify the
lengths of strings in bits (which are used universally for the length of
everything in these formats). This is a real concern. In many cases DFDL
Schemas will be generated from other specifications by programs. Having
to conditionally convert the length as specified into different units for
strings is just one more place to have to test, one more way the DFDL schema
doesn't obviously match the specification from which it was derived, etc.
Similarly:
- 'bytes' must be used for type xs:hexBinary.
- 'bytes' must be used for types xs:float and xs:double
with binary representation.
These are to prevent the user from misunderstanding the limitations of
these types, i.e., that we don't support hexBinary values whose size is
not a multiple of 8 bits, or float and double values that are not exactly
4 and 8 bytes respectively.
But now this restriction just seems annoying. If my data format specification
base has all these values in bits, then it is painful when creating a DFDL
schema to have to transform the values for just those element declarations
that are of these types.
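As a sketch of that transformation (the element name is illustrative), a field that the specification gives as 64 bits must currently be written with the value divided by 8:

```xml
<!-- The format specification says 64 bits, but lengthUnits="bits"
     is not allowed for xs:double with binary representation, so a
     schema generator must emit 64 / 8 = 8 bytes instead: -->
<xs:element name="reading" type="xs:double"
            dfdl:representation="binary"
            dfdl:lengthKind="explicit"
            dfdl:length="8"
            dfdl:lengthUnits="bytes"/>
```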
I'm not suggesting we lift the actual restrictions. I'm
fine with hexBinary requiring whole bytes, and with float and double being
exactly 32 bits and 64 bits respectively. I just think requiring bytes
as the length units is arbitrary. We thought it would prevent
people from making mistakes, but in fact it is likely to have the opposite
effect, forcing them to interpret the length differently based
on a type that might not even be defined in the same file where they see
the dfdl:length property. Consider:
<element name="x" type="foo:xType" dfdl:length="448"/>
Is that 448 correct? It depends on the definition of foo:xType.
If it's a simple type derived from string, then the length units have to
be characters or bytes, but in all the formats where I see these 448s,
they are measured in bits. That is 56 bytes, holding 64 characters in a
7-bit encoding. But when I write out this element I don't have the
information right there to know whether to divide by 8, by 7, or not at
all without knowledge of the type.
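To make that concrete, here is a sketch (the derivation from xs:string and the 7-bit packed encoding are illustrative) of a definition of foo:xType under which the dfdl:length above would be wrong as written:

```xml
<!-- If foo:xType derives from xs:string with a 7-bit packed
     encoding, lengthUnits must be "characters", so the element
     needs dfdl:length="64" (448 / 7), not the 448 bits that the
     format specification states: -->
<xs:simpleType name="xType"
               dfdl:lengthUnits="characters"
               dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED">
  <xs:restriction base="xs:string"/>
</xs:simpleType>
```

Were foo:xType instead a binary integer type with lengthUnits="bits", the 448 could be copied straight from the specification.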
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology
| www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU